Hello. I need to identify continuous enrollment of at least 6 months or more for members. The data spans multiple years, and a consecutive 6 months can be from one year to the next. I would like a new variable to be created that has either a Y (6 consecutive months or >), or N (not conscutive 6 months or >) consecutive. In addition, a member can have multiple periods of 6 consecutive months in the plan. Below is how I would like the data to look after running the code to determine 6 consecutive months. The data I have now DOES NOT have the 6_Mos_Cons variable, that's what I want created.
Data example is this:
| Acct_Nbr | Cont_Mnth | 6_Mos_Cons |
| 123456 | 2022/01 | Y |
| 123456 | 2022/02 | |
| 123456 | 2022/03 | |
| 123456 | 2022/04 | |
| 123456 | 2022/05 | |
| 123456 | 2022/06 | |
| 234567 | 2022/01 | N |
| 234567 | 2022/02 | |
| 234567 | 2022/04 | |
| 234567 | 2022/07 | Y |
| 234567 | 2022/08 | |
| 234567 | 2022/09 | |
| 234567 | 2022/10 | |
| 234567 | 2022/11 | |
| 234567 | 2022/12 | |
| 345678 | 2021/10 | Y |
| 345678 | 2021/11 | |
| 345678 | 2021/12 | |
| 345678 | 2022/03 | |
| 345678 | 2022/04 | |
| 345678 | 2022/05 |
The Y or N for the created variable can either be on the first line of a consecutive or non-consecutive span of months, or it can be on the last. Also, if it is easier, it can be numeric, ie 1 = consecutive and 0 = non-consecutive.
Hi,
I have a doubt about your requirements.
You said you wanted to find accounts with 6 consecutive months of enrollment.
In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".
But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.
However, supposing I got your requirements right:
this should do it:
data target;
/* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */
set source;
by acct_nbr cont_mnth;
/* history - ensures that consecutive acct_nbrs don't affect each other */
if first.acct_nbr then history=0;
history+1;
/* usage of lag function to store and retrieve data of the 6th previous record */
cont_mnth_lag5 = lag5(cont_mnth);
/* this will only be true if:
last 6 records are of current acct_nbr
distance between current month and month of the 6th previous record is equal to 5 months
*/
if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth)
then '6_Mos_Cons'n = 'Y';
/* target will only have required fields */
keep acct_nbr cont_mnth '6_Mos_Cons'n;
run;Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".
It will not mark any record with "N".
Hi,
I have a doubt about your requirements.
You said you wanted to find accounts with 6 consecutive months of enrollment.
In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".
But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.
However, supposing I got your requirements right:
this should do it:
data target;
/* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */
set source;
by acct_nbr cont_mnth;
/* history - ensures that consecutive acct_nbrs don't affect each other */
if first.acct_nbr then history=0;
history+1;
/* usage of lag function to store and retrieve data of the 6th previous record */
cont_mnth_lag5 = lag5(cont_mnth);
/* this will only be true if:
last 6 records are of current acct_nbr
distance between current month and month of the 6th previous record is equal to 5 months
*/
if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth)
then '6_Mos_Cons'n = 'Y';
/* target will only have required fields */
keep acct_nbr cont_mnth '6_Mos_Cons'n;
run;Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".
It will not mark any record with "N".
Thank you for your reply. And, yes my apologies. The correct example data is:
| Acct_Nbr | Cont_Mnth | 6_Mos_Cons |
| 123456 | 2022/01 | Y |
| 123456 | 2022/02 | |
| 123456 | 2022/03 | |
| 123456 | 2022/04 | |
| 123456 | 2022/05 | |
| 123456 | 2022/06 | |
| 234567 | 2022/01 | N |
| 234567 | 2022/02 | |
| 234567 | 2022/04 | |
| 234567 | 2022/07 | Y |
| 234567 | 2022/08 | |
| 234567 | 2022/09 | |
| 234567 | 2022/10 | |
| 234567 | 2022/11 | |
| 234567 | 2022/12 | |
| 345678 | 2021/10 | Y |
| 345678 | 2021/11 | |
| 345678 | 2021/12 | |
| 345678 | 2022/01 | |
| 345678 | 2022/02 | |
| 345678 | 2022/03 |
So, you can see here why 345678 is a Y.
Also, the Cont_Mnth field is not in date format, it's in character format. I believe this will not allow intck to work correctly? But I did run what you suggested and it looks like it's close to working. I got this:
| Acct_Nbr | Cont_Mnth | history | cont_mnth_lag5 | cont_mnth | Cons_Mnth_6 |
| 123456 | 2022/01 | 1 | . | ||
| 123456 | 2022/02 | 2 | . | ||
| 123456 | 2022/03 | 3 | . | ||
| 123456 | 2022/04 | 4 | . | ||
| 123456 | 2022/05 | 5 | . | ||
| 123456 | 2022/06 | 6 | 2022/01 | . | |
| 123456 | 2022/07 | 7 | 2022/02 | . | |
| 123456 | 2022/08 | 8 | 2022/03 | . | |
| 123456 | 2022/09 | 9 | 2022/04 | . | |
| 123456 | 2022/10 | 10 | 2022/05 | . | |
| 123456 | 2022/11 | 11 | 2022/06 | . | |
| 123456 | 2022/12 | 12 | 2022/07 | . | |
| 123456 | 2022/01 | 13 | 2022/08 | . | |
| 123456 | 2022/02 | 14 | 2022/09 | . | |
| 123456 | 2022/03 | 15 | 2022/10 | . | |
| 123456 | 2022/04 | 16 | 2022/11 | . | |
| 789010 | 2022/01 | 1 | 2022/12 | . | |
| 789010 | 2022/02 | 2 | 2023/01 | . |
So at the start of each ACCT_NBR, or after any gap of more than one month you want a new variable (six_mon_con) set to "Y" or "N". It will be a "Y" if it is followed by 5 or more consecutive months. Otherwise "N".
If that is correct, then your sample expected result is wrong for 234567,2022/04 (which should be an N, not blank), 345678,2021/10 (which should be N, not Y) and 345678,2022/03 (which should be N, not blank).
Here a simple program that does what I expect you want:
data have;
input Acct_Nbr Cont_Mnth :$7. @21 expected_6_mons_con $1.;
cont_date=input(cont_mnth||'/01',yymmdd12.);
format cont_date date9. ;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
run;
data want (drop=_:);
merge have (firstobs=1)
have (firstobs=6 keep=acct_nbr cont_date rename=(acct_nbr=_ac6 cont_date=_cd6));
if acct_nbr^=lag(acct_nbr) or intck('month',lag(cont_date),cont_date)>1 then six_mon_or_more='N';
if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',cont_date,_cd6)=5 then six_mon_or_more='Y';
run;
@JH74 wrote:
Thank you so much for your reply. I neglected to indicate in my original post that the fields are all Character, no date field. So, intck won't work.
Notice the I converted the character string into a date field in data set HAVE. But it that is not an option, then:
data have;
input Acct_Nbr Cont_Mnth :$7. @21 expected_6_mons_con $1.;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
run;
data want (drop=_:);
merge have (firstobs=1)
have (firstobs=6 keep=acct_nbr cont_mnth rename=(acct_nbr=_ac6 cont_mnth=_cm6));
_cd1=input(compress(cont_mnth,'/'),yymmn6.);
_cd6=input(compress(_cm6,'/'),yymmn6.);
if acct_nbr^=lag(acct_nbr) or intck('month',lag(_cd1),_cd1)>1 then six_mon_or_more='N';
if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',_cd1,_cd6)=5 then six_mon_or_more='Y';
run;
BTW, this assumes that the data are grouped by acct_nbr, and (within acct_nbr) are sorted by cont_mnth. Also it assumes no more than one observation per month.
data have;
infile datalines truncover;
input Acct_Nbr Cont_Mnth :$7. expected_6_mons_con $;
cont_date=input(cont_mnth||'/01',yymmdd12.);
format cont_date date9. ;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
;
run;
proc sql;
create table want as
select *,case when
(select count(distinct cont_date) from have
where Acct_Nbr=a.Acct_Nbr and cont_date between a.cont_date and intnx('month',a.cont_date,5))
=6 then 'Y'
else 'N' end as want
from have as a;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.