Hello. I need to identify continuous enrollment of at least 6 months or more for members. The data spans multiple years, and a consecutive 6 months can be from one year to the next. I would like a new variable to be created that has either a Y (6 consecutive months or >), or N (not conscutive 6 months or >) consecutive. In addition, a member can have multiple periods of 6 consecutive months in the plan. Below is how I would like the data to look after running the code to determine 6 consecutive months. The data I have now DOES NOT have the 6_Mos_Cons variable, that's what I want created.
Data example is this:
Acct_Nbr | Cont_Mnth | 6_Mos_Cons |
123456 | 2022/01 | Y |
123456 | 2022/02 | |
123456 | 2022/03 | |
123456 | 2022/04 | |
123456 | 2022/05 | |
123456 | 2022/06 | |
234567 | 2022/01 | N |
234567 | 2022/02 | |
234567 | 2022/04 | |
234567 | 2022/07 | Y |
234567 | 2022/08 | |
234567 | 2022/09 | |
234567 | 2022/10 | |
234567 | 2022/11 | |
234567 | 2022/12 | |
345678 | 2021/10 | Y |
345678 | 2021/11 | |
345678 | 2021/12 | |
345678 | 2022/03 | |
345678 | 2022/04 | |
345678 | 2022/05 |
The Y or N for the created variable can either be on the first line of a consecutive or non-consecutive span of months, or it can be on the last. Also, if it is easier, it can be numeric, ie 1 = consecutive and 0 = non-consecutive.
Hi,
I have a doubt about your requirements.
You said you wanted to find accounts with 6 consecutive months of enrollment.
In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".
But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.
However, supposing I got your requirements right:
this should do it:
data target; /* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */ set source; by acct_nbr cont_mnth; /* history - ensures that consecutive acct_nbrs don't affect each other */ if first.acct_nbr then history=0; history+1; /* usage of lag function to store and retrieve data of the 6th previous record */ cont_mnth_lag5 = lag5(cont_mnth); /* this will only be true if: last 6 records are of current acct_nbr distance between current month and month of the 6th previous record is equal to 5 months */ if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth) then '6_Mos_Cons'n = 'Y'; /* target will only have required fields */ keep acct_nbr cont_mnth '6_Mos_Cons'n; run;
Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".
It will not mark any record with "N".
Hi,
I have a doubt about your requirements.
You said you wanted to find accounts with 6 consecutive months of enrollment.
In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".
But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.
However, supposing I got your requirements right:
this should do it:
data target; /* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */ set source; by acct_nbr cont_mnth; /* history - ensures that consecutive acct_nbrs don't affect each other */ if first.acct_nbr then history=0; history+1; /* usage of lag function to store and retrieve data of the 6th previous record */ cont_mnth_lag5 = lag5(cont_mnth); /* this will only be true if: last 6 records are of current acct_nbr distance between current month and month of the 6th previous record is equal to 5 months */ if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth) then '6_Mos_Cons'n = 'Y'; /* target will only have required fields */ keep acct_nbr cont_mnth '6_Mos_Cons'n; run;
Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".
It will not mark any record with "N".
Thank you for your reply. And, yes my apologies. The correct example data is:
Acct_Nbr | Cont_Mnth | 6_Mos_Cons |
123456 | 2022/01 | Y |
123456 | 2022/02 | |
123456 | 2022/03 | |
123456 | 2022/04 | |
123456 | 2022/05 | |
123456 | 2022/06 | |
234567 | 2022/01 | N |
234567 | 2022/02 | |
234567 | 2022/04 | |
234567 | 2022/07 | Y |
234567 | 2022/08 | |
234567 | 2022/09 | |
234567 | 2022/10 | |
234567 | 2022/11 | |
234567 | 2022/12 | |
345678 | 2021/10 | Y |
345678 | 2021/11 | |
345678 | 2021/12 | |
345678 | 2022/01 | |
345678 | 2022/02 | |
345678 | 2022/03 |
So, you can see here why 345678 is a Y.
Also, the Cont_Mnth field is not in date format, it's in character format. I believe this will not allow intck to work correctly? But I did run what you suggested and it looks like it's close to working. I got this:
Acct_Nbr | Cont_Mnth | history | cont_mnth_lag5 | cont_mnth | Cons_Mnth_6 |
123456 | 2022/01 | 1 | . | ||
123456 | 2022/02 | 2 | . | ||
123456 | 2022/03 | 3 | . | ||
123456 | 2022/04 | 4 | . | ||
123456 | 2022/05 | 5 | . | ||
123456 | 2022/06 | 6 | 2022/01 | . | |
123456 | 2022/07 | 7 | 2022/02 | . | |
123456 | 2022/08 | 8 | 2022/03 | . | |
123456 | 2022/09 | 9 | 2022/04 | . | |
123456 | 2022/10 | 10 | 2022/05 | . | |
123456 | 2022/11 | 11 | 2022/06 | . | |
123456 | 2022/12 | 12 | 2022/07 | . | |
123456 | 2022/01 | 13 | 2022/08 | . | |
123456 | 2022/02 | 14 | 2022/09 | . | |
123456 | 2022/03 | 15 | 2022/10 | . | |
123456 | 2022/04 | 16 | 2022/11 | . | |
789010 | 2022/01 | 1 | 2022/12 | . | |
789010 | 2022/02 | 2 | 2023/01 | . |
So at the start of each ACCT_NBR, or after any gap of more than one month you want a new variable (six_mon_con) set to "Y" or "N". It will be a "Y" if it is followed by 5 or more consecutive months. Otherwise "N".
If that is correct, then your sample expected result is wrong for 234567,2022/04 (which should be an N, not blank), 345678,2021/10 (which should be N, not Y) and 345678,2022/03 (which should be N, not blank).
Here a simple program that does what I expect you want:
data have;
input Acct_Nbr Cont_Mnth :$7. @21 expected_6_mons_con $1.;
cont_date=input(cont_mnth||'/01',yymmdd12.);
format cont_date date9. ;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
run;
data want (drop=_:);
merge have (firstobs=1)
have (firstobs=6 keep=acct_nbr cont_date rename=(acct_nbr=_ac6 cont_date=_cd6));
if acct_nbr^=lag(acct_nbr) or intck('month',lag(cont_date),cont_date)>1 then six_mon_or_more='N';
if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',cont_date,_cd6)=5 then six_mon_or_more='Y';
run;
@JH74 wrote:
Thank you so much for your reply. I neglected to indicate in my original post that the fields are all Character, no date field. So, intck won't work.
Notice the I converted the character string into a date field in data set HAVE. But it that is not an option, then:
data have;
input Acct_Nbr Cont_Mnth :$7. @21 expected_6_mons_con $1.;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
run;
data want (drop=_:);
merge have (firstobs=1)
have (firstobs=6 keep=acct_nbr cont_mnth rename=(acct_nbr=_ac6 cont_mnth=_cm6));
_cd1=input(compress(cont_mnth,'/'),yymmn6.);
_cd6=input(compress(_cm6,'/'),yymmn6.);
if acct_nbr^=lag(acct_nbr) or intck('month',lag(_cd1),_cd1)>1 then six_mon_or_more='N';
if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',_cd1,_cd6)=5 then six_mon_or_more='Y';
run;
BTW, this assumes that the data are grouped by acct_nbr, and (within acct_nbr) are sorted by cont_mnth. Also it assumes no more than one observation per month.
data have;
infile datalines truncover;
input Acct_Nbr Cont_Mnth :$7. expected_6_mons_con $;
cont_date=input(cont_mnth||'/01',yymmdd12.);
format cont_date date9. ;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
;
run;
proc sql;
create table want as
select *,case when
(select count(distinct cont_date) from have
where Acct_Nbr=a.Acct_Nbr and cont_date between a.cont_date and intnx('month',a.cont_date,5))
=6 then 'Y'
else 'N' end as want
from have as a;
quit;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.