Hello. I need to identify continuous enrollment of at least 6 months or more for members. The data spans multiple years, and a consecutive 6 months can be from one year to the next. I would like a new variable to be created that has either a Y (6 consecutive months or >), or N (not conscutive 6 months or >) consecutive. In addition, a member can have multiple periods of 6 consecutive months in the plan. Below is how I would like the data to look after running the code to determine 6 consecutive months. The data I have now DOES NOT have the 6_Mos_Cons variable, that's what I want created.
Data example is this:
Acct_Nbr | Cont_Mnth | 6_Mos_Cons |
123456 | 2022/01 | Y |
123456 | 2022/02 | |
123456 | 2022/03 | |
123456 | 2022/04 | |
123456 | 2022/05 | |
123456 | 2022/06 | |
234567 | 2022/01 | N |
234567 | 2022/02 | |
234567 | 2022/04 | |
234567 | 2022/07 | Y |
234567 | 2022/08 | |
234567 | 2022/09 | |
234567 | 2022/10 | |
234567 | 2022/11 | |
234567 | 2022/12 | |
345678 | 2021/10 | Y |
345678 | 2021/11 | |
345678 | 2021/12 | |
345678 | 2022/03 | |
345678 | 2022/04 | |
345678 | 2022/05 |
The Y or N for the created variable can either be on the first line of a consecutive or non-consecutive span of months, or it can be on the last. Also, if it is easier, it can be numeric, ie 1 = consecutive and 0 = non-consecutive.
Hi,
I have a doubt about your requirements.
You said you wanted to find accounts with 6 consecutive months of enrollment.
In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".
But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.
However, supposing I got your requirements right:
this should do it:
data target; /* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */ set source; by acct_nbr cont_mnth; /* history - ensures that consecutive acct_nbrs don't affect each other */ if first.acct_nbr then history=0; history+1; /* usage of lag function to store and retrieve data of the 6th previous record */ cont_mnth_lag5 = lag5(cont_mnth); /* this will only be true if: last 6 records are of current acct_nbr distance between current month and month of the 6th previous record is equal to 5 months */ if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth) then '6_Mos_Cons'n = 'Y'; /* target will only have required fields */ keep acct_nbr cont_mnth '6_Mos_Cons'n; run;
Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".
It will not mark any record with "N".
Hi,
I have a doubt about your requirements.
You said you wanted to find accounts with 6 consecutive months of enrollment.
In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".
But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.
However, supposing I got your requirements right:
this should do it:
data target; /* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */ set source; by acct_nbr cont_mnth; /* history - ensures that consecutive acct_nbrs don't affect each other */ if first.acct_nbr then history=0; history+1; /* usage of lag function to store and retrieve data of the 6th previous record */ cont_mnth_lag5 = lag5(cont_mnth); /* this will only be true if: last 6 records are of current acct_nbr distance between current month and month of the 6th previous record is equal to 5 months */ if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth) then '6_Mos_Cons'n = 'Y'; /* target will only have required fields */ keep acct_nbr cont_mnth '6_Mos_Cons'n; run;
Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".
It will not mark any record with "N".
Thank you for your reply. And, yes my apologies. The correct example data is:
Acct_Nbr | Cont_Mnth | 6_Mos_Cons |
123456 | 2022/01 | Y |
123456 | 2022/02 | |
123456 | 2022/03 | |
123456 | 2022/04 | |
123456 | 2022/05 | |
123456 | 2022/06 | |
234567 | 2022/01 | N |
234567 | 2022/02 | |
234567 | 2022/04 | |
234567 | 2022/07 | Y |
234567 | 2022/08 | |
234567 | 2022/09 | |
234567 | 2022/10 | |
234567 | 2022/11 | |
234567 | 2022/12 | |
345678 | 2021/10 | Y |
345678 | 2021/11 | |
345678 | 2021/12 | |
345678 | 2022/01 | |
345678 | 2022/02 | |
345678 | 2022/03 |
So, you can see here why 345678 is a Y.
Also, the Cont_Mnth field is not in date format, it's in character format. I believe this will not allow intck to work correctly? But I did run what you suggested and it looks like it's close to working. I got this:
Acct_Nbr | Cont_Mnth | history | cont_mnth_lag5 | cont_mnth | Cons_Mnth_6 |
123456 | 2022/01 | 1 | . | ||
123456 | 2022/02 | 2 | . | ||
123456 | 2022/03 | 3 | . | ||
123456 | 2022/04 | 4 | . | ||
123456 | 2022/05 | 5 | . | ||
123456 | 2022/06 | 6 | 2022/01 | . | |
123456 | 2022/07 | 7 | 2022/02 | . | |
123456 | 2022/08 | 8 | 2022/03 | . | |
123456 | 2022/09 | 9 | 2022/04 | . | |
123456 | 2022/10 | 10 | 2022/05 | . | |
123456 | 2022/11 | 11 | 2022/06 | . | |
123456 | 2022/12 | 12 | 2022/07 | . | |
123456 | 2022/01 | 13 | 2022/08 | . | |
123456 | 2022/02 | 14 | 2022/09 | . | |
123456 | 2022/03 | 15 | 2022/10 | . | |
123456 | 2022/04 | 16 | 2022/11 | . | |
789010 | 2022/01 | 1 | 2022/12 | . | |
789010 | 2022/02 | 2 | 2023/01 | . |
So at the start of each ACCT_NBR, or after any gap of more than one month you want a new variable (six_mon_con) set to "Y" or "N". It will be a "Y" if it is followed by 5 or more consecutive months. Otherwise "N".
If that is correct, then your sample expected result is wrong for 234567,2022/04 (which should be an N, not blank), 345678,2021/10 (which should be N, not Y) and 345678,2022/03 (which should be N, not blank).
Here a simple program that does what I expect you want:
data have;
input Acct_Nbr Cont_Mnth :$7. @21 expected_6_mons_con $1.;
cont_date=input(cont_mnth||'/01',yymmdd12.);
format cont_date date9. ;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
run;
data want (drop=_:);
merge have (firstobs=1)
have (firstobs=6 keep=acct_nbr cont_date rename=(acct_nbr=_ac6 cont_date=_cd6));
if acct_nbr^=lag(acct_nbr) or intck('month',lag(cont_date),cont_date)>1 then six_mon_or_more='N';
if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',cont_date,_cd6)=5 then six_mon_or_more='Y';
run;
@JH74 wrote:
Thank you so much for your reply. I neglected to indicate in my original post that the fields are all Character, no date field. So, intck won't work.
Notice the I converted the character string into a date field in data set HAVE. But it that is not an option, then:
data have;
input Acct_Nbr Cont_Mnth :$7. @21 expected_6_mons_con $1.;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
run;
data want (drop=_:);
merge have (firstobs=1)
have (firstobs=6 keep=acct_nbr cont_mnth rename=(acct_nbr=_ac6 cont_mnth=_cm6));
_cd1=input(compress(cont_mnth,'/'),yymmn6.);
_cd6=input(compress(_cm6,'/'),yymmn6.);
if acct_nbr^=lag(acct_nbr) or intck('month',lag(_cd1),_cd1)>1 then six_mon_or_more='N';
if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',_cd1,_cd6)=5 then six_mon_or_more='Y';
run;
BTW, this assumes that the data are grouped by acct_nbr, and (within acct_nbr) are sorted by cont_mnth. Also it assumes no more than one observation per month.
data have;
infile datalines truncover;
input Acct_Nbr Cont_Mnth :$7. expected_6_mons_con $;
cont_date=input(cont_mnth||'/01',yymmdd12.);
format cont_date date9. ;
datalines;
123456 2022/01 Y
123456 2022/02
123456 2022/03
123456 2022/04
123456 2022/05
123456 2022/06
234567 2022/01 N
234567 2022/02
234567 2022/04
234567 2022/07 Y
234567 2022/08
234567 2022/09
234567 2022/10
234567 2022/11
234567 2022/12
345678 2021/10 Y
345678 2021/11
345678 2021/12
345678 2022/03
345678 2022/04
345678 2022/05
;
run;
proc sql;
create table want as
select *,case when
(select count(distinct cont_date) from have
where Acct_Nbr=a.Acct_Nbr and cont_date between a.cont_date and intnx('month',a.cont_date,5))
=6 then 'Y'
else 'N' end as want
from have as a;
quit;
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.