BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
JH74
Obsidian | Level 7

Hello.  I need to identify continuous enrollment of at least 6 months or more for members.  The data spans multiple years, and a consecutive 6 months can be from one year to the next.  I would like a new variable to be created that has either a Y (6 consecutive months or >), or N (not conscutive 6 months or >) consecutive.  In addition, a member can have multiple periods of 6 consecutive months in the plan.  Below is how I would like the data to look after running the code to determine 6 consecutive months.  The data I have now DOES NOT have the 6_Mos_Cons variable, that's what I want created.  

 

Data example is this:

Acct_NbrCont_Mnth6_Mos_Cons
1234562022/01Y
1234562022/02 
1234562022/03 
1234562022/04 
1234562022/05 
1234562022/06 
2345672022/01N
2345672022/02 
2345672022/04 
2345672022/07Y
2345672022/08 
2345672022/09 
2345672022/10 
2345672022/11 
2345672022/12 
3456782021/10Y
3456782021/11 
3456782021/12 
3456782022/03 
3456782022/04 
3456782022/05 

 

The Y or N for the created variable can either be on the first line of a consecutive or non-consecutive span of months, or it can be on the last.  Also, if it is easier, it can be numeric, ie 1 = consecutive and 0 = non-consecutive.  

1 ACCEPTED SOLUTION

Accepted Solutions
José_Costa_biw
Fluorite | Level 6

Hi,

I have a doubt about your requirements.

You said you wanted to find accounts with 6 consecutive months of enrollment.

In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".

But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.

 

However, supposing I got your requirements right:

  • source table with acct_nbr and date (cont_mnth);
  • cont_mnth is a numeric (date) value;
  • there are no duplicates, meaning there will only be one record for each acct_nbr and cont_mnth (year and month);
  • source table is sorted by acct_nbr and cont_mnth,

this should do it:

 

 

data target;
	/* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */
	set source; 
	by acct_nbr cont_mnth;
	/* history - ensures that consecutive acct_nbrs don't affect each other */
	if first.acct_nbr then history=0;
	history+1;
	/* usage of lag function to store and retrieve data of the 6th previous record */
	cont_mnth_lag5 = lag5(cont_mnth);
	/* this will only be true if:
			last 6 records are of current acct_nbr 
			distance between current month and month of the 6th previous record is equal to 5 months
	*/
	if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth)
		then '6_Mos_Cons'n = 'Y'; 

	/* target will only have required fields */
	keep acct_nbr cont_mnth '6_Mos_Cons'n; 
run;

Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".

It will not mark any record with "N".

 

View solution in original post

8 REPLIES 8
José_Costa_biw
Fluorite | Level 6

Hi,

I have a doubt about your requirements.

You said you wanted to find accounts with 6 consecutive months of enrollment.

In your example, it is clear why acct_nbr 123456 is a "Y", and 234567 (in 2022/07) is also a "Y".

But I fail to understand why acct_nbr 345678 is a "Y", because it has two periods with 3 consecutive months but no 6 consecutive months.

 

However, supposing I got your requirements right:

  • source table with acct_nbr and date (cont_mnth);
  • cont_mnth is a numeric (date) value;
  • there are no duplicates, meaning there will only be one record for each acct_nbr and cont_mnth (year and month);
  • source table is sorted by acct_nbr and cont_mnth,

this should do it:

 

 

data target;
	/* source table, must have acct_nbr and cont_mnth (numeric, date), must be sorted by acct_nbr and conth_mnth */
	set source; 
	by acct_nbr cont_mnth;
	/* history - ensures that consecutive acct_nbrs don't affect each other */
	if first.acct_nbr then history=0;
	history+1;
	/* usage of lag function to store and retrieve data of the 6th previous record */
	cont_mnth_lag5 = lag5(cont_mnth);
	/* this will only be true if:
			last 6 records are of current acct_nbr 
			distance between current month and month of the 6th previous record is equal to 5 months
	*/
	if history ge 6 and 5=intck('month',cont_mnth_lag5,cont_mnth)
		then '6_Mos_Cons'n = 'Y'; 

	/* target will only have required fields */
	keep acct_nbr cont_mnth '6_Mos_Cons'n; 
run;

Notice that this will only mark the last month of a 6 consecutive month streak for the same acct_nbr as "Y".

It will not mark any record with "N".

 

JH74
Obsidian | Level 7

Thank you for your reply.  And, yes my apologies.  The correct example data is:

Acct_NbrCont_Mnth6_Mos_Cons
1234562022/01Y
1234562022/02 
1234562022/03 
1234562022/04 
1234562022/05 
1234562022/06 
2345672022/01N
2345672022/02 
2345672022/04 
2345672022/07Y
2345672022/08 
2345672022/09 
2345672022/10 
2345672022/11 
2345672022/12 
3456782021/10Y
3456782021/11 
3456782021/12 
3456782022/01 
3456782022/02 
3456782022/03 

 

So, you can see here why 345678 is a Y.  

 

Also, the Cont_Mnth field is not in date format, it's in character format.  I believe this will not allow intck to work correctly?  But I did run what you suggested and it looks like it's close to working.  I got this:

Acct_NbrCont_Mnthhistorycont_mnth_lag5cont_mnthCons_Mnth_6
1234562022/011 . 
1234562022/022 . 
1234562022/033 . 
1234562022/044 . 
1234562022/055 . 
1234562022/0662022/01. 
1234562022/0772022/02. 
1234562022/0882022/03. 
1234562022/0992022/04. 
1234562022/10102022/05. 
1234562022/11112022/06. 
1234562022/12122022/07. 
1234562022/01132022/08. 
1234562022/02142022/09. 
1234562022/03152022/10. 
1234562022/04162022/11. 
7890102022/0112022/12. 
7890102022/0222023/01. 

 

José_Costa_biw
Fluorite | Level 6
Hi,

In order to convert string to date, I suggest the following:

newvar = input(compress(oldvar,'/'),yymmn6.)

Just replace oldvar for the name of the variable that stores date as char yyyy/mm and newvar for the new variable to be created as numeric date. You may with to set a proper format, of course...
mkeintz
PROC Star

So at the start of each ACCT_NBR, or after any gap of more than one month you want a new variable (six_mon_con) set to "Y" or "N".  It will be a "Y"  if it is followed by 5 or more consecutive months.  Otherwise "N".

 

If that is correct, then your sample expected result is wrong for 234567,2022/04 (which should be an N, not blank), 345678,2021/10 (which should be N, not Y) and 345678,2022/03 (which should be N, not blank).

 

Here a simple program that does what I expect you want:

 

data have;
  input Acct_Nbr	Cont_Mnth	:$7. @21 expected_6_mons_con $1.;
  cont_date=input(cont_mnth||'/01',yymmdd12.);
  format cont_date date9. ;
datalines;
123456	2022/01	Y
123456	2022/02	 
123456	2022/03	 
123456	2022/04	 
123456	2022/05	 
123456	2022/06	 
234567	2022/01	N
234567	2022/02	 
234567	2022/04	 
234567	2022/07	Y
234567	2022/08	 
234567	2022/09	 
234567	2022/10	 
234567	2022/11	 
234567	2022/12	 
345678	2021/10	Y
345678	2021/11	 
345678	2021/12	 
345678	2022/03	 
345678	2022/04	 
345678	2022/05	 
run;

data want (drop=_:);
  merge have (firstobs=1)
        have (firstobs=6 keep=acct_nbr cont_date rename=(acct_nbr=_ac6 cont_date=_cd6));

  if acct_nbr^=lag(acct_nbr) or intck('month',lag(cont_date),cont_date)>1      then six_mon_or_more='N';
  if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',cont_date,_cd6)=5 then six_mon_or_more='Y';
run;

 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JH74
Obsidian | Level 7
Thank you so much for your reply. I neglected to indicate in my original post that the fields are all Character, no date field. So, intck won't work.
mkeintz
PROC Star

@JH74 wrote:
Thank you so much for your reply. I neglected to indicate in my original post that the fields are all Character, no date field. So, intck won't work.

Notice the I converted the character string into a date field in data set HAVE.  But it that is not an option, then:

 

data have;
  input Acct_Nbr	Cont_Mnth	:$7. @21 expected_6_mons_con $1.;
datalines;
123456	2022/01	Y
123456	2022/02	 
123456	2022/03	 
123456	2022/04	 
123456	2022/05	 
123456	2022/06	 
234567	2022/01	N
234567	2022/02	 
234567	2022/04	 
234567	2022/07	Y
234567	2022/08	 
234567	2022/09	 
234567	2022/10	 
234567	2022/11	 
234567	2022/12	 
345678	2021/10	Y
345678	2021/11	 
345678	2021/12	 
345678	2022/03	 
345678	2022/04	 
345678	2022/05	 
run;

data want (drop=_:);
  merge have (firstobs=1)
        have (firstobs=6 keep=acct_nbr cont_mnth rename=(acct_nbr=_ac6 cont_mnth=_cm6));

  _cd1=input(compress(cont_mnth,'/'),yymmn6.);
  _cd6=input(compress(_cm6,'/'),yymmn6.);
  if acct_nbr^=lag(acct_nbr) or intck('month',lag(_cd1),_cd1)>1            then six_mon_or_more='N';
  if six_mon_or_more='N' and _ac6=acct_nbr and intck('month',_cd1,_cd6)=5 then six_mon_or_more='Y';
run;

BTW, this assumes that the data are grouped by acct_nbr, and (within acct_nbr) are sorted by cont_mnth.  Also it assumes no more than one observation per month.  

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
JH74
Obsidian | Level 7
I neglected to indicate that the Acct_Nbr and Cont_Mnth fields are CHARACTER, not DATE. Also, the last 3 records I had the wrong Cont_Mnth values. They should be 2022/01, 2022/02, and 2022/03.
Ksharp
Super User
data have;
infile datalines truncover;
  input Acct_Nbr	Cont_Mnth	:$7.  expected_6_mons_con $;
  cont_date=input(cont_mnth||'/01',yymmdd12.);
  format cont_date date9. ;
datalines;
123456	2022/01	Y
123456	2022/02	 
123456	2022/03	 
123456	2022/04	 
123456	2022/05	 
123456	2022/06	 
234567	2022/01	N
234567	2022/02	 
234567	2022/04	 
234567	2022/07	Y
234567	2022/08	 
234567	2022/09	 
234567	2022/10	 
234567	2022/11	 
234567	2022/12	 
345678	2021/10	Y
345678	2021/11	 
345678	2021/12	 
345678	2022/03	 
345678	2022/04	 
345678	2022/05	 
;
run;
proc sql;
create table want as
select *,case when 
(select count(distinct cont_date) from have 
where Acct_Nbr=a.Acct_Nbr and cont_date between a.cont_date and intnx('month',a.cont_date,5)) 
=6 then 'Y'
else 'N' end as want
 from have as a;
quit;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1733 views
  • 5 likes
  • 4 in conversation