BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bkq32
Quartz | Level 8

I have a dataset with indicators for if a given person was enrolled in a program at various months. I want to create the following variables:

 

COMPLETE1 = Month that marks first 3 months of continuous enrollment

COMPLETE2 = If there are multiple periods of continuous enrollment, the month that marks the first 3 months of continuous enrollment for the next period

FLAGS1 = Total months of enrollment for the time period corresponding to COMPLETE1

FLAGS2 = Total months of enrollment for the time period corresponding to COMPLETE2

 

I was able to obtain COMPLETE1 but not the others. Any help would be appreciated.

data have;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
 cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;

data want;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10 complete1 complete2 flags1 flags2;
 cards;
A 1 1 1 1 1 1 1 1 1 1 3 . 10 .
B 1 0 1 1 1 1 0 1 1 1 5 10 4 3
C 0 0 1 1 1 1 0 0 0 0 5 . 4 .
D 0 0 0 0 0 1 0 0 0 0 . . . .
E 0 1 1 1 0 1 1 1 0 0 4 8 3 3
;
run;

data want;
 set want;
 label
 complete1 = "COMPLETE1 (Month that marks first 3 months of continuous enrollment"
 complete2 = "COMPLETE2 (If there are multiple periods of continuous enrollment, month that marks the next 3 months of continuous enrollment"
 flags1 = "FLAGS1 (Months of coverage for the first enrollment period identified)"
 flags2 = "FLAGS2 (Months of coverage for the second enrollment period identified)"
 ;
run;


data attempt;
 set have;
 length m $500;
 m=cats(of month_01-month_10);
 p='111';
 f=find(m,p);
 if f then complete1=f+length(p)-1;
run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This program makes a character string of the month values.  It then searches for the leftmost string of '111'.  If one is found, it then

  1. establishes the corresponding complete value,
  2. replaces the '111' with '000'.
  3. counts subsequent 1's (and resets them to zero) to generate the flag value.

Note that by replacing long string of 1's with 0's for the first enrollment, finding the second enrollment just means searching for the leftmost '111'.

 

You can establish more than two enrollments by increasing the complete and flag array sizes, and increasing the upper limit of the DO _i ... loop.

 

 

data have;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
 cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
F 0 0 0 0 0 0 0 1 1 1
;
run;

data want (drop=_:);
  set have;
  array complete {2};
  array flag{2};

  _strng=cats(of month_:) ;

  do _i=1 to 2 while (index(_strng,'111'));
    complete{_i}=index(_strng,'111')+2;
    substr(_strng,complete{_i}-2,3)='000';
    do _c=complete{_i}+1 by 1 while(char(_strng,_c)='1');
      substr(_strng,_c,1)='0';
    end;
    flag{_i}= _c - (complete{_i}-2);
  end;

run;

Since the length of the character variable _strng is unspecified, it defaults to 200, meaning it can accommodate up to 199 months.  If you specify a specific length for _strng, make sure it is a byte longer than the number of months.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
ballardw
Super User

What are you going to do when you have more than 10 months of data? A solution that works for this data set might have to be significantly rewritten for 15.

 

You don't clearly define which "month" value you want. It appears to be the LAST of a three month period is that correct?

bkq32
Quartz | Level 8

My actual data set has ~200 months of data, and if there happens to be more than 2 periods of at least 3 continuous months of coverage (e.g 111011110111), I'd be fine with just identifying the first two periods. And correct, so for the example below, I want COMPLETE1 to indicate the position of the first [1], and COMPLETE2 to indicate the position of the second [1].
11[1]011[1]10111

AhmedAl_Attar
Ammonite | Level 13

Hi @bkq32 

Try this

data have;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
 cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;

data want(KEEP=mon: fl: comp:) ;
	set have;
	array m {*} 3 mon:;
	length str $10;
	call missing (complete1,complete2,flag1,flag2);
	str = cats(of m{*});
	put str=;
	start=1;
	stop=10;
	cptrn= prxparse('/1{3}/');
	fptrn= prxparse('/1{3,}0/');

	/* Find first 3 months of continuous enrollment */
	pos=prxmatch(cptrn,str);
	if (pos > 0) then
	do;
		complete1=pos+2; /* Record the position at the of pattern */
		start=complete1; /* Change our starting position */
	end;

	/* Find Total months of enrollment for the time period corresponding to COMPLETE1 */
	pos2=prxmatch(fptrn,str);
	if ( pos2 > 0) then 
		flag1=findc(str,'0',pos2) - pos2; /* Count number of months */

	/* Find second/additional 3 months of continuous enrollment */
	if (flag1 > 0) then
	do;
		call prxnext(cptrn,start,stop,str,position,length);
		do while(position>0);
			complete2=position+2;
			
			pos2=findc(str,'0',position);
			if ( pos2 > 0) then 
				flag2 = (pos2 - position); /* Count number of months */
			else
				flag2 = (10 - position + 1);
			
			call prxnext(cptrn,start,stop,str,position,length);
		end;
	end;

	if ((complete1 > 0) and (flag1 <=0)) then
		flag1 = 10;
run;
proc print data=want noobs; run;

Hope this helps

bkq32
Quartz | Level 8

Thank you, @AhmedAl_Attar . It's very close to what I want, but it breaks down for the first row of this new example. FLAG1=10 when it should equal 4 to reflect the 4 months of continuous enrollment. Do you happen to know how to fix this?

data have;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
 cards;
A 0 0 0 0 0 0 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;

data want(KEEP=mon: fl: comp:) ;
	set have;
	array m {*} 3 mon:;
	length str $10;
	call missing (complete1,complete2,flag1,flag2);
	str = cats(of m{*});
	put str=;
	start=1;
	stop=10;
	cptrn= prxparse('/1{3}/');
	fptrn= prxparse('/1{3,}0/');

	/* Find first 3 months of continuous enrollment */
	pos=prxmatch(cptrn,str);
	if (pos > 0) then
	do;
		complete1=pos+2; /* Record the position at the of pattern */
		start=complete1; /* Change our starting position */
	end;

	/* Find Total months of enrollment for the time period corresponding to COMPLETE1 */
	pos2=prxmatch(fptrn,str);
	if ( pos2 > 0) then 
		flag1=findc(str,'0',pos2) - pos2; /* Count number of months */

	/* Find second/additional 3 months of continuous enrollment */
	if (flag1 > 0) then
	do;
		call prxnext(cptrn,start,stop,str,position,length);
		do while(position>0);
			complete2=position+2;
			
			pos2=findc(str,'0',position);
			if ( pos2 > 0) then 
				flag2 = (pos2 - position); /* Count number of months */
			else
				flag2 = (10 - position + 1);
			
			call prxnext(cptrn,start,stop,str,position,length);
		end;
	end;

	if ((complete1 > 0) and (flag1 <=0)) then
		flag1 = 10;
run;
proc print data=want (obs=1); run;
AhmedAl_Attar
Ammonite | Level 13

Hi @bkq32 

to fix this , just change the last if statement as follows

	if ((complete1 > 0) and (flag1 <=0)) then
		flag1 = (10 - prxmatch(cptrn,str) + 1);

But I would suggest you adopt @mkeintz solution. it's a much simpler straightforward solution 😉

mkeintz
PROC Star

This program makes a character string of the month values.  It then searches for the leftmost string of '111'.  If one is found, it then

  1. establishes the corresponding complete value,
  2. replaces the '111' with '000'.
  3. counts subsequent 1's (and resets them to zero) to generate the flag value.

Note that by replacing long string of 1's with 0's for the first enrollment, finding the second enrollment just means searching for the leftmost '111'.

 

You can establish more than two enrollments by increasing the complete and flag array sizes, and increasing the upper limit of the DO _i ... loop.

 

 

data have;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
 cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
F 0 0 0 0 0 0 0 1 1 1
;
run;

data want (drop=_:);
  set have;
  array complete {2};
  array flag{2};

  _strng=cats(of month_:) ;

  do _i=1 to 2 while (index(_strng,'111'));
    complete{_i}=index(_strng,'111')+2;
    substr(_strng,complete{_i}-2,3)='000';
    do _c=complete{_i}+1 by 1 while(char(_strng,_c)='1');
      substr(_strng,_c,1)='0';
    end;
    flag{_i}= _c - (complete{_i}-2);
  end;

run;

Since the length of the character variable _strng is unspecified, it defaults to 200, meaning it can accommodate up to 199 months.  If you specify a specific length for _strng, make sure it is a byte longer than the number of months.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
bkq32
Quartz | Level 8
@mkeintz Awesome, thank you for the explanation!
Ksharp
Super User
data have;
 input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
 cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;
data want;
 set have;
 temp=cats(of month_:);

 pid1=prxparse('/(1{3,})/');
 p1=prxmatch(pid1,temp);
 if p1 then do;
   a = prxposn(pid1, 1, temp);
  complete1=p1+2;
  complete2=.;
  flags1 = length(a);
  flags2 = .;
 end;

 pid2=prxparse('/(1{3,})(0+)(1{3,})/');
 p2=prxmatch(pid2,temp);
 if p2 then do;
   a = prxposn(pid2, 1, temp);
   b = prxposn(pid2, 2, temp);
   c = prxposn(pid2, 3, temp);
  complete2=p2+length(a)+length(b)+2;
  flags2 = length(c);
 end;

 drop pid1 pid2 p1 p2 a b c temp;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1120 views
  • 5 likes
  • 5 in conversation