I have a dataset with indicators for if a given person was enrolled in a program at various months. I want to create the following variables:
COMPLETE1 = Month that marks first 3 months of continuous enrollment
COMPLETE2 = If there are multiple periods of continuous enrollment, the month that marks the first 3 months of continuous enrollment for the next period
FLAGS1 = Total months of enrollment for the time period corresponding to COMPLETE1
FLAGS2 = Total months of enrollment for the time period corresponding to COMPLETE2
I was able to obtain COMPLETE1 but not the others. Any help would be appreciated.
data have;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;
data want;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10 complete1 complete2 flags1 flags2;
cards;
A 1 1 1 1 1 1 1 1 1 1 3 . 10 .
B 1 0 1 1 1 1 0 1 1 1 5 10 4 3
C 0 0 1 1 1 1 0 0 0 0 5 . 4 .
D 0 0 0 0 0 1 0 0 0 0 . . . .
E 0 1 1 1 0 1 1 1 0 0 4 8 3 3
;
run;
data want;
set want;
label
complete1 = "COMPLETE1 (Month that marks first 3 months of continuous enrollment"
complete2 = "COMPLETE2 (If there are multiple periods of continuous enrollment, month that marks the next 3 months of continuous enrollment"
flags1 = "FLAGS1 (Months of coverage for the first enrollment period identified)"
flags2 = "FLAGS2 (Months of coverage for the second enrollment period identified)"
;
run;
data attempt;
set have;
length m $500;
m=cats(of month_01-month_10);
p='111';
f=find(m,p);
if f then complete1=f+length(p)-1;
run;
This program makes a character string of the month values. It then searches for the leftmost string of '111'. If one is found, it then
Note that by replacing long string of 1's with 0's for the first enrollment, finding the second enrollment just means searching for the leftmost '111'.
You can establish more than two enrollments by increasing the complete and flag array sizes, and increasing the upper limit of the DO _i ... loop.
data have;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
F 0 0 0 0 0 0 0 1 1 1
;
run;
data want (drop=_:);
set have;
array complete {2};
array flag{2};
_strng=cats(of month_:) ;
do _i=1 to 2 while (index(_strng,'111'));
complete{_i}=index(_strng,'111')+2;
substr(_strng,complete{_i}-2,3)='000';
do _c=complete{_i}+1 by 1 while(char(_strng,_c)='1');
substr(_strng,_c,1)='0';
end;
flag{_i}= _c - (complete{_i}-2);
end;
run;
Since the length of the character variable _strng is unspecified, it defaults to 200, meaning it can accommodate up to 199 months. If you specify a specific length for _strng, make sure it is a byte longer than the number of months.
What are you going to do when you have more than 10 months of data? A solution that works for this data set might have to be significantly rewritten for 15.
You don't clearly define which "month" value you want. It appears to be the LAST of a three month period is that correct?
My actual data set has ~200 months of data, and if there happens to be more than 2 periods of at least 3 continuous months of coverage (e.g 111011110111), I'd be fine with just identifying the first two periods. And correct, so for the example below, I want COMPLETE1 to indicate the position of the first [1], and COMPLETE2 to indicate the position of the second [1].
11[1]011[1]10111
Hi @bkq32
Try this
data have;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;
data want(KEEP=mon: fl: comp:) ;
set have;
array m {*} 3 mon:;
length str $10;
call missing (complete1,complete2,flag1,flag2);
str = cats(of m{*});
put str=;
start=1;
stop=10;
cptrn= prxparse('/1{3}/');
fptrn= prxparse('/1{3,}0/');
/* Find first 3 months of continuous enrollment */
pos=prxmatch(cptrn,str);
if (pos > 0) then
do;
complete1=pos+2; /* Record the position at the of pattern */
start=complete1; /* Change our starting position */
end;
/* Find Total months of enrollment for the time period corresponding to COMPLETE1 */
pos2=prxmatch(fptrn,str);
if ( pos2 > 0) then
flag1=findc(str,'0',pos2) - pos2; /* Count number of months */
/* Find second/additional 3 months of continuous enrollment */
if (flag1 > 0) then
do;
call prxnext(cptrn,start,stop,str,position,length);
do while(position>0);
complete2=position+2;
pos2=findc(str,'0',position);
if ( pos2 > 0) then
flag2 = (pos2 - position); /* Count number of months */
else
flag2 = (10 - position + 1);
call prxnext(cptrn,start,stop,str,position,length);
end;
end;
if ((complete1 > 0) and (flag1 <=0)) then
flag1 = 10;
run;
proc print data=want noobs; run;
Hope this helps
Thank you, @AhmedAl_Attar . It's very close to what I want, but it breaks down for the first row of this new example. FLAG1=10 when it should equal 4 to reflect the 4 months of continuous enrollment. Do you happen to know how to fix this?
data have;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
cards;
A 0 0 0 0 0 0 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;
data want(KEEP=mon: fl: comp:) ;
set have;
array m {*} 3 mon:;
length str $10;
call missing (complete1,complete2,flag1,flag2);
str = cats(of m{*});
put str=;
start=1;
stop=10;
cptrn= prxparse('/1{3}/');
fptrn= prxparse('/1{3,}0/');
/* Find first 3 months of continuous enrollment */
pos=prxmatch(cptrn,str);
if (pos > 0) then
do;
complete1=pos+2; /* Record the position at the of pattern */
start=complete1; /* Change our starting position */
end;
/* Find Total months of enrollment for the time period corresponding to COMPLETE1 */
pos2=prxmatch(fptrn,str);
if ( pos2 > 0) then
flag1=findc(str,'0',pos2) - pos2; /* Count number of months */
/* Find second/additional 3 months of continuous enrollment */
if (flag1 > 0) then
do;
call prxnext(cptrn,start,stop,str,position,length);
do while(position>0);
complete2=position+2;
pos2=findc(str,'0',position);
if ( pos2 > 0) then
flag2 = (pos2 - position); /* Count number of months */
else
flag2 = (10 - position + 1);
call prxnext(cptrn,start,stop,str,position,length);
end;
end;
if ((complete1 > 0) and (flag1 <=0)) then
flag1 = 10;
run;
proc print data=want (obs=1); run;
This program makes a character string of the month values. It then searches for the leftmost string of '111'. If one is found, it then
Note that by replacing long string of 1's with 0's for the first enrollment, finding the second enrollment just means searching for the leftmost '111'.
You can establish more than two enrollments by increasing the complete and flag array sizes, and increasing the upper limit of the DO _i ... loop.
data have;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
F 0 0 0 0 0 0 0 1 1 1
;
run;
data want (drop=_:);
set have;
array complete {2};
array flag{2};
_strng=cats(of month_:) ;
do _i=1 to 2 while (index(_strng,'111'));
complete{_i}=index(_strng,'111')+2;
substr(_strng,complete{_i}-2,3)='000';
do _c=complete{_i}+1 by 1 while(char(_strng,_c)='1');
substr(_strng,_c,1)='0';
end;
flag{_i}= _c - (complete{_i}-2);
end;
run;
Since the length of the character variable _strng is unspecified, it defaults to 200, meaning it can accommodate up to 199 months. If you specify a specific length for _strng, make sure it is a byte longer than the number of months.
data have;
input person $ month_01 month_02 month_03 month_04 month_05 month_06 month_07 month_08 month_09 month_10;
cards;
A 1 1 1 1 1 1 1 1 1 1
B 1 0 1 1 1 1 0 1 1 1
C 0 0 1 1 1 1 0 0 0 0
D 0 0 0 0 0 1 0 0 0 0
E 0 1 1 1 0 1 1 1 0 0
;
run;
data want;
set have;
temp=cats(of month_:);
pid1=prxparse('/(1{3,})/');
p1=prxmatch(pid1,temp);
if p1 then do;
a = prxposn(pid1, 1, temp);
complete1=p1+2;
complete2=.;
flags1 = length(a);
flags2 = .;
end;
pid2=prxparse('/(1{3,})(0+)(1{3,})/');
p2=prxmatch(pid2,temp);
if p2 then do;
a = prxposn(pid2, 1, temp);
b = prxposn(pid2, 2, temp);
c = prxposn(pid2, 3, temp);
complete2=p2+length(a)+length(b)+2;
flags2 = length(c);
end;
drop pid1 pid2 p1 p2 a b c temp;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.