Hi SAS community,
I would like to identify the individuals who have been continuously enrolled beginning from their start date to either end of study (12/31/2015) or date of event. It is possible for a subject to be enrolled prior to the start date. However, I do not want to count the months before the listed start date .Please advise on suggested coding tips. Below is a simplified version of the dataset. Using this example, the Individuals who meet the criteria, include AA, BB, EE.
data have;
input SubID startdate eventdate enddate mon1 mon2 mon3 mon4 mon5 mon6 mon7 mon8 mon9 mon10 mon11 mon12;
datalines;
AA 03/10/2015 . 12/31/2015 . . 1 1 1 1 1 1 1 1 1 1
BB 05/20/2015 08/02/2015 12/31/2015 . . . 1 1 1 1 1 1 1 1 1
CC 01/18/2015 . 12/31/2015 1 1 1 1 . . . . . 1 . .
DD 08/01/2015 . 12/31/2015 . . . . . . 1 1 . 1
EE 02/25/2015 . 05/11/2015 12/31/2015 . 1 1 1 1 1 1 1 1 . . 1;
FF 04/09/2015 11/10/2015 12/31/2015 . 1 1 1 1 . 1 1 1 1 1 .
;;
Thanks!
I think I have a solution for you, let me know if this does it, first I got the applicable range, then made sure that your mon1-12 variable within that range was populated:
data have;
infile datalines ;
informat subid $2. startdate eventdate enddate mmddyy10.;;
format subid $2. startdate eventdate enddate mmddyy10.;
input SubID$ startdate eventdate enddate mon1 mon2 mon3 mon4 mon5 mon6 mon7 mon8 mon9 mon10 mon11 mon12;
datalines;
AA 03/10/2015 . 12/31/2015 . . 1 1 1 1 1 1 1 1 1 1
BB 05/20/2015 08/02/2015 12/31/2015 . . . 1 1 1 1 1 1 1 1 1
CC 01/18/2015 . 12/31/2015 1 1 1 1 . . . . . 1 . .
DD 08/01/2015 . 12/31/2015 . . . . . . 1 1 . 1 . .
EE 02/25/2015 05/11/2015 12/31/2015 . 1 1 1 1 1 1 1 1 . . 1
FF 04/09/2015 11/10/2015 12/31/2015 . 1 1 1 1 . 1 1 1 1 1 .
;
data range;
set have;
start = month(startdate);
if not missing(eventdate) then end = month(eventdate);
else end = month(enddate);
run;
proc transpose data=range out=tran_range;by subid start end;var mon:;
data compress;
set tran_range;
where start <= input(compress(_NAME_,'a','l'),8.) <= end;
if missing(col1) then col1 = 0;
run;
proc sql;
create table want as
select distinct subid,'Yes' as Eligible,end - start + 1 as sum_months
from compress
group by subid
having min(col1) = 1;
data final;
merge want(in=a)
have(in=b);
by subid;
if b;
if b and not a then Eligible = 'No';
run;
Message was edited by: Mark Johnson
What do mon1 mon2 represent, how are they derived?
How do you determine/define continuous enrollment?
Also provide an example of what you want the output to look like.
Thanks Reeza and Ballardw for your questions. See my responses to your questions. Let me know if I should clarify further! Thank you!
1. What do mon1 mon2 represent, how are they derived?
mon1, mon2, . . . mon12 = the months in 2013 where mon1 = Jan 2013, mon2 = Feb 2013 ... mon12=Dec 2013. 1 = enrolled; . = not enrolled.
For example, subject AA was enrolled from mon3-mon12 or from March 2013 to Dec 2013.
2. How do you determine/define continuous enrollment?
Continuous enrollment is define by enrollment (i.e. value=1) for at least 1 day in every month from start date to end date or date of event. For example, Subject AA was continuously enrolled for the defined period because there is a 1 from Mon3-Mon12 and AA start date is 3/10/2015 (Mon3) and end date is 12/31/2015 (Mon12). In contrast, Subject CC was not continuously enrolled for the desired time frame because CC had start date and end dates from Jan 2013-Dec 2013 and was not enrolled in mon5-mon9, and mon11-mon12.
3. Provide an example of what you want the output to look like.
Ideally, the output will contain at least 2 new variables: eligible (i.e. met the continuously enrollment criteria) and if yes, the sum of months from start to end or start to event. For example, see the same dataset below with two new variables.
data want;
input SubID startdate eventdate enddate mon1 mon2 mon3 mon4 mon5 mon6 mon7 mon8 mon9 mon10 mon11 mon12 eligible sum_months;
datalines;
AA 03/10/2015 . 12/31/2015 . . 1 1 1 1 1 1 1 1 1 1 yes 10
BB 05/20/2015 08/02/2015 12/31/2015 . . . 1 1 1 1 1 1 1 1 1 yes 4
CC 01/18/2015 . 12/31/2015 1 1 1 1 . . . . . 1 . . no .
DD 08/01/2015 . 12/31/2015 . . . . . . 1 1 . 1 no .
EE 02/25/2015 . 05/11/2015 12/31/2015 . 1 1 1 1 1 1 1 1 . . 1 yes 4
FF 04/09/2015 11/10/2015 12/31/2015 . 1 1 1 1 . 1 1 1 1 1 . no .
;;
I think I have a solution for you, let me know if this does it, first I got the applicable range, then made sure that your mon1-12 variable within that range was populated:
data have;
infile datalines ;
informat subid $2. startdate eventdate enddate mmddyy10.;;
format subid $2. startdate eventdate enddate mmddyy10.;
input SubID$ startdate eventdate enddate mon1 mon2 mon3 mon4 mon5 mon6 mon7 mon8 mon9 mon10 mon11 mon12;
datalines;
AA 03/10/2015 . 12/31/2015 . . 1 1 1 1 1 1 1 1 1 1
BB 05/20/2015 08/02/2015 12/31/2015 . . . 1 1 1 1 1 1 1 1 1
CC 01/18/2015 . 12/31/2015 1 1 1 1 . . . . . 1 . .
DD 08/01/2015 . 12/31/2015 . . . . . . 1 1 . 1 . .
EE 02/25/2015 05/11/2015 12/31/2015 . 1 1 1 1 1 1 1 1 . . 1
FF 04/09/2015 11/10/2015 12/31/2015 . 1 1 1 1 . 1 1 1 1 1 .
;
data range;
set have;
start = month(startdate);
if not missing(eventdate) then end = month(eventdate);
else end = month(enddate);
run;
proc transpose data=range out=tran_range;by subid start end;var mon:;
data compress;
set tran_range;
where start <= input(compress(_NAME_,'a','l'),8.) <= end;
if missing(col1) then col1 = 0;
run;
proc sql;
create table want as
select distinct subid,'Yes' as Eligible,end - start + 1 as sum_months
from compress
group by subid
having min(col1) = 1;
data final;
merge want(in=a)
have(in=b);
by subid;
if b;
if b and not a then Eligible = 'No';
run;
Message was edited by: Mark Johnson
Hi Mark!
Your solution works great. Thank you for your time!
One other follow-up question. I actually have multiple years of data that I'm working with so my start and end dates frequently have different years. For example, subject MM start date= 06/05/2013 and end date = 12/31/2015. Do you have a suggestion on how to work with those differences?
Thanks again!
What do your variables look like, var1 var2?
I'm sorry for not mentioning the multiple years of data. I was trying to simplify things . . .
As for your question, the only difference in my variables is how the months are labeled. There are actually 36 months with Mon1=Jan 2013 and Mon36=Dec2015 everything else is the same.
Hi Mark! I figured out a solution! Thanks again!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.