Hello SAS Friends,
I am trying to create flags to identify if a person was continuously enrolled in a health plan. Each person has multiple beginning and end enrollment dates through the course of the 18 years of data. I would like to create 18 variables (0/1) with value=1 if the person was enrolled 11 of 12 months of that year. Here is a sample of the data:
data test;
input id$2. effdate enddate ;
attrib start format =MMYYP. informat=MMYYP.;
attrib end format =MMYYP. informat=MMYYP.;
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
And here is the output that I want:
PID effdate enddate flag2000 flag2001 flag2002 … flag2010… flag2014... flag2017
1 09.1996 07.2000 0 0 0 0 0 0
1 10.2000 10.2010 0 1 1 0 0 0
1 12.2014 12.2017 0 0 0 0 0 1
2 11.2012 12.2014 0 0 0 0 1 0
2 02.2015 08.2017 0 0 0 0 0 0
3 02.2000 07.2014 0 1 1 1 0 0
3 08.2014 05.2018 0 0 0 0 1 1
Thank you for your help!
data test;
input id $ effdate $ enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
data temp;
set test;
do i=0 to intck('month',start,end);
date=intnx('month',start,i);output;
end;
format date ddmmyyp10.;
drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
set temp1;
flag=(count>10);
drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;
Edit your post as a self-contained, syntactically correct data step using datalines, then perhaps I'll try harder.
I'm kind of over (the MANY) cut-and-paste posts, where I'm meant to convert the post into a working SAS data step.
And put your code in a SAS code block (the little "running man") so I can scroll the code rather than having it wrap.
sorry @ScottBass, I've always hand-typed it in. Working on it now...
While I stand by my message in my first post...I really do wish folks would post a self-contained data step so I don't have to do the conversion...perhaps I could have been less "terse" in my tone. So sorry bout that 😉
Anyway, perhaps this will get you going. And notice I didn't attempt to convert your data into a working data step 😉
It's not complete, and you'll need to fiddle with it. But I think the heavy lifting is there? You'll also need to test it.
data have;
id=1;
start="01APR2012"d;
end="01SEP2018"d;
format start end yymmddd10.;
run;
data want;
set have;
by id start;
length flag2000-flag2020 8;
array flags{2000:2020} flag:;
if first.id then do;
do i=lbound(flags) to hbound(flags);
flags{i}=0;
end;
counter=0;
end;
date=start;
do until (date > end);
if month(date)=1 then counter=0;
counter+1;
if month(date)=12 then flags{year(date)}=(counter ge 11);
date=intnx('month',date,1);
end;
* drop i counter date;
run;
In summary:
Hope this helps...
Hi @ScottBass. Thank you for this solution. When I run the command, there is an error message that says: ERROR: Array subscript out of range at line 196 column 30. The line and column numbers keep changing when I rerun the command. For example, ERROR: Array subscript out of range at line 216 column 30. The flags and the other three variables -- i counter date -- are all in the output dataset, but there are no observations.
Could the issue be that my date is month/year (since day is not in the data)?
Thank you for your assistance!
Could the issue be that my date is month/year (since day is not in the data)?
You don't have a date, you have a character string. Yes, you'll need to convert your character string to a SAS date.
Please try
data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
proc sql;
select min(start), max(end) into: min , : max from test;
quit;
data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<yrss(i)<end then yrs(i)=1;
end;
run;
Hi @Jagadishkatam . This worked except for observations #3 and #4:
1 12.2014 12.2017
2 11.2012 12.2014
For these observations, flag2017 (for obs #3) and flag2014 (for obs #4) are missing, not 1. It's not registering enddate of month 12 as complete enrollment for that year. Any suggestions?
Thanks!
Please try
data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
proc sql;
select min(start), max(end) into: min , : max from test;
quit;
data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<=yrss(i)<=end and (startm>=12 or endm>=12) then yrs(i)=1;
else if start<yrss(i)<end and (startm<12 or endm<12) then yrs(i)=1;
end;
run;
Hi @Jagadishkatam. I'm sorry, just tried your solution on the entire dataset. There are two more issues. Here another sample dataset:
data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 01.1996 07.2000
1 08.2010 12.2012
1 12.2014 01.2017
2 12.2008 09.2013
2 07.2015 02.2018
;
run;
Issues:
1) The code isn't picking up full enrollment for the year if enrollment begins January.
See observation #1 - flag1996 should equal 1.
2) When the start or end dates have month 12, then the program doesn't work correctly.
See observation #2 - flag2010 should be missing
Observation #3 - flags 2014 and 2017 should be missing
Observation #4 - flags 2008 and 2013 should be missing
Everything works fine on observations (like obs #5) where start and/or end are not in month 12 and start not in month 1. Do you know how to solve this? Thank you so much!
data test;
input id $ effdate $ enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
data temp;
set test;
do i=0 to intck('month',start,end);
date=intnx('month',start,i);output;
end;
format date ddmmyyp10.;
drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
set temp1;
flag=(count>10);
drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;
Thank you @Ksharp and all!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.