Hello SAS Friends,
I am trying to create flags to identify if a person was continuously enrolled in a health plan. Each person has multiple beginning and end enrollment dates through the course of the 18 years of data. I would like to create 18 variables (0/1) with value=1 if the person was enrolled 11 of 12 months of that year. Here is a sample of the data:
data test;
input id$2. effdate enddate ;
attrib start format =MMYYP. informat=MMYYP.;
attrib end format =MMYYP. informat=MMYYP.;
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
And here is the output that I want:
PID effdate enddate flag2000 flag2001 flag2002 … flag2010… flag2014... flag2017
1 09.1996 07.2000 0 0 0 0 0 0
1 10.2000 10.2010 0 1 1 0 0 0
1 12.2014 12.2017 0 0 0 0 0 1
2 11.2012 12.2014 0 0 0 0 1 0
2 02.2015 08.2017 0 0 0 0 0 0
3 02.2000 07.2014 0 1 1 1 0 0
3 08.2014 05.2018 0 0 0 0 1 1
Thank you for your help!
data test;
input id $ effdate $ enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
data temp;
set test;
do i=0 to intck('month',start,end);
date=intnx('month',start,i);output;
end;
format date ddmmyyp10.;
drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
set temp1;
flag=(count>10);
drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;
Edit your post as a self-contained, syntactically correct data step using datalines, then perhaps I'll try harder.
I'm kind of over (the MANY) cut-and-paste posts, where I'm meant to convert the post into a working SAS data step.
And put your code in a SAS code block (the little "running man") so I can scroll the code rather than having it wrap.
sorry @ScottBass, I've always hand-typed it in. Working on it now...
While I stand by my message in my first post...I really do wish folks would post a self-contained data step so I don't have to do the conversion...perhaps I could have been less "terse" in my tone. So sorry bout that 😉
Anyway, perhaps this will get you going. And notice I didn't attempt to convert your data into a working data step 😉
It's not complete, and you'll need to fiddle with it. But I think the heavy lifting is there? You'll also need to test it.
data have;
id=1;
start="01APR2012"d;
end="01SEP2018"d;
format start end yymmddd10.;
run;
data want;
set have;
by id start;
length flag2000-flag2020 8;
array flags{2000:2020} flag:;
if first.id then do;
do i=lbound(flags) to hbound(flags);
flags{i}=0;
end;
counter=0;
end;
date=start;
do until (date > end);
if month(date)=1 then counter=0;
counter+1;
if month(date)=12 then flags{year(date)}=(counter ge 11);
date=intnx('month',date,1);
end;
* drop i counter date;
run;
In summary:
Hope this helps...
Hi @ScottBass. Thank you for this solution. When I run the command, there is an error message that says: ERROR: Array subscript out of range at line 196 column 30. The line and column numbers keep changing when I rerun the command. For example, ERROR: Array subscript out of range at line 216 column 30. The flags and the other three variables -- i counter date -- are all in the output dataset, but there are no observations.
Could the issue be that my date is month/year (since day is not in the data)?
Thank you for your assistance!
Could the issue be that my date is month/year (since day is not in the data)?
You don't have a date, you have a character string. Yes, you'll need to convert your character string to a SAS date.
Please try
data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
proc sql;
select min(start), max(end) into: min , : max from test;
quit;
data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<yrss(i)<end then yrs(i)=1;
end;
run;
Hi @Jagadishkatam . This worked except for observations #3 and #4:
1 12.2014 12.2017
2 11.2012 12.2014
For these observations, flag2017 (for obs #3) and flag2014 (for obs #4) are missing, not 1. It's not registering enddate of month 12 as complete enrollment for that year. Any suggestions?
Thanks!
Please try
data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
proc sql;
select min(start), max(end) into: min , : max from test;
quit;
data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<=yrss(i)<=end and (startm>=12 or endm>=12) then yrs(i)=1;
else if start<yrss(i)<end and (startm<12 or endm<12) then yrs(i)=1;
end;
run;
Hi @Jagadishkatam. I'm sorry, just tried your solution on the entire dataset. There are two more issues. Here another sample dataset:
data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 01.1996 07.2000
1 08.2010 12.2012
1 12.2014 01.2017
2 12.2008 09.2013
2 07.2015 02.2018
;
run;
Issues:
1) The code isn't picking up full enrollment for the year if enrollment begins January.
See observation #1 - flag1996 should equal 1.
2) When the start or end dates have month 12, then the program doesn't work correctly.
See observation #2 - flag2010 should be missing
Observation #3 - flags 2014 and 2017 should be missing
Observation #4 - flags 2008 and 2013 should be missing
Everything works fine on observations (like obs #5) where start and/or end are not in month 12 and start not in month 1. Do you know how to solve this? Thank you so much!
data test;
input id $ effdate $ enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1 09.1996 07.2000
1 10.2000 10.2010
1 12.2014 12.2017
2 11.2012 12.2014
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018
;
run;
data temp;
set test;
do i=0 to intck('month',start,end);
date=intnx('month',start,i);output;
end;
format date ddmmyyp10.;
drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
set temp1;
flag=(count>10);
drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;
Thank you @Ksharp and all!
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.