I am working with health care enrollment data and have run into the following problem.
The raw data looks like this:
ID stard_date end_date
P1 A B
P1 B+1 C
P1 C+180 D
P2 E F
P2 F+1 G
P3 H I
The expected output looks like this:
ID stard_date end_date
P1 A C
P1 C+180 D
P2 E G
P3 H I
The criterion for deciding whether coverage is consecutive could be parameterized. In the example above, a gap of only 1 day is treated as consecutive, but if the gap is large enough (180 days), the spells are not concatenated (see P1).
In addition, to make the ID a unique key, I am thinking about changing the ID to "ID + start date":
ID stard_date end_date
P1+A A C
P1+C+180 C+180 D
P2+E E G
P3+H H I
Thanks in advance
It would be better if you posted some real date values rather than characters.
data have;
input ID $ stard_date : date9. end_date : date9.;
format stard_date end_date date9.;
cards;
P1 01jan2012 01jun2012
P1 02jun2012 01jun2013
P1 01jun2014 01dec2014
P2 01jun2012 01dec2012
P2 02dec2012 01jun2013
P2 01jun2014 01jun2015
;
run;
/* Stack each spell into two rows (start date, then end date); n numbers the original spells */
data temp;
set have;
n+1;
temp=stard_date; output;
temp=end_date; output;
drop stard_date end_date;
format temp date9.;
run;
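data temp;
set temp;
by id n;
/* Call LAG on every row so its queue always holds the previous row's date */
prev=lag(temp);
/* New group for a new ID, or when a spell starts more than 1 day after the previous end */
if first.id or (first.n and temp gt prev+1) then group+1;
drop prev;
run;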
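/* Keep only the first (start) and last (end) date of each group */
data x;
set temp;
by group;
if first.group or last.group;
run;
/* One row per group; COL1 and COL2 are renamed to match the original variable names */
proc transpose data=x out=want(drop=_name_ rename=(col1=stard_date col2=end_date));
by id group;
var temp;
run;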
(ID + start date) is the unique key.
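If the allowed gap needs to be a parameter, as the question asks, the same merge could also be done in a single pass. The following is only a sketch under some assumptions: the macro variable max_gap, the data set want2, and the variables span_start, span_end and key are names made up for this illustration, and the input is assumed to have no missing dates. It reuses the sample data from above and builds the "ID + start date" key with CATX.

```sas
/* Assumption: spells whose gap is at most max_gap days count as continuous */
%let max_gap = 1;

data have;
  input ID $ stard_date : date9. end_date : date9.;
  format stard_date end_date date9.;
  cards;
P1 01jan2012 01jun2012
P1 02jun2012 01jun2013
P1 01jun2014 01dec2014
P2 01jun2012 01dec2012
P2 02dec2012 01jun2013
P2 01jun2014 01jun2015
;
run;

/* The merge logic below relies on spells being in date order within each ID */
proc sort data=have;
  by ID stard_date;
run;

data want2;
  set have;
  by ID;
  retain span_start span_end;
  format span_start span_end date9.;
  length key $ 20;
  if first.ID then do;
    /* First spell of a person opens a new span */
    span_start = stard_date;
    span_end = end_date;
  end;
  else if stard_date - span_end > &max_gap then do;
    /* Gap too large: write out the finished span, then open a new one */
    key = catx('+', ID, put(span_start, date9.));
    output;
    span_start = stard_date;
    span_end = end_date;
  end;
  else span_end = max(span_end, end_date); /* continuous: extend the current span */
  if last.ID then do;
    /* Write out the span that is still open at the end of the person */
    key = catx('+', ID, put(span_start, date9.));
    output;
  end;
  keep key ID span_start span_end;
run;
```

With max_gap set to 1, the two back-to-back spells of each person should collapse into one row and the 2014 spells should stay separate. Raising max_gap (for example to 180) would merge spells across larger gaps.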