Hello,
I have monthly enrollment data from commercial and Medicare claims and would like to roll them up into continuous enrollment segments. Tricky part is that an individual can have overlap coverage (commercial+Medicare). I tried to create a date variable (i.e. 30 lines if monthly enrollment is Sep01 to Sep30) and check if dif(date)>1 for triggering discontinuation of current enrollment. However, my data set gets extremely large (300B obs) and run time is several hours. I would appreciate if anyone could provide a much efficient approach to tackle this. An example of data input and desired output are shown below.
Data input
PatientID Start End Coverage
00001 01/01/2018 01/31/2018 Commercial
00001 02/01/2018 02/28/2018 Commercial
00001 04/01/2018 04/30/2018 Commercial
00001 05/01/2018 05/20/2018 Commercial
00001 05/15/2018 05/31/2018 Medicare
Output
PatientID Start End
00001 01/01/2018 02/28/2018
00001 04/01/2018 05/31/2018
data have;
input PatientID Start : mmddyy10. End : mmddyy10. ;
format Start End mmddyy10.;
cards;
00001 01/01/2018 01/31/2018 Commercial
00001 02/01/2018 02/28/2018 Commercial
00001 04/01/2018 04/30/2018 Commercial
00001 05/01/2018 05/20/2018 Commercial
00001 05/15/2018 05/31/2018 Medicare
;
data temp;
set have;
by PatientID ;
dif=start-lag(end);
if first.PatientID then dif=.;
run;
data temp;
set temp;
by PatientID ;
if dif >1 or first.PatientID then group+1;
run;
data want;
set temp;
by PatientID group;
retain new_start;
if first.group then new_start=start;
if last.group then do;new_end=end;output;end;
keep PatientID new_start new_end;
format new_start new_end mmddyy10.;
run;
proc print;run;
You need to show at least what happens with the Coverage variable values in this "roll up". Are there any other variables? What happens with them in the roll up?
I just put them there for information purpose. It does not matter what's the value of the Coverage variable. It just shows that there could be some records with overlapping time period.
Any help is appreciated!
data have;
input PatientID Start : mmddyy10. End : mmddyy10. ;
format Start End mmddyy10.;
cards;
00001 01/01/2018 01/31/2018 Commercial
00001 02/01/2018 02/28/2018 Commercial
00001 04/01/2018 04/30/2018 Commercial
00001 05/01/2018 05/20/2018 Commercial
00001 05/15/2018 05/31/2018 Medicare
;
data temp;
set have;
by PatientID ;
dif=start-lag(end);
if first.PatientID then dif=.;
run;
data temp;
set temp;
by PatientID ;
if dif >1 or first.PatientID then group+1;
run;
data want;
set temp;
by PatientID group;
retain new_start;
if first.group then new_start=start;
if last.group then do;new_end=end;output;end;
keep PatientID new_start new_end;
format new_start new_end mmddyy10.;
run;
proc print;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.