SAS learner here! I have a dataset with start and stop dates of eligibility; unique IDs in the data can repeat, with multiple start (eligeff) and stop (eligend) dates of eligibility. Within a unique ID, I am trying to 1) collapse any overlapping date ranges or 2) stitch together any date ranges that are a maximum of 1 day apart (i.e. one row has a stop date of 12.31.2019 and the next row has a start date of 01.01.2020). Any eligibility periods 2 days or more apart should remain as they are (i.e. one row has a stop date of 12.31.2019 and the next start date is 01.02.2020). Here is the code I've written so far:
proc sort data=amlgroup.mbr_0;
by patid eligeff eligend;
run;
data amlgroup.mbr_1;
set amlgroup.mbr_0;
by patid eligeff eligend;
retain maxenr_dt minenr_dt;
if first.patid then do;
minenr_dt = eligeff; maxenr_dt = eligend;
end;
else do;
if eligend <=maxenr_dt then delete;
if . < eligeff - maxenr_dt <= 1 and maxenr_dt < eligend then maxenr_dt = eligend;
if eligeff - maxenr_dt > 1 then do;
minenr_dt = eligeff; maxenr_dt = eligend;
end;
end;
format maxenr_dt minenr_dt yymmdd10.;
run;
The problem I have is when I grab minenr_dt and maxenr_dt for the next phase of my project, as I only want the final, collapsed date ranges (see example of resulting dataset below). I need to drop rows where the date ranges are nested within a larger date range for the same ID (see example, I need to drop the first row for ID 123456), but still be able to keep all non-overlapping date ranges (see example, I don't want to drop either of the rows for ID 789100). Any suggestions for how to change my code? Thank you.
ID | eligeff | eligend | minenr_dt | maxenr_dt |
123456 | 01.01.2015 | 09.30.2018 | 01.01.2015 | 09.30.2018 |
123456 | 10.01.2018 | 03.25.2022 | 01.01.2015 | 03.25.2020 |
789100 | 02.01.2017 | 12.31.2018 | 02.01.2017 | 12.31.2018 |
789100 | 01.01.2020 | 06.30.2022 | 01.01.2020 | 06.30.2022 |
Using two dataset name parameters (firstobs= and rename= (and yes keep=, so 3 parameters) in a MERGE statement makes this very easy with a sorted dataset.
data splicer_date_ranges (drop=nxt_:);
do until (nxt_id^=id or nxt_eligeff>sum(eligend,1));
merge have
have (firstobs=2 keep=id eligeff rename=(id=nxt_id eligeff=nxt_eligeff));
minenr_dt=coalesce(minenr_dt,eligeff);
maxenr_dt=eligend;
end;
format minenr_dt maxenr_dt date9. ;
run;
Edit note: corrected the AND to OR in the do until expression, (and changed the inequality test).
Using two dataset name parameters (firstobs= and rename= (and yes keep=, so 3 parameters) in a MERGE statement makes this very easy with a sorted dataset.
data splicer_date_ranges (drop=nxt_:);
do until (nxt_id^=id or nxt_eligeff>sum(eligend,1));
merge have
have (firstobs=2 keep=id eligeff rename=(id=nxt_id eligeff=nxt_eligeff));
minenr_dt=coalesce(minenr_dt,eligeff);
maxenr_dt=eligend;
end;
format minenr_dt maxenr_dt date9. ;
run;
Edit note: corrected the AND to OR in the do until expression, (and changed the inequality test).
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.