Solved: How can I retain unique date ranges only after collapsing any overlapp...

dominiquec · Posted 09-12-2023 12:54 PM

SAS learner here! I have a dataset with start and stop dates of eligibility; unique IDs in the data can repeat, with multiple start (eligeff) and stop (eligend) dates of eligibility. Within a unique ID, I am trying to 1) collapse any overlapping date ranges or 2) stitch together any date ranges that are a maximum of 1 day apart (i.e. one row has a stop date of 12.31.2019 and the next row has a start date of 01.01.2020). Any eligibility periods 2 days or more apart should remain as they are (i.e. one row has a stop date of 12.31.2019 and the next start date is 01.02.2020). Here is the code I've written so far:

proc sort data=amlgroup.mbr_0;
by patid eligeff eligend;
run;

data amlgroup.mbr_1;
set amlgroup.mbr_0;
by patid eligeff eligend;
retain maxenr_dt minenr_dt;
if first.patid then do;
	minenr_dt = eligeff; maxenr_dt = eligend;
	end;
else do;
if eligend <=maxenr_dt then delete;
if . < eligeff - maxenr_dt <= 1 and maxenr_dt < eligend then maxenr_dt = eligend;
if eligeff - maxenr_dt > 1 then do;
	minenr_dt = eligeff; maxenr_dt = eligend;
	end;
end;
format maxenr_dt minenr_dt yymmdd10.;
run;

The problem I have is when I grab minenr_dt and maxenr_dt for the next phase of my project, as I only want the final, collapsed date ranges (see example of resulting dataset below). I need to drop rows where the date ranges are nested within a larger date range for the same ID (see example, I need to drop the first row for ID 123456), but still be able to keep all non-overlapping date ranges (see example, I don't want to drop either of the rows for ID 789100). Any suggestions for how to change my code? Thank you.

ID	eligeff	eligend	minenr_dt	maxenr_dt
123456	01.01.2015	09.30.2018	01.01.2015	09.30.2018
123456	10.01.2018	03.25.2022	01.01.2015	03.25.2020
789100	02.01.2017	12.31.2018	02.01.2017	12.31.2018
789100	01.01.2020	06.30.2022	01.01.2020	06.30.2022

mkeintz · Posted 09-12-2023 08:32 PM

Using two dataset name parameters (firstobs= and rename= (and yes keep=, so 3 parameters) in a MERGE statement makes this very easy with a sorted dataset.

data splicer_date_ranges (drop=nxt_:);
  do until (nxt_id^=id or nxt_eligeff>sum(eligend,1));
    merge have
          have (firstobs=2 keep=id eligeff rename=(id=nxt_id eligeff=nxt_eligeff));
    minenr_dt=coalesce(minenr_dt,eligeff);
    maxenr_dt=eligend;
  end;
  format minenr_dt maxenr_dt date9. ;
run;

Edit note: corrected the AND to OR in the do until expression, (and changed the inequality test).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 09-12-2023 08:32 PM

Using two dataset name parameters (firstobs= and rename= (and yes keep=, so 3 parameters) in a MERGE statement makes this very easy with a sorted dataset.

data splicer_date_ranges (drop=nxt_:);
  do until (nxt_id^=id or nxt_eligeff>sum(eligend,1));
    merge have
          have (firstobs=2 keep=id eligeff rename=(id=nxt_id eligeff=nxt_eligeff));
    minenr_dt=coalesce(minenr_dt,eligeff);
    maxenr_dt=eligend;
  end;
  format minenr_dt maxenr_dt date9. ;
run;

Edit note: corrected the AND to OR in the do until expression, (and changed the inequality test).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

dominiquec · Posted 09-14-2023 11:09 AM

Thank you! I was able to apply the code you suggested successfully.

How can I retain unique date ranges only after collapsing any overlapping dates within a unique ID?

Re: How can I retain unique date ranges only after collapsing any overlapping dates within a unique

Re: How can I retain unique date ranges only after collapsing any overlapping dates within a unique

Re: How can I retain unique date ranges only after collapsing any overlapping dates within a unique

How can I retain unique date ranges only after collapsing any overlapping dates within a unique ID?

Re: How can I retain unique date ranges only after collapsing any overlapping dates within a unique

Re: How can I retain unique date ranges only after collapsing any overlapping dates within a unique

Re: How can I retain unique date ranges only after collapsing any overlapping dates within a unique

Click image to register for webinar

Classroom Training Available!