Hi there,
I'm struggling to come up with a solution to this problem and could use some input.
For each ID, I have a variable amount of date ranges. I am trying to create a single row for each ID that shows 1) the most recent end date (DateThru); and 2) the earliest start date (DateFrom) without a break greater than 1 day (using the most recent DateThru as an anchor).
Here is an idea of what my data looks like:
ID | DateFrom | DateThru |
123 | 10/1/2004 | 5/25/2006 |
123 | 5/26/2006 | 8/28/2007 |
123 | 8/29/2007 | 3/17/2008 |
123 | 3/18/2008 | 10/14/2008 |
123 | 12/7/2015 | 12/31/2015 |
123 | 2/12/2016 | 12/31/9999 |
In this example I would want to only output the last row. The second newest date range ended 12/31/15, so it ended greater than 1 day before 2/12/16. All other records are immaterial.
Here is another example:
ID | DateFrom | DateThru |
221 | 1/1/2005 | 5/26/2008 |
221 | 11/11/2011 | 8/1/2015 |
221 | 8/2/2015 | 10/12/2015 |
221 | 10/13/2015 | 6/30/2018 |
221 | 7/1/2018 | 12/31/9999 |
In this example I would want to see a single row for ID=221 that shows DateFrom = 11/11/2011 and DateThru = 12/31/9999. All but the first row show continuous date ranges with gaps of 1 day only.
Any thoughts?
@rlafond - and here is yet a third way that doesn't require any joining of tables:
data have;
input id datefrom :mmddyy10. datethru :mmddyy10.;
format date: date9.;
datalines;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008
221 11/11/2011 8/1/2015
221 8/2/2015 10/12/2015
221 10/13/2015 6/30/2018
221 7/1/2018 12/31/9999
;
run;
proc sort data = have;
by id datefrom;
run;
data want;
keep id datefrom_new datethru;
rename datefrom_new = datefrom;
set have;
by id;
retain datefrom_new datethru_last;
format datefrom_new date9.;
if first.id then do;
datethru_last = .;
datefrom_new = datefrom;
end;
else do;
if datefrom - datethru_last > 1 then datefrom_new = datefrom;
end;
if last.id then output;
datethru_last = datethru;
run;
What do you mean by "a break greater than one day"? Does that mean the DATEFROM in a record must start on the day immediately following DATETHRU in the preceding record? Or does it mean that DATEFROM can be the 2nd day after the prior DATETHRU, leaving exactly 1 day uncovered? I.e. what's a "break"?
Assuming you want no uncovered days, then you want to generate continuous time spans constructed from a sequence of records, and keep only the last of those time spans. I assume your data are sorted by ID DATEFROM, and that DATETHRU is always greater than or equal to DATEFROM:
data have;
input id datefrom :mmddyy10. datethru :mmddyy10.;
format date: date9.;
datalines;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008
221 11/11/2011 8/1/2015
221 8/2/2015 10/12/2015
221 10/13/2015 6/30/2018
221 7/1/2018 12/31/9999
run;
data want (drop=_:);
set have (keep=id);
by id;
merge have
have (firstobs=2 keep=datefrom rename=(datefrom=_nxt_from));
if first.id=1 or datefrom-1>lag(datethru) then _datefrom=datefrom;
else datefrom=_datefrom;
if last.id=1;
retain _datefrom;
run;
The task here is to detect whenever the incoming record is either the beginning of an id, or is more than 1 day after the preceding record. In that case store the current DATEFROM value into a retained variable (_DATEFROM), otherwise assign the retained _DATEFROM value into the current DATEFROM variable. No modifications are needed to DATETHRU. Then just keep the last incoming record, which would have its DATEFROM value properly updated, if needed.
Note if you actually will permit uncovered single days in successive records, just change
if first.id=1 or datefrom-1>lag(datethru) then ...
to
if first.id=1 or datefrom-2>lag(datethru) then ...
data have;
input ID (DateFrom DateThru) (:mmddyy10.);
format datefrom datethru mmddyy10.;
cards;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008
221 11/11/2011 8/1/2015
221 8/2/2015 10/12/2015
221 10/13/2015 6/30/2018
221 7/1/2018 12/31/9999
;
data temp;
set have;
by id;
if first.id then f=1;
If not first.id and DateFrom ne lag(DateThru)+1 then f+1;
run;
proc sql;
create table want as
select ID, min(datefrom) as Datefrom format=mmddyy10., max(DateThru) as Datethru format=mmddyy10.
from (select * from temp group by id having f=max(f))
group by id, f ;
quit;
@rlafond - and here is yet a third way that doesn't require any joining of tables:
data have;
input id datefrom :mmddyy10. datethru :mmddyy10.;
format date: date9.;
datalines;
123 10/1/2004 5/25/2006
123 5/26/2006 8/28/2007
123 8/29/2007 3/17/2008
123 3/18/2008 10/14/2008
123 12/7/2015 12/31/2015
123 2/12/2016 12/31/9999
221 1/1/2005 5/26/2008
221 11/11/2011 8/1/2015
221 8/2/2015 10/12/2015
221 10/13/2015 6/30/2018
221 7/1/2018 12/31/9999
;
run;
proc sort data = have;
by id datefrom;
run;
data want;
keep id datefrom_new datethru;
rename datefrom_new = datefrom;
set have;
by id;
retain datefrom_new datethru_last;
format datefrom_new date9.;
if first.id then do;
datethru_last = .;
datefrom_new = datefrom;
end;
else do;
if datefrom - datethru_last > 1 then datefrom_new = datefrom;
end;
if last.id then output;
datethru_last = datethru;
run;
data result;
set a;
by id datefrom;
retain nstart nend;
if first.id or datefrom^=nend+1 then nstart=datefrom;
nend=datethru;
if last.id then do;
datefrom=nstart;output;
end;
drop nstart nend;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.