I would like to combine two or more consecutive time periods into one, based on the condition that they have the same ID and the same value for another variable (hours). A new record in the output data will be generated if there is a new ID or a change in the value of hours within an ID. Additionally, if there is a break or a gap between the Enddate of one period and the Startdate of the next period within an ID, a new record will also be created in the output data.
Data Have:
Rec ID Startdate Enddate Hours
1 1 01/01/2023 01/08/2023 8
2 1 01/09/2023 01/31/2023 8
3 1 02/01/2023 02/28/2023 4
4 1 03/01/2023 05/31/2023 8
5 1 06/01/2023 09/26/2023 8
6 2 01/01/2023 01/07/2023 8
7 2 01/10/2023 01/31/2023 8
8 2 02/01/2023 02/28/2023 4
Data Want:
ID Startdate Enddate Hours
1 01/01/2023 31/01/2023 8
1 02/01/2023 02/28/2023 4
1 03/01/2023 09/26/2023 8
2 01/01/2023 01/07/2023 8
2 01/10/2023 01/31/2023 8
2 02/01/2023 02/28/2023 4
Based on a former discussion in SAS Communities I’m working on this code which doesn’t fully make it (for example, it doesn’t collapse record 4 and 5):
proc sort data=have out=have2;
by id hours startdate;
run;
data want;
set have2;
by id hours;
lag_enddate=lag(enddate);
if not first.hours then do;
if startdate-lag_enddate<=2 then delete_flag=1;
end;
if delete_flag then delete;
drop lag_enddate delete_flag;
run;
Assuming the data are sorted by id/startdate, then (untested in the absence of sample data in the form of a working DATA step):
data want (drop=_: nxt_:);
set have;
by id hours notsorted;
merge have have (firstobs=2 keep=startdate rename=(startdate=nxt_startdate));
retain _startdate;
if first.hours or lag(enddate)<startdate-1 then _startdate=startdate;
if last.hours or nxt_startdate > enddate+1;
startdate=_startdate;
run;
Are you really sure that these are supposed to "collapse"? The 5/31/2023 date is not included. Your rule actually states that if there is a gap they should not be collapsed but you ask specifically about records 4 and 5 that should.
4 1 03/01/2023 05/30/2023 8
5 1 06/01/2023 09/26/2023 8
Assuming the data are sorted by id/startdate, then (untested in the absence of sample data in the form of a working DATA step):
data want (drop=_: nxt_:);
set have;
by id hours notsorted;
merge have have (firstobs=2 keep=startdate rename=(startdate=nxt_startdate));
retain _startdate;
if first.hours or lag(enddate)<startdate-1 then _startdate=startdate;
if last.hours or nxt_startdate > enddate+1;
startdate=_startdate;
run;
Worked well! Thanks!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.