I want to combine two or more time periods into one, contingent upon a) same ID, b) same value on another variable (hours) and gaps between time periode. If there is a gap between two periods of more than two days, then this should still be two periods (cf. ID2, first and second row):
Data have:
ID Startdate Enddate Hours
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230112 20230131 8
2 20230201 20230228 4
Data want:
ID Startdate Enddate Hours
1 20230101 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230112 20230131 8
2 20230201 20230228 4
You can use a lag() function to compute the gap between last row and current row.
data have;
informat ID $8. Startdate Enddate b8601da8.;
format Startdate Enddate b8601da8.;
input ID $ Startdate Enddate Hours;
cards;
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230112 20230131 8
2 20230201 20230228 4
;
run;
proc sort data=have out=have2;
by id hours startdate;
run;
data want;
set have2;
by id hours;
lag_enddate=lag(enddate);
if not first.hours then do;
if startdate-lag_enddate<=2 then delete_flag=1;
end;
if delete_flag then delete;
drop lag_enddate delete_flag;
run;
Before starting, are your start and end date values actually SAS date values, i.e. numeric type with a format like (apparently) yymmddn8.?
You can use a lag() function to compute the gap between last row and current row.
data have;
informat ID $8. Startdate Enddate b8601da8.;
format Startdate Enddate b8601da8.;
input ID $ Startdate Enddate Hours;
cards;
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230112 20230131 8
2 20230201 20230228 4
;
run;
proc sort data=have out=have2;
by id hours startdate;
run;
data want;
set have2;
by id hours;
lag_enddate=lag(enddate);
if not first.hours then do;
if startdate-lag_enddate<=2 then delete_flag=1;
end;
if delete_flag then delete;
drop lag_enddate delete_flag;
run;
This works fine. Let's say we don't have the hours variable. How would that change the code?
The solution below is not correct, so there we only delete the row that are extended in regard that sum the duration of ID 1. Can someone explain what the correct solution is?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.