I want to combine two or more time periods into one, contingent upon a) same ID and b) same value on another variable (hours).
Data have:
ID Start End Hours
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
Data want:
ID Start End Hours
1 20230101 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:
proc summary data=have noprint nway;
output out=want (drop=_type_ _freq_) min(start)=start max(end)=end;
class id hours;
run;
However, the output data will be sorted by ID/HOURS, not ID/START.
Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS. So the resulting min start and max end may encompass numerous widely separated individual date ranges.
Try this:
data Have;
input @1 ID $2. @4 Start yymmdd8. @14 End yymmdd8. @23 Hours 4.;
format start end yymmddn8.;
datalines;
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
;
run;
proc sql;
create table Want as
select ID
,Hours
,min(start) as start format = yymmddn8.
,max(end) as end format = yymmddn8.
from Have
group by id, Hours
order by id, start, end
;
quit;
Note, my result also collapses ID 2 rows with 8 hours.
PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:
proc summary data=have noprint nway;
output out=want (drop=_type_ _freq_) min(start)=start max(end)=end;
class id hours;
run;
However, the output data will be sorted by ID/HOURS, not ID/START.
Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS. So the resulting min start and max end may encompass numerous widely separated individual date ranges.
I actually do care about gaps between the dates so record 4 and 5 should not collaps. Any suggestions?
How large of a GAP between START and the END of the previous interval to you want to accept without making a new interval?
If you do not care about the size of the gap you can use the NOTSORTED option of the BY statement to assign a group id to data that is grouped, but not actually sorted.
data have;
input ID (Start End) (:yymmdd.) Hours;
format start end yymmdd10.;
cards;
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
;
data grouped;
set have;
by id hours notsorted;
if first.id then group=0;
group+first.hours;
run;
proc summary data=grouped;
by id group hours;
output out=want(drop=_type_) min(start)= max(end)= ;
run;
Results
Obs ID group Hours _FREQ_ Start End 1 1 1 8 2 2023-01-01 2023-01-31 2 1 2 4 1 2023-02-01 2023-02-28 3 2 1 8 2 2023-01-01 2023-01-31 4 2 2 4 1 2023-02-01 2023-02-28
This was an important comment. In fact I'm interested in situastions were there are gaps between the periods, like in row 4 and 5. How is this implemented in the code you suggested?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.