I want to combine two or more time periods into one, contingent upon a) same ID and b) same value on another variable (hours).
Data have:
ID Start End Hours
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
Data want:
ID Start End Hours
1 20230101 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:
proc summary data=have noprint nway;
output out=want (drop=_type_ _freq_) min(start)=start max(end)=end;
class id hours;
run;
However, the output data will be sorted by ID/HOURS, not ID/START.
Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS. So the resulting min start and max end may encompass numerous widely separated individual date ranges.
Try this:
data Have;
input @1 ID $2. @4 Start yymmdd8. @14 End yymmdd8. @23 Hours 4.;
format start end yymmddn8.;
datalines;
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
;
run;
proc sql;
create table Want as
select ID
,Hours
,min(start) as start format = yymmddn8.
,max(end) as end format = yymmddn8.
from Have
group by id, Hours
order by id, start, end
;
quit;
Note, my result also collapses ID 2 rows with 8 hours.
PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:
proc summary data=have noprint nway;
output out=want (drop=_type_ _freq_) min(start)=start max(end)=end;
class id hours;
run;
However, the output data will be sorted by ID/HOURS, not ID/START.
Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS. So the resulting min start and max end may encompass numerous widely separated individual date ranges.
I actually do care about gaps between the dates so record 4 and 5 should not collaps. Any suggestions?
How large of a GAP between START and the END of the previous interval to you want to accept without making a new interval?
If you do not care about the size of the gap you can use the NOTSORTED option of the BY statement to assign a group id to data that is grouped, but not actually sorted.
data have;
input ID (Start End) (:yymmdd.) Hours;
format start end yymmdd10.;
cards;
1 20230101 20230108 8
1 20230109 20230131 8
1 20230201 20230228 4
2 20230101 20230108 8
2 20230110 20230131 8
2 20230201 20230228 4
;
data grouped;
set have;
by id hours notsorted;
if first.id then group=0;
group+first.hours;
run;
proc summary data=grouped;
by id group hours;
output out=want(drop=_type_) min(start)= max(end)= ;
run;
Results
Obs ID group Hours _FREQ_ Start End 1 1 1 8 2 2023-01-01 2023-01-31 2 1 2 4 1 2023-02-01 2023-02-28 3 2 1 8 2 2023-01-01 2023-01-31 4 2 2 4 1 2023-02-01 2023-02-28
This was an important comment. In fact I'm interested in situastions were there are gaps between the periods, like in row 4 and 5. How is this implemented in the code you suggested?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.