BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
terjeph
Obsidian | Level 7

I want to combine two or more time periods into one, contingent upon a) same ID and b) same value on another variable (hours).

 

Data have:

ID                           Start                      End                        Hours

1                             20230101             20230108             8

1                             20230109             20230131             8

1                             20230201             20230228             4

2                             20230101             20230108             8

2                             20230110             20230131             8

2                             20230201             20230228             4

 

Data want:

ID                           Start                      End                        Hours

1                             20230101             20230131             8

1                             20230201             20230228             4

2                             20230101             20230108             8

2                             20230110             20230131             8

2                             20230201             20230228             4

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:

 

proc summary data=have noprint nway;
  output out=want (drop=_type_ _freq_)  min(start)=start max(end)=end;
  class id hours;
run;

However, the output data will be sorted by ID/HOURS, not ID/START.

 

 

Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS.  So the resulting min start and max end may encompass numerous widely separated individual date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
SASKiwi
PROC Star

Try this:

data Have;
input @1 ID $2.  @4 Start yymmdd8.   @14 End yymmdd8. @23 Hours 4.;
format start end yymmddn8.;
datalines;
1  20230101  20230108  8
1  20230109  20230131  8
1  20230201  20230228  4
2  20230101  20230108  8
2  20230110  20230131  8
2  20230201  20230228  4
;
run;

proc sql;
  create table Want as
  select  ID
         ,Hours
		 ,min(start) as start format = yymmddn8.
		 ,max(end) as end format = yymmddn8.
  from Have
  group by id, Hours
  order by id, start, end 
  ;
quit;

Note, my result also collapses ID 2 rows with 8 hours.

mkeintz
PROC Star

PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:

 

proc summary data=have noprint nway;
  output out=want (drop=_type_ _freq_)  min(start)=start max(end)=end;
  class id hours;
run;

However, the output data will be sorted by ID/HOURS, not ID/START.

 

 

Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS.  So the resulting min start and max end may encompass numerous widely separated individual date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
terjeph
Obsidian | Level 7
Thanks! Worked perfectly.
terjeph
Obsidian | Level 7

I actually do care about gaps between the dates so record 4 and 5 should not collaps. Any suggestions?

Tom
Super User Tom
Super User

How large of a GAP between START and the END of the previous interval to you want to accept without making a new interval? 

 

If you do not care about the size of the gap you can use the NOTSORTED option of the BY statement to assign a group id to data that is grouped, but not actually sorted.

data have;
  input ID (Start End) (:yymmdd.) Hours;
  format start end yymmdd10.;
cards;
1  20230101 20230108 8
1  20230109 20230131 8
1  20230201 20230228 4
2  20230101 20230108 8
2  20230110 20230131 8
2  20230201 20230228 4
;

data grouped;
  set have;
  by id hours notsorted;
  if first.id then group=0;
  group+first.hours;
run;

proc summary data=grouped;
  by id group hours;
  output out=want(drop=_type_) min(start)= max(end)= ;
run;

Results

Obs    ID    group    Hours    _FREQ_         Start           End

 1      1      1        8         2      2023-01-01    2023-01-31
 2      1      2        4         1      2023-02-01    2023-02-28
 3      2      1        8         2      2023-01-01    2023-01-31
 4      2      2        4         1      2023-02-01    2023-02-28

terjeph
Obsidian | Level 7

This was an important comment. In fact I'm interested in situastions were there are gaps between the periods, like in row 4 and 5. How is this implemented in the code you suggested?

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2064 views
  • 0 likes
  • 4 in conversation