BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
terjeph
Obsidian | Level 7

I want to combine two or more time periods into one, contingent upon a) same ID and b) same value on another variable (hours).

 

Data have:

ID                           Start                      End                        Hours

1                             20230101             20230108             8

1                             20230109             20230131             8

1                             20230201             20230228             4

2                             20230101             20230108             8

2                             20230110             20230131             8

2                             20230201             20230228             4

 

Data want:

ID                           Start                      End                        Hours

1                             20230101             20230131             8

1                             20230201             20230228             4

2                             20230101             20230108             8

2                             20230110             20230131             8

2                             20230201             20230228             4

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:

 

proc summary data=have noprint nway;
  output out=want (drop=_type_ _freq_)  min(start)=start max(end)=end;
  class id hours;
run;

However, the output data will be sorted by ID/HOURS, not ID/START.

 

 

Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS.  So the resulting min start and max end may encompass numerous widely separated individual date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
SASKiwi
PROC Star

Try this:

data Have;
input @1 ID $2.  @4 Start yymmdd8.   @14 End yymmdd8. @23 Hours 4.;
format start end yymmddn8.;
datalines;
1  20230101  20230108  8
1  20230109  20230131  8
1  20230201  20230228  4
2  20230101  20230108  8
2  20230110  20230131  8
2  20230201  20230228  4
;
run;

proc sql;
  create table Want as
  select  ID
         ,Hours
		 ,min(start) as start format = yymmddn8.
		 ,max(end) as end format = yymmddn8.
  from Have
  group by id, Hours
  order by id, start, end 
  ;
quit;

Note, my result also collapses ID 2 rows with 8 hours.

mkeintz
PROC Star

PROC SUMMARY (alias PROC MEANS) is intended for exactly this type of result:

 

proc summary data=have noprint nway;
  output out=want (drop=_type_ _freq_)  min(start)=start max(end)=end;
  class id hours;
run;

However, the output data will be sorted by ID/HOURS, not ID/START.

 

 

Edited addition: Please note, as per @Tom 's comment, that this program ignores gaps between individual date ranges having matching ID/HOURS.  So the resulting min start and max end may encompass numerous widely separated individual date ranges.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
terjeph
Obsidian | Level 7
Thanks! Worked perfectly.
terjeph
Obsidian | Level 7

I actually do care about gaps between the dates so record 4 and 5 should not collaps. Any suggestions?

Tom
Super User Tom
Super User

How large of a GAP between START and the END of the previous interval to you want to accept without making a new interval? 

 

If you do not care about the size of the gap you can use the NOTSORTED option of the BY statement to assign a group id to data that is grouped, but not actually sorted.

data have;
  input ID (Start End) (:yymmdd.) Hours;
  format start end yymmdd10.;
cards;
1  20230101 20230108 8
1  20230109 20230131 8
1  20230201 20230228 4
2  20230101 20230108 8
2  20230110 20230131 8
2  20230201 20230228 4
;

data grouped;
  set have;
  by id hours notsorted;
  if first.id then group=0;
  group+first.hours;
run;

proc summary data=grouped;
  by id group hours;
  output out=want(drop=_type_) min(start)= max(end)= ;
run;

Results

Obs    ID    group    Hours    _FREQ_         Start           End

 1      1      1        8         2      2023-01-01    2023-01-31
 2      1      2        4         1      2023-02-01    2023-02-28
 3      2      1        8         2      2023-01-01    2023-01-31
 4      2      2        4         1      2023-02-01    2023-02-28

terjeph
Obsidian | Level 7

This was an important comment. In fact I'm interested in situastions were there are gaps between the periods, like in row 4 and 5. How is this implemented in the code you suggested?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1644 views
  • 0 likes
  • 4 in conversation