BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
thelowendHz
Fluorite | Level 6

Here is a small sample of a data set that looks like this: 

DATA HAVE;
INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
format START_DT MMDDYY10. END_DT MMDDYY10.;
datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/3999
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/3999
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/3999
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/3999
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/3999
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/3999
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/3999
;
RUN;

For each unique ID I want to collapse continuous date spans whenever possible. For example, for the first ID 1004064609 I would want a single record where START_DT = 04/01/2023 AND END_DT = 12/31/3999. Note that some IDs will have multiple date spans that aren't continuous and those should continue to exist as separate records. What's a solution to do this? Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I changed your infinity date to 2050 from 3999 (too far).    This method expands the ranges and then looks for gaps GT 1.

 

DATA HAVE;
   INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
   format START_DT MMDDYY10. END_DT MMDDYY10.;
   datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/2050
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/2050
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/2050
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/2050
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/2050
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/2050
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/2050
;
   RUN;

data expand / view=expand;
   set have;
   do date=start_dt to end_dt;
      output;
      end;
   keep id date;
   format date yymmdd10.;
   run;

proc summary data=expand nway;
   by id;
   class date;
   output out=unique(drop=_:);
   run;
data group / view=group;
   set unique;
   by id;
   if first.id then group=0;
   if dif(date) ne 1 then group+1;
   run;
proc summary data=group nway missing;
   by id;
   class group;
   output out=range(drop=_:) min(date)=start max(date)=end range(date)=Range;
   format range: ; 
   run;

Capture.PNG

View solution in original post

3 REPLIES 3
data_null__
Jade | Level 19

I changed your infinity date to 2050 from 3999 (too far).    This method expands the ranges and then looks for gaps GT 1.

 

DATA HAVE;
   INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
   format START_DT MMDDYY10. END_DT MMDDYY10.;
   datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/2050
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/2050
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/2050
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/2050
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/2050
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/2050
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/2050
;
   RUN;

data expand / view=expand;
   set have;
   do date=start_dt to end_dt;
      output;
      end;
   keep id date;
   format date yymmdd10.;
   run;

proc summary data=expand nway;
   by id;
   class date;
   output out=unique(drop=_:);
   run;
data group / view=group;
   set unique;
   by id;
   if first.id then group=0;
   if dif(date) ne 1 then group+1;
   run;
proc summary data=group nway missing;
   by id;
   class group;
   output out=range(drop=_:) min(date)=start max(date)=end range(date)=Range;
   format range: ; 
   run;

Capture.PNG

ballardw
Super User

If you don't have overlapping intervals then perhaps:

 

/* assumes Have is sorted by ID and Start_dt*/
data temp;
   set have;
   by id ;
   retain tstart;
   lend = lag(end_dt);
   if first.id then tstart=start_dt;
   else if intck('day',lend,start_dt)=1 then start_dt=tstart;
   else tstart=start_dt;
   drop tstart lend;
run;
data want;
  set temp;
  by id start_dt;
  if last.start_dt;
run;
JOL
SAS Employee JOL
SAS Employee

This solution uses PROC SORT and Group processing in the DATA STEP.

 

proc sort data=have out=have2;
by id start_dt end_dt;
run;

 

data have2;
set have2;
retain id2 start_dt2;
by id start_dt end_dt;

if first.id then
do;
id2=id;
start_dt2=start_dt;
end;

if last.id then
end_dt2=end_dt;

if last.id;
format start_dt2 end_dt2 mmddyy10.;
keep id2 start_dt2 end_dt2;
run;

 

JOL_0-1718137557230.png

 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 241 views
  • 6 likes
  • 4 in conversation