BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
thelowendHz
Fluorite | Level 6

Here is a small sample of a data set that looks like this: 

DATA HAVE;
INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
format START_DT MMDDYY10. END_DT MMDDYY10.;
datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/3999
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/3999
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/3999
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/3999
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/3999
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/3999
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/3999
;
RUN;

For each unique ID I want to collapse continuous date spans whenever possible. For example, for the first ID 1004064609 I would want a single record where START_DT = 04/01/2023 AND END_DT = 12/31/3999. Note that some IDs will have multiple date spans that aren't continuous and those should continue to exist as separate records. What's a solution to do this? Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I changed your infinity date to 2050 from 3999 (too far).    This method expands the ranges and then looks for gaps GT 1.

 

DATA HAVE;
   INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
   format START_DT MMDDYY10. END_DT MMDDYY10.;
   datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/2050
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/2050
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/2050
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/2050
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/2050
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/2050
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/2050
;
   RUN;

data expand / view=expand;
   set have;
   do date=start_dt to end_dt;
      output;
      end;
   keep id date;
   format date yymmdd10.;
   run;

proc summary data=expand nway;
   by id;
   class date;
   output out=unique(drop=_:);
   run;
data group / view=group;
   set unique;
   by id;
   if first.id then group=0;
   if dif(date) ne 1 then group+1;
   run;
proc summary data=group nway missing;
   by id;
   class group;
   output out=range(drop=_:) min(date)=start max(date)=end range(date)=Range;
   format range: ; 
   run;

Capture.PNG

View solution in original post

3 REPLIES 3
data_null__
Jade | Level 19

I changed your infinity date to 2050 from 3999 (too far).    This method expands the ranges and then looks for gaps GT 1.

 

DATA HAVE;
   INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
   format START_DT MMDDYY10. END_DT MMDDYY10.;
   datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/2050
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/2050
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/2050
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/2050
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/2050
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/2050
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/2050
;
   RUN;

data expand / view=expand;
   set have;
   do date=start_dt to end_dt;
      output;
      end;
   keep id date;
   format date yymmdd10.;
   run;

proc summary data=expand nway;
   by id;
   class date;
   output out=unique(drop=_:);
   run;
data group / view=group;
   set unique;
   by id;
   if first.id then group=0;
   if dif(date) ne 1 then group+1;
   run;
proc summary data=group nway missing;
   by id;
   class group;
   output out=range(drop=_:) min(date)=start max(date)=end range(date)=Range;
   format range: ; 
   run;

Capture.PNG

ballardw
Super User

If you don't have overlapping intervals then perhaps:

 

/* assumes Have is sorted by ID and Start_dt*/
data temp;
   set have;
   by id ;
   retain tstart;
   lend = lag(end_dt);
   if first.id then tstart=start_dt;
   else if intck('day',lend,start_dt)=1 then start_dt=tstart;
   else tstart=start_dt;
   drop tstart lend;
run;
data want;
  set temp;
  by id start_dt;
  if last.start_dt;
run;
JOL
SAS Employee JOL
SAS Employee

This solution uses PROC SORT and Group processing in the DATA STEP.

 

proc sort data=have out=have2;
by id start_dt end_dt;
run;

 

data have2;
set have2;
retain id2 start_dt2;
by id start_dt end_dt;

if first.id then
do;
id2=id;
start_dt2=start_dt;
end;

if last.id then
end_dt2=end_dt;

if last.id;
format start_dt2 end_dt2 mmddyy10.;
keep id2 start_dt2 end_dt2;
run;

 

JOL_0-1718137557230.png

 

 

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 2013 views
  • 6 likes
  • 4 in conversation