SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
thelowendHz
Fluorite | Level 6

Here is a small sample of a data set that looks like this: 

DATA HAVE;
INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
format START_DT MMDDYY10. END_DT MMDDYY10.;
datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/3999
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/3999
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/3999
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/3999
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/3999
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/3999
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/3999
;
RUN;

For each unique ID I want to collapse continuous date spans whenever possible. For example, for the first ID 1004064609 I would want a single record where START_DT = 04/01/2023 AND END_DT = 12/31/3999. Note that some IDs will have multiple date spans that aren't continuous and those should continue to exist as separate records. What's a solution to do this? Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
data_null__
Jade | Level 19

I changed your infinity date to 2050 from 3999 (too far).    This method expands the ranges and then looks for gaps GT 1.

 

DATA HAVE;
   INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
   format START_DT MMDDYY10. END_DT MMDDYY10.;
   datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/2050
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/2050
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/2050
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/2050
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/2050
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/2050
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/2050
;
   RUN;

data expand / view=expand;
   set have;
   do date=start_dt to end_dt;
      output;
      end;
   keep id date;
   format date yymmdd10.;
   run;

proc summary data=expand nway;
   by id;
   class date;
   output out=unique(drop=_:);
   run;
data group / view=group;
   set unique;
   by id;
   if first.id then group=0;
   if dif(date) ne 1 then group+1;
   run;
proc summary data=group nway missing;
   by id;
   class group;
   output out=range(drop=_:) min(date)=start max(date)=end range(date)=Range;
   format range: ; 
   run;

Capture.PNG

View solution in original post

3 REPLIES 3
data_null__
Jade | Level 19

I changed your infinity date to 2050 from 3999 (too far).    This method expands the ranges and then looks for gaps GT 1.

 

DATA HAVE;
   INPUT ID $10. START_DT :MMDDYY10. END_DT :MMDDYY10.;
   format START_DT MMDDYY10. END_DT MMDDYY10.;
   datalines;
1004064609 04/01/2023 12/31/2023
1004064609 01/01/2024 12/31/2050
1004064623 04/01/2023 12/31/2023
1004064623 01/01/2024 12/31/2050
1004064703 03/06/2024 03/28/2024
1004064703 03/29/2024 12/31/2050
1004064706 04/01/2023 12/31/2023
1004064706 01/01/2024 12/31/2050
1004064724 04/01/2023 06/30/2023
1004064724 07/01/2023 11/30/2023
1004064724 12/01/2023 12/31/2023
1004064724 01/01/2024 12/31/2050
1004064726 04/01/2023 10/31/2023
1004064726 12/05/2023 12/31/2023
1004064726 01/01/2024 05/31/2024
1004064726 06/01/2024 12/31/2050
1004064727 04/01/2023 12/31/2023
1004064727 01/01/2024 12/31/2050
;
   RUN;

data expand / view=expand;
   set have;
   do date=start_dt to end_dt;
      output;
      end;
   keep id date;
   format date yymmdd10.;
   run;

proc summary data=expand nway;
   by id;
   class date;
   output out=unique(drop=_:);
   run;
data group / view=group;
   set unique;
   by id;
   if first.id then group=0;
   if dif(date) ne 1 then group+1;
   run;
proc summary data=group nway missing;
   by id;
   class group;
   output out=range(drop=_:) min(date)=start max(date)=end range(date)=Range;
   format range: ; 
   run;

Capture.PNG

ballardw
Super User

If you don't have overlapping intervals then perhaps:

 

/* assumes Have is sorted by ID and Start_dt*/
data temp;
   set have;
   by id ;
   retain tstart;
   lend = lag(end_dt);
   if first.id then tstart=start_dt;
   else if intck('day',lend,start_dt)=1 then start_dt=tstart;
   else tstart=start_dt;
   drop tstart lend;
run;
data want;
  set temp;
  by id start_dt;
  if last.start_dt;
run;
JOL
SAS Employee JOL
SAS Employee

This solution uses PROC SORT and Group processing in the DATA STEP.

 

proc sort data=have out=have2;
by id start_dt end_dt;
run;

 

data have2;
set have2;
retain id2 start_dt2;
by id start_dt end_dt;

if first.id then
do;
id2=id;
start_dt2=start_dt;
end;

if last.id then
end_dt2=end_dt;

if last.id;
format start_dt2 end_dt2 mmddyy10.;
keep id2 start_dt2 end_dt2;
run;

 

JOL_0-1718137557230.png

 

 

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 765 views
  • 6 likes
  • 4 in conversation