BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KermitTheFrog
Fluorite | Level 6

I need to consolidate observations based on consecutive validity ranges.

Here is a reproducible minimal example:

data have;
infile datalines4 delimiter=",";
input id $ start :date9. end :date9. var1 var2 var3;
format start end date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
;;;;
quit;

For each id, if the validity dates are consecutive (i.e. end_date + 1day of the n-1 observation is equal to the start_date of the n observation) and if they share the same value for all three variables (var1, var2 and var3), I want to consolidate those multiple observations into a single one.

What I would like to have:

 

KermitTheFrog_0-1619185644692.png

 

Any kind of approach (array/hash/sql) would be appreciated !

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data HAVE;
  infile datalines4 delimiter=",";
  input ID $ START :date9. END :date9. VAR1 VAR2 VAR3;
  format START END date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,01FEB2021,1,1,1
DEF2,02FEB2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
;;;;
run;

data temp;
 set have;
 do date=start to end;
  output;
 end;
 drop start end;
 format date date9.;
run;
proc sort data=temp out=temp1 nodupkey;
by id VAR1 VAR2 VAR3 date;
run;
data temp2;
 set temp1;
 by id VAR1 VAR2 VAR3;
 if first.var3 or dif(date) ne 1 then group+1;
run;
proc summary data=temp2;
by  id VAR1 VAR2 VAR3;
var date;
output out=want min=start max=end;
run;

View solution in original post

2 REPLIES 2
ChrisNZ
Tourmaline | Level 20

Like this?

data HAVE;
  infile datalines4 delimiter=",";
  input ID $ START :date9. END :date9. VAR1 VAR2 VAR3;
  format START END date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,01FEB2021,1,1,1
DEF2,02FEB2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
run;

data WANT;
  set HAVE end=LASTOBS;
  if ^LASTOBS then set HAVE 
    (firstobs=2
     rename=(ID=_ID START=_START END=_END VAR1=_VAR1 VAR2=_VAR2 VAR3=_VAR3)) ;
  _SAMEVALUES= ( ID=_ID & VAR1=_VAR1 & VAR2=_VAR2 & VAR3=_VAR3);
  if LASTOBS | ^_SAMEVALUES then do;
    START=coalesce(_S,START);
    output;
    _S=.;
  end;
  else if ^_S then _S=START;   
  retain _S;   
  drop  _:;
run;

ID START END VAR1 VAR2 VAR3
ABC1 01DEC2020 10MAR2021 1 0 1
ABC1 11MAR2021 31DEC9999 1 1 1
DEF2 11DEC2020 10JAN2021 0 0 0
DEF2 11JAN2021 02FEB2021 1 1 1
DEF2 03FEB2021 10MAR2025 1 0 1

 

Ksharp
Super User
data HAVE;
  infile datalines4 delimiter=",";
  input ID $ START :date9. END :date9. VAR1 VAR2 VAR3;
  format START END date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,01FEB2021,1,1,1
DEF2,02FEB2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
;;;;
run;

data temp;
 set have;
 do date=start to end;
  output;
 end;
 drop start end;
 format date date9.;
run;
proc sort data=temp out=temp1 nodupkey;
by id VAR1 VAR2 VAR3 date;
run;
data temp2;
 set temp1;
 by id VAR1 VAR2 VAR3;
 if first.var3 or dif(date) ne 1 then group+1;
run;
proc summary data=temp2;
by  id VAR1 VAR2 VAR3;
var date;
output out=want min=start max=end;
run;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 486 views
  • 0 likes
  • 3 in conversation