🔒 This topic is solved and locked.
Need further help from the community? Please
sign in and ask a new question.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 04-23-2021 09:47 AM
(550 views)
I need to consolidate observations based on consecutive validity ranges.
Here is a reproducible minimal example:
data have;
infile datalines4 delimiter=",";
input id $ start :date9. end :date9. var1 var2 var3;
format start end date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
;;;;
quit;
For each id, if the validity dates are consecutive (i.e. end_date + 1day
of the n-1
observation is equal to the start_date
of the n
observation) and if they share the same value for all three variables (var1, var2 and var3), I want to consolidate those multiple observations into a single one.
What I would like to have:
Any kind of approach (array/hash/sql) would be appreciated !
1 ACCEPTED SOLUTION
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data HAVE;
infile datalines4 delimiter=",";
input ID $ START :date9. END :date9. VAR1 VAR2 VAR3;
format START END date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,01FEB2021,1,1,1
DEF2,02FEB2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
;;;;
run;
data temp;
set have;
do date=start to end;
output;
end;
drop start end;
format date date9.;
run;
proc sort data=temp out=temp1 nodupkey;
by id VAR1 VAR2 VAR3 date;
run;
data temp2;
set temp1;
by id VAR1 VAR2 VAR3;
if first.var3 or dif(date) ne 1 then group+1;
run;
proc summary data=temp2;
by id VAR1 VAR2 VAR3;
var date;
output out=want min=start max=end;
run;
2 REPLIES 2
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Like this?
data HAVE;
infile datalines4 delimiter=",";
input ID $ START :date9. END :date9. VAR1 VAR2 VAR3;
format START END date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,01FEB2021,1,1,1
DEF2,02FEB2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
run;
data WANT;
set HAVE end=LASTOBS;
if ^LASTOBS then set HAVE
(firstobs=2
rename=(ID=_ID START=_START END=_END VAR1=_VAR1 VAR2=_VAR2 VAR3=_VAR3)) ;
_SAMEVALUES= ( ID=_ID & VAR1=_VAR1 & VAR2=_VAR2 & VAR3=_VAR3);
if LASTOBS | ^_SAMEVALUES then do;
START=coalesce(_S,START);
output;
_S=.;
end;
else if ^_S then _S=START;
retain _S;
drop _:;
run;
ID | START | END | VAR1 | VAR2 | VAR3 |
---|---|---|---|---|---|
ABC1 | 01DEC2020 | 10MAR2021 | 1 | 0 | 1 |
ABC1 | 11MAR2021 | 31DEC9999 | 1 | 1 | 1 |
DEF2 | 11DEC2020 | 10JAN2021 | 0 | 0 | 0 |
DEF2 | 11JAN2021 | 02FEB2021 | 1 | 1 | 1 |
DEF2 | 03FEB2021 | 10MAR2025 | 1 | 0 | 1 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data HAVE;
infile datalines4 delimiter=",";
input ID $ START :date9. END :date9. VAR1 VAR2 VAR3;
format START END date9.;
datalines4;
ABC1,01DEC2020,21FEB2021,1,0,1
ABC1,22FEB2021,10MAR2021,1,0,1
ABC1,11MAR2021,31DEC9999,1,1,1
DEF2,11DEC2020,10JAN2021,0,0,0
DEF2,11JAN2021,19JAN2021,1,1,1
DEF2,20JAN2021,01FEB2021,1,1,1
DEF2,02FEB2021,02FEB2021,1,1,1
DEF2,03FEB2021,10MAR2025,1,0,1
;;;;
run;
data temp;
set have;
do date=start to end;
output;
end;
drop start end;
format date date9.;
run;
proc sort data=temp out=temp1 nodupkey;
by id VAR1 VAR2 VAR3 date;
run;
data temp2;
set temp1;
by id VAR1 VAR2 VAR3;
if first.var3 or dif(date) ne 1 then group+1;
run;
proc summary data=temp2;
by id VAR1 VAR2 VAR3;
var date;
output out=want min=start max=end;
run;