BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sneha-25
Calcite | Level 5
I have a dataset like this:
ID Start date End date
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03. 2017-12-31
B 2020-01-01 2020-12-31
B. 2022-01-01 2020-12-31

I want my final dataset to be
ID Start date End date
A 2015-01-01 2016-12-31
B 2020-01-01 2020-12-31

It should be the earliest begin date and latest end date from the continuous coverage. If there is a break in the next begin date then it should not consider the rows after that for that particular ID.
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You could just use normal code for collapsing to continuous periods.

So perhaps something like this:

data have;
  input ID $ (start end) (:yymmdd.);
  format start end yymmdd10.;
cards;
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03 2017-12-31
B 2020-01-01 2020-12-31
B 2022-01-01 2022-12-31
;

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     A        2      2017-01-03    2017-12-31
 3     B        1      2020-01-01    2020-12-31
 4     B        2      2022-01-01    2022-12-31

And just modify it to only OUTPUT when PERIOD is 1.

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period=1 then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id and period=1 then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     B        1      2020-01-01    2020-12-31

 

 

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

What if you have more than one obs at earliest start or latest end?

 

And also, your last obs has start of 2022-01-01 and end of 2020-12-31.  Doesn't make sense.  In fact, that end is an actual tie between the last two obs.

 

Finally, is your data sorted by start?  Your sample is.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

You could just use normal code for collapsing to continuous periods.

So perhaps something like this:

data have;
  input ID $ (start end) (:yymmdd.);
  format start end yymmdd10.;
cards;
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03 2017-12-31
B 2020-01-01 2020-12-31
B 2022-01-01 2022-12-31
;

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     A        2      2017-01-03    2017-12-31
 3     B        1      2020-01-01    2020-12-31
 4     B        2      2022-01-01    2022-12-31

And just modify it to only OUTPUT when PERIOD is 1.

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period=1 then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id and period=1 then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     B        1      2020-01-01    2020-12-31

 

 

Ksharp
Super User
data have;
input ID $ Startdate :yymmdd12. Enddate :yymmdd12.;
format Startdate  Enddate :yymmdd10.;
cards;
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03 2017-12-31
B 2020-01-01 2020-12-31
B 2022-01-01 2020-12-31
;

data temp;
 set have;
 do date=Startdate to  Enddate;
   output;
 end;
format date yymmdd10.;
keep id date;
run;
proc sort data=temp out=temp2 nodupkey;
by id date;
run;
data temp3;
 set temp2;
 by id;
 if first.id then group=0;
 if first.id or dif(date) ne 1 then group+1;

 if group=1;
run;
proc summary data=temp3;
by id;
var date;
output out=want(drop=_:) min=Startdate max=Enddate;
run;

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 719 views
  • 0 likes
  • 4 in conversation