BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sneha-25
Calcite | Level 5
I have a dataset like this:
ID Start date End date
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03. 2017-12-31
B 2020-01-01 2020-12-31
B. 2022-01-01 2020-12-31

I want my final dataset to be
ID Start date End date
A 2015-01-01 2016-12-31
B 2020-01-01 2020-12-31

It should be the earliest begin date and latest end date from the continuous coverage. If there is a break in the next begin date then it should not consider the rows after that for that particular ID.
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You could just use normal code for collapsing to continuous periods.

So perhaps something like this:

data have;
  input ID $ (start end) (:yymmdd.);
  format start end yymmdd10.;
cards;
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03 2017-12-31
B 2020-01-01 2020-12-31
B 2022-01-01 2022-12-31
;

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     A        2      2017-01-03    2017-12-31
 3     B        1      2020-01-01    2020-12-31
 4     B        2      2022-01-01    2022-12-31

And just modify it to only OUTPUT when PERIOD is 1.

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period=1 then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id and period=1 then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     B        1      2020-01-01    2020-12-31

 

 

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

What if you have more than one obs at earliest start or latest end?

 

And also, your last obs has start of 2022-01-01 and end of 2020-12-31.  Doesn't make sense.  In fact, that end is an actual tie between the last two obs.

 

Finally, is your data sorted by start?  Your sample is.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

You could just use normal code for collapsing to continuous periods.

So perhaps something like this:

data have;
  input ID $ (start end) (:yymmdd.);
  format start end yymmdd10.;
cards;
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03 2017-12-31
B 2020-01-01 2020-12-31
B 2022-01-01 2022-12-31
;

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     A        2      2017-01-03    2017-12-31
 3     B        1      2020-01-01    2020-12-31
 4     B        2      2022-01-01    2022-12-31

And just modify it to only OUTPUT when PERIOD is 1.

data want;
 set have ;
 by id start end;
 if first.id then do;
    period=0;
    lagend = start-2;
 end;
 retain period lagend min max;
 format lagend min max yymmdd10.;
 if 1<(start-lagend) then do;
   if period=1 then output;
   period+1;
   min=start;
   max=end;
 end;
 max=max(max,end);
 if last.id and period=1 then output;
 lagend=end;
 drop start end lagend;
run;

Result

OBS    ID    period           min           max

 1     A        1      2015-01-01    2016-12-31
 2     B        1      2020-01-01    2020-12-31

 

 

Ksharp
Super User
data have;
input ID $ Startdate :yymmdd12. Enddate :yymmdd12.;
format Startdate  Enddate :yymmdd10.;
cards;
A 2015-01-01 2015-12-31
A 2016-01-01 2016-12-31
A 2017-01-03 2017-12-31
B 2020-01-01 2020-12-31
B 2022-01-01 2020-12-31
;

data temp;
 set have;
 do date=Startdate to  Enddate;
   output;
 end;
format date yymmdd10.;
keep id date;
run;
proc sort data=temp out=temp2 nodupkey;
by id date;
run;
data temp3;
 set temp2;
 by id;
 if first.id then group=0;
 if first.id or dif(date) ne 1 then group+1;

 if group=1;
run;
proc summary data=temp3;
by id;
var date;
output out=want(drop=_:) min=Startdate max=Enddate;
run;

sas-innovate-white.png

🚨 Early Bird Rate Extended!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Lock in the best rate now before the price increases on April 1.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 684 views
  • 0 likes
  • 4 in conversation