BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
inid
Obsidian | Level 7
I have a large dataset where for each unique ID I have different start dates and end dates of a given event.
Initial data looks like below:
ID_number Start_Date End_Date

XXX 30APR2019 31OCT2019

XXX 31AUG2018 28SEP2018

XXX 28FEB2018 30MAR2018

XXX 30NOV2017 29DEC2017

XXX 28APR2017 31MAY2017

XXX 31OCT2016 30NOV2016

XXX 30JUN2016 29JUL2016

XXX 31DEC2013 29AUG2014

XXX 30SEP2013 27NOV2013

XXX 31JAN2013 27FEB2013

XXX 31MAY2012 31JUL2012

XXX 26APR2011 30SEP2011
I want to achieve the below end results:
ID_number Start_Date End_Date New_Start_date New_End_Date

XXX 30APR2019 31OCT2019 30JUN2016 31OCT2019

XXX 31AUG2018 28SEP2018 30JUN2016 31OCT2019

XXX 28FEB2018 30MAR2018 30JUN2016 31OCT2019

XXX 30NOV2017 29DEC2017 30JUN2016 31OCT2019

XXX 28APR2017 31MAY2017 30JUN2016 31OCT2019

XXX 31OCT2016 30NOV2016 30JUN2016 31OCT2019

XXX 30JUN2016 29JUL2016 30JUN2016 31OCT2019

XXX 31DEC2013 29AUG2014 26APR2011 29AUG2014

XXX 30SEP2013 27NOV2013 26APR2011 29AUG2014

XXX 31JAN2013 27FEB2013 26APR2011 29AUG2014

XXX 31MAY2012 31JUL2012 26APR2011 29AUG2014

XXX 26APR2011 30SEP2011 26APR2011 29AUG2014

The logic of the "New_start_date" and "New_End_Date" variables is as below:
If less than one year has passed between Start_date and previous End_date, then New_start_date should be equal to the minimum of Start_date for the given ID and New_End_Date should be equal to the maximum of End_Date for the given ID
Proc sql can be used as well

Any help would be highly appreciated. Thank you in advance. Proc sql can be used as well
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Brute force attack, using two data steps to merge the ends back:

data have;
input id $ start_date :date9. end_date :date9.;
format start_date end_date e8601da10.;
datalines;
XXX 30APR2019 31OCT2019
XXX 31AUG2018 28SEP2018
XXX 28FEB2018 30MAR2018
XXX 30NOV2017 29DEC2017
XXX 28APR2017 31MAY2017
XXX 31OCT2016 30NOV2016
XXX 30JUN2016 29JUL2016
XXX 31DEC2013 29AUG2014
XXX 30SEP2013 27NOV2013
XXX 31JAN2013 27FEB2013
XXX 31MAY2012 31JUL2012
XXX 26APR2011 30SEP2011
;

proc sort data=have;
by id start_date;
run;

data starts (keep=id group start_date end_date new_start_date) ends (keep=id group new_end_date);
set have;
by id;
format new_start_date new_end_date e8601da10.;
retain new_start_date group;
l_end = lag(end_date);
if first.id
then do;
  new_start_date = start_date;
  group = 1;
end;
else do;
  if intck('year',l_end,start_date,'c') ge 1
  then do;
    new_end_date = l_end;
    output ends;
    new_start_date = start_date;
    group + 1;
  end;
end;
output starts;
if last.id
then do;
  new_end_date = end_date;
  output ends;
end;
run;

data want;
merge
  starts
  ends
;
by id group;
drop group;
run;

proc sort data=want;
by id descending start_date;
run;

Note how a data step with datalines makes it very easy for anybody to recreate the data I used. Please present example data in this way, so we do not have to make guesses about your data structure (types, lengths, formats).

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

Brute force attack, using two data steps to merge the ends back:

data have;
input id $ start_date :date9. end_date :date9.;
format start_date end_date e8601da10.;
datalines;
XXX 30APR2019 31OCT2019
XXX 31AUG2018 28SEP2018
XXX 28FEB2018 30MAR2018
XXX 30NOV2017 29DEC2017
XXX 28APR2017 31MAY2017
XXX 31OCT2016 30NOV2016
XXX 30JUN2016 29JUL2016
XXX 31DEC2013 29AUG2014
XXX 30SEP2013 27NOV2013
XXX 31JAN2013 27FEB2013
XXX 31MAY2012 31JUL2012
XXX 26APR2011 30SEP2011
;

proc sort data=have;
by id start_date;
run;

data starts (keep=id group start_date end_date new_start_date) ends (keep=id group new_end_date);
set have;
by id;
format new_start_date new_end_date e8601da10.;
retain new_start_date group;
l_end = lag(end_date);
if first.id
then do;
  new_start_date = start_date;
  group = 1;
end;
else do;
  if intck('year',l_end,start_date,'c') ge 1
  then do;
    new_end_date = l_end;
    output ends;
    new_start_date = start_date;
    group + 1;
  end;
end;
output starts;
if last.id
then do;
  new_end_date = end_date;
  output ends;
end;
run;

data want;
merge
  starts
  ends
;
by id group;
drop group;
run;

proc sort data=want;
by id descending start_date;
run;

Note how a data step with datalines makes it very easy for anybody to recreate the data I used. Please present example data in this way, so we do not have to make guesses about your data structure (types, lengths, formats).

inid
Obsidian | Level 7
Dear KurtBremser,
Thank you very much for your help. I apologize for any inconvenience.

Kurt_Bremser
Super User

No need to aplogize, you're new here. Just make it easier for us to help you.

 

Using a data step with datalines to create data on the spot is a very useful SAS skill, and teaches the foundation of reading text sources.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 882 views
  • 1 like
  • 2 in conversation