Hello Users,
I am working on criteria set as below
> Find ID's which who have at least 3 rows - date in second row within 12 months of first row and date in third row within 6 months of second row.
Here is a sample data
#ID | Startdate | Enddate | Date_diff | Months |
143 | 3/6/2007 | 3/6/2007 | * | |
143 | 9/17/2007 | 9/17/2007 | 195 | 6.414474 |
143 | 10/16/2008 | 10/16/2008 | 395 | 12.99342 |
143 | 12/19/2009 | 12/19/2009 | 429 | 14.11184 |
155 | 11/8/2011 | 11/19/2011 | * | |
155 | 7/19/2012 | 7/21/2012 | 243 | 7.993421 |
155 | 5/31/2013 | 6/2/2013 | 314 | 10.32895 |
155 | 7/7/2013 | 7/8/2012 | 35 | 1.151316 |
155 | 4/10/2014 | 4/11/2014 | 641 | 21.08553 |
The Date_diff come from substracting lag(enddate) with Startdate.
Based on the criteria mentioned above. Only ID 155 should be in the output as it has 3 consecutive rows with a row date_diff < 12 months and a next consecutive row with date_diff < 6 months. This should be BY ID.
Any help is appreciated.
Thank you.
For this application sort by descending date so you are on the row you want to output when you have the information you need to make the decision.
For example:
data have ;
input ID $ Startdate Enddate ;
informat Startdate Enddate mmddyy.;
format Startdate Enddate yymmdd10.;
cards;
143 3/6/2007 3/6/2007
143 9/17/2007 9/17/2007
143 10/16/2008 10/16/2008
143 12/19/2009 12/19/2009
155 11/8/2011 11/19/2011
155 7/19/2012 7/21/2012
155 5/31/2013 6/2/2013
155 7/7/2013 7/8/2012
155 4/10/2014 4/11/2014
;
proc sort data=have ;
by id descending Startdate ;
run;
data want ;
set have ;
by id ;
dif1 = sum(-startdate,lag1(startdate));
dif2 = sum(-startdate,lag2(startdate));
if first.id then dif1=.;
if first.id or lag(first.id) then dif2=.;
flag= dif2>360 and dif1>180 ;
run;
proc print;
run;
Obs ID Startdate Enddate dif1 dif2 flag 1 143 2009-12-19 2009-12-19 . . 0 2 143 2008-10-16 2008-10-16 429 . 0 3 143 2007-09-17 2007-09-17 395 824 1 4 143 2007-03-06 2007-03-06 195 590 1 5 155 2014-04-10 2014-04-11 . . 0 6 155 2013-07-07 2012-07-08 277 . 0 7 155 2013-05-31 2013-06-02 37 314 0 8 155 2012-07-19 2012-07-21 316 353 0 9 155 2011-11-08 2011-11-19 254 570 1
For this application sort by descending date so you are on the row you want to output when you have the information you need to make the decision.
For example:
data have ;
input ID $ Startdate Enddate ;
informat Startdate Enddate mmddyy.;
format Startdate Enddate yymmdd10.;
cards;
143 3/6/2007 3/6/2007
143 9/17/2007 9/17/2007
143 10/16/2008 10/16/2008
143 12/19/2009 12/19/2009
155 11/8/2011 11/19/2011
155 7/19/2012 7/21/2012
155 5/31/2013 6/2/2013
155 7/7/2013 7/8/2012
155 4/10/2014 4/11/2014
;
proc sort data=have ;
by id descending Startdate ;
run;
data want ;
set have ;
by id ;
dif1 = sum(-startdate,lag1(startdate));
dif2 = sum(-startdate,lag2(startdate));
if first.id then dif1=.;
if first.id or lag(first.id) then dif2=.;
flag= dif2>360 and dif1>180 ;
run;
proc print;
run;
Obs ID Startdate Enddate dif1 dif2 flag 1 143 2009-12-19 2009-12-19 . . 0 2 143 2008-10-16 2008-10-16 429 . 0 3 143 2007-09-17 2007-09-17 395 824 1 4 143 2007-03-06 2007-03-06 195 590 1 5 155 2014-04-10 2014-04-11 . . 0 6 155 2013-07-07 2012-07-08 277 . 0 7 155 2013-05-31 2013-06-02 37 314 0 8 155 2012-07-19 2012-07-21 316 353 0 9 155 2011-11-08 2011-11-19 254 570 1
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.