Hi SAS community,
I would appreciate your help in below problem:
I have a scenario in which a data-set sorted by ID and startdate and Enddate.
Each ID has multiple rows with startdate and enddate. If any 3 consecutive rows(Bunch of 3) have a date difference of <=60 months (1800 days) then Output.
For Ex: If the nth row of ID #1213 has an enddate- 01/12/2018 and nth-3 row has startdate- 31/12/2019 then that row should be flagged. Here is a sample dataset and expected output.
Row 3 startdate was compared to row 1 enddate and difference is 478 days. <1800 days or 60 months
Row 4 startdate was compared to row 2 enddate and difference is -30 days. <1800 days or 60 months
Row 5 startdate was compared to row 3 enddate and difference is 200 days. <1800 days or 60 months
Row 6 startdate was compared to row 4 enddate and difference is 1965 days. > 1800 days or 60 months
So this should not be in the output.
Basically every startdate should be matched with n-3 end date and flagged for output. I tried to use lag function but it seems to only look for n-1.
Any help is appreciated. Thank you for your time and effort.
ID | Startdate | Enddate | Diff N and N -3 |
1122 | 1/12/2017 | 1/30/2017 | |
1122 | 4/3/2018 | 7/22/2018 | |
1122 | 5/23/2018 | 6/2/2018 | 478 |
1122 | 6/22/2018 | 7/7/2018 | -30 |
1122 | 12/19/2018 | 12/31/2018 | 200 |
1122 | 11/23/2023 | 12/9/2023 | 1965 |
1122 | 3/22/2024 | 4/13/2024 | 1908 |
Output | |||
1122 | 43243 | 43253 | 478 |
1122 | 43273 | 43288 | -30 |
1122 | 43453 | 43465 | 200 |
Actually, based on your sample and desired output, you want n-2, which is lag2.
data have;
input ID cStartdate $ cEnddate $ Diff;
cards;
1122 01/12/17 01/30/17 0
1122 04/03/18 07/22/18 0
1122 05/23/18 06/02/18 478
1122 06/22/18 07/07/18 -30
1122 12/19/18 12/31/18 200
1122 11/23/23 12/09/23 1965
1122 03/22/24 04/13/24 1908
;
run;
data want (drop=cStartdate cEnddate dif2);
set have;
startdate=input(cstartdate,mmddyy12.);
enddate=input(cenddate,mmddyy12.);
dif2 = lag2(startdate)-enddate;
format startdate date9. enddate date9.;
if abs(dif2) <= 1800 and dif2 ne .;
run;
Try lag3.
Actually, based on your sample and desired output, you want n-2, which is lag2.
data have;
input ID cStartdate $ cEnddate $ Diff;
cards;
1122 01/12/17 01/30/17 0
1122 04/03/18 07/22/18 0
1122 05/23/18 06/02/18 478
1122 06/22/18 07/07/18 -30
1122 12/19/18 12/31/18 200
1122 11/23/23 12/09/23 1965
1122 03/22/24 04/13/24 1908
;
run;
data want (drop=cStartdate cEnddate dif2);
set have;
startdate=input(cstartdate,mmddyy12.);
enddate=input(cenddate,mmddyy12.);
dif2 = lag2(startdate)-enddate;
format startdate date9. enddate date9.;
if abs(dif2) <= 1800 and dif2 ne .;
run;
Thank you for your reply Jeff. I tried the code and it works wonderfully.
But, I wanted this calculation to go By ID and it seems to looks like this continues to include all the dates. I tried adding By ID and doesn't work. Is there any way to do this?
Sorry if I din't state this in my initial question.
Thank you for your time.
To handle by group processing you need to add BY statement. You also need to add logic to prevent it from checking dates from previous group. That part is a little harder when lagging by more than one since you can't just test FIRST.ID to know when to ignore the lagged value. You could use LAG() again to help.
data want (drop=cStartdate cEnddate dif2);
set have;
by id;
startdate=input(cstartdate,mmddyy12.);
enddate=input(cenddate,mmddyy12.);
format startdate date9. enddate date9.;
dif2 = lag2(startdate)-enddate;
if first.id or lag(first.id) then dif2=.;
if abs(dif2) <= 1800 and dif2 ne .;
run;
For the more general case create an observation counter.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.