BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
shasank
Quartz | Level 8

Hi SAS community,

 

I would appreciate your help in below problem:

 

I have a scenario in which a data-set sorted by ID and startdate and Enddate.

 

Each ID has multiple rows with startdate and enddate. If any 3 consecutive rows(Bunch of 3) have a date difference of <=60 months (1800 days) then Output.

 

For Ex: If the nth row of ID #1213 has an enddate- 01/12/2018 and nth-3 row has startdate- 31/12/2019 then that row should be flagged. Here is a sample dataset and expected output.

 

Row 3 startdate was compared to row 1 enddate and difference is 478 days. <1800 days or 60 months 

Row 4 startdate was compared to row 2 enddate and difference is -30 days. <1800 days or 60 months 

Row 5 startdate was compared to row 3 enddate and difference is 200 days. <1800 days or 60 months 

Row 6 startdate was compared to row 4 enddate and difference is 1965 days. > 1800 days or 60 months

So this should not be in the output. 

 

Basically every startdate should be matched with n-3 end date and flagged for output. I tried to use lag function but it seems to only look for n-1.

Any help is appreciated. Thank you for your time and effort. 

 

IDStartdateEnddateDiff N and N -3
11221/12/20171/30/2017 
11224/3/20187/22/2018 
11225/23/20186/2/2018478
11226/22/20187/7/2018-30
112212/19/201812/31/2018200
112211/23/202312/9/20231965
11223/22/20244/13/20241908
    
    
Output   
11224324343253478
11224327343288-30
11224345343465200
1 ACCEPTED SOLUTION

Accepted Solutions
JeffMaggio
Obsidian | Level 7

Actually, based on your sample and desired output, you want n-2, which is lag2.

 

data have;
input ID	cStartdate $	cEnddate $	Diff;
cards;
1122 01/12/17 01/30/17 0	 
1122 04/03/18 07/22/18  0
1122 05/23/18 06/02/18 478
1122 06/22/18 07/07/18 -30
1122 12/19/18 12/31/18 200
1122 11/23/23 12/09/23 1965
1122 03/22/24 04/13/24 1908
;
run;

data want (drop=cStartdate cEnddate dif2);
set have;
startdate=input(cstartdate,mmddyy12.);
enddate=input(cenddate,mmddyy12.);
dif2 = lag2(startdate)-enddate;
format startdate date9. enddate date9.;
if abs(dif2) <= 1800 and dif2 ne .;

run;

View solution in original post

5 REPLIES 5
JeffMaggio
Obsidian | Level 7

Actually, based on your sample and desired output, you want n-2, which is lag2.

 

data have;
input ID	cStartdate $	cEnddate $	Diff;
cards;
1122 01/12/17 01/30/17 0	 
1122 04/03/18 07/22/18  0
1122 05/23/18 06/02/18 478
1122 06/22/18 07/07/18 -30
1122 12/19/18 12/31/18 200
1122 11/23/23 12/09/23 1965
1122 03/22/24 04/13/24 1908
;
run;

data want (drop=cStartdate cEnddate dif2);
set have;
startdate=input(cstartdate,mmddyy12.);
enddate=input(cenddate,mmddyy12.);
dif2 = lag2(startdate)-enddate;
format startdate date9. enddate date9.;
if abs(dif2) <= 1800 and dif2 ne .;

run;
shasank
Quartz | Level 8

Thank you for your reply Jeff. I tried the code and it works wonderfully.

 

But, I wanted this calculation to go By ID and it seems to looks like this continues to include all the dates. I tried adding By ID and doesn't work. Is there any way to do this?

 

Sorry if I din't state this in my initial question. 

 

Thank you for your time. 

 

Tom
Super User Tom
Super User

To handle by group processing you need to add BY statement.  You also need to add logic to prevent it from checking dates from previous group.  That part is a little harder when lagging by more than one since you can't just test FIRST.ID to know when to ignore the lagged value.  You could use LAG() again to help.

data want (drop=cStartdate cEnddate dif2);
  set have;
  by id;
  startdate=input(cstartdate,mmddyy12.);
  enddate=input(cenddate,mmddyy12.);
  format startdate date9. enddate date9.;

  dif2 = lag2(startdate)-enddate;
  if first.id or lag(first.id) then dif2=.;
  if abs(dif2) <= 1800 and dif2 ne .;
run;

For the more general case create an observation counter.

 

shasank
Quartz | Level 8
Thank you for the response. Worked like a charm.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 549 views
  • 2 likes
  • 3 in conversation