DATA Step, Macro, Functions and more

Most recent two dates and difference in days across rows

Accepted Solution Solved
Reply
Contributor m_o
Contributor
Posts: 30
Accepted Solution

Most recent two dates and difference in days across rows

I have a large dataset with rows of IDs and dates as below

ID  testdate

1  11/1/16

1  6/4/17

1  8/1/17

2  6/4/16

2  7/5/16

2  10/1/16

 

In the actual dataset, there are a lot more rows per person than I've shown above. I would like to pick the most recent two test dates for each person, and then create a new variable with the difference in days between those two dates. 

 

I would really appreciate anyone's help!

 

Thank you very much!

 

KM


Accepted Solutions
Solution
3 weeks ago
PROC Star
Posts: 765

Re: Most recent two dates and difference in days across rows

like this?

 

data have;
input ID$  testdate:ddmmyy10.;
format testdate ddmmyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
   by ID testdate;
run;

data want;
   set have;
   by ID;
   days_between=intck('day',lag1(testdate), testdate);
   if last.id;
run;

View solution in original post


All Replies
Solution
3 weeks ago
PROC Star
Posts: 765

Re: Most recent two dates and difference in days across rows

like this?

 

data have;
input ID$  testdate:ddmmyy10.;
format testdate ddmmyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
   by ID testdate;
run;

data want;
   set have;
   by ID;
   days_between=intck('day',lag1(testdate), testdate);
   if last.id;
run;
Respected Advisor
Posts: 4,173

Re: Most recent two dates and difference in days across rows

@m_o

Or as a variant to @draycuts code

data want;
   set have;
   by ID;
   days_between=dif(testdate);
   if last.id;
run;
Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you so much for your reply! I ran your code as well and get the same issue. If you could see my reply to draycut, and advise me on how to address those issues, I would really appreciate it! Thank you!
Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you so much! I have accepted the first responder's solution since he/she responded to the original question, but your solution works great as well! Smiley Happy Thanks!
Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

[ Edited ]

Thank you very much for your response!

 

I ran the code. It seems to be working for ID 2 but not for person ID 1.

The difference between 8/1/17 and 6/04/17 is 58 days but the code seems to be giving 88 days. 

I was wondering if you are getting the same thing and if so how would one correct it. 

 

Also, on a related topic, I was wondering if you could advise me on how to first subset the dataset so that the new dataset contains only the two most recent test dates per person.

 

Thank you very much!

Respected Advisor
Posts: 4,173

Re: Most recent two dates and difference in days across rows

@m_o

"The difference between 8/1/17 and 6/04/17 is 58 days but the code seems to be giving 88 days."

The code does the right thing. The explanation is that the dates are read into SAS using DDMMYYY and NOT US style MMDDYY. Change the informat in the sample code for reading the raw data and you'll get the expected result.

 

"so that the new dataset contains only the two most recent test dates per person."

Please provide a sample how the desired result should look like especially in regards of the values for column days_between.

Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

[ Edited ]

I see. Thank you! I would like the final dataset to look as shown below. If can first subset the dataset so that it only includes the two most recent test dates, then the next step of calculating days_between might use less memory, as my dataset is large

 

ID testdate days_between

1  6/4/17  88

1  8/1/17  88

2  7/5/16  31

2 10/1/16  31

 

I was wondering if you could advise me on an additional data step that would allow me to subset the dataset as I have described.

Thank you very much for your help!

Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you again!
Super User
Posts: 10,044

Re: Most recent two dates and difference in days across rows

Assuming I understood what you mean.


data have;
input ID$  testdate:mmddyy10.;
format testdate mmddyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
   by ID descending testdate ;
run;
data want;
 set have;
 by id;
 if first.id then n=0;
 n+1;
 if n<3 then output;
 drop n;
run;


Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

That works great. Thank you so much!!
Contributor m_o
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you again!!
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 191 views
  • 6 likes
  • 4 in conversation