Solved
Contributor
Posts: 30

Most recent two dates and difference in days across rows

I have a large dataset with rows of IDs and dates as below

ID  testdate

1  11/1/16

1  6/4/17

1  8/1/17

2  6/4/16

2  7/5/16

2  10/1/16

In the actual dataset, there are a lot more rows per person than I've shown above. I would like to pick the most recent two test dates for each person, and then create a new variable with the difference in days between those two dates.

I would really appreciate anyone's help!

Thank you very much!

KM

Accepted Solutions
Solution
‎11-01-2017 11:17 PM
PROC Star
Posts: 1,286

Re: Most recent two dates and difference in days across rows

like this?

``````data have;
input ID\$  testdate:ddmmyy10.;
format testdate ddmmyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
by ID testdate;
run;

data want;
set have;
by ID;
days_between=intck('day',lag1(testdate), testdate);
if last.id;
run;
``````

All Replies
Solution
‎11-01-2017 11:17 PM
PROC Star
Posts: 1,286

Re: Most recent two dates and difference in days across rows

like this?

``````data have;
input ID\$  testdate:ddmmyy10.;
format testdate ddmmyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
by ID testdate;
run;

data want;
set have;
by ID;
days_between=intck('day',lag1(testdate), testdate);
if last.id;
run;
``````
Posts: 4,743

Re: Most recent two dates and difference in days across rows

@m_o

Or as a variant to @draycuts code

``````data want;
set have;
by ID;
days_between=dif(testdate);
if last.id;
run;``````
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you so much for your reply! I ran your code as well and get the same issue. If you could see my reply to draycut, and advise me on how to address those issues, I would really appreciate it! Thank you!
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you so much! I have accepted the first responder's solution since he/she responded to the original question, but your solution works great as well! Thanks!
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

[ Edited ]

Thank you very much for your response!

I ran the code. It seems to be working for ID 2 but not for person ID 1.

The difference between 8/1/17 and 6/04/17 is 58 days but the code seems to be giving 88 days.

I was wondering if you are getting the same thing and if so how would one correct it.

Also, on a related topic, I was wondering if you could advise me on how to first subset the dataset so that the new dataset contains only the two most recent test dates per person.

Thank you very much!

Posts: 4,743

Re: Most recent two dates and difference in days across rows

@m_o

"The difference between 8/1/17 and 6/04/17 is 58 days but the code seems to be giving 88 days."

The code does the right thing. The explanation is that the dates are read into SAS using DDMMYYY and NOT US style MMDDYY. Change the informat in the sample code for reading the raw data and you'll get the expected result.

"so that the new dataset contains only the two most recent test dates per person."

Please provide a sample how the desired result should look like especially in regards of the values for column days_between.

Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

[ Edited ]

I see. Thank you! I would like the final dataset to look as shown below. If can first subset the dataset so that it only includes the two most recent test dates, then the next step of calculating days_between might use less memory, as my dataset is large

ID testdate days_between

1  6/4/17  88

1  8/1/17  88

2  7/5/16  31

2 10/1/16  31

I was wondering if you could advise me on an additional data step that would allow me to subset the dataset as I have described.

Thank you very much for your help!

Contributor
Posts: 30

Thank you again!
Super User
Posts: 10,787

Re: Most recent two dates and difference in days across rows

```Assuming I understood what you mean.

data have;
input ID\$  testdate:mmddyy10.;
format testdate mmddyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
by ID descending testdate ;
run;
data want;
set have;
by id;
if first.id then n=0;
n+1;
if n<3 then output;
drop n;
run;

```
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

That works great. Thank you so much!!
Contributor
Posts: 30

Re: Most recent two dates and difference in days across rows

Thank you again!!
☑ This topic is solved.