BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
m_o
Obsidian | Level 7 m_o
Obsidian | Level 7

I have a large dataset with rows of IDs and dates as below

ID  testdate

1  11/1/16

1  6/4/17

1  8/1/17

2  6/4/16

2  7/5/16

2  10/1/16

 

In the actual dataset, there are a lot more rows per person than I've shown above. I would like to pick the most recent two test dates for each person, and then create a new variable with the difference in days between those two dates. 

 

I would really appreciate anyone's help!

 

Thank you very much!

 

KM

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

like this?

 

data have;
input ID$  testdate:ddmmyy10.;
format testdate ddmmyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
   by ID testdate;
run;

data want;
   set have;
   by ID;
   days_between=intck('day',lag1(testdate), testdate);
   if last.id;
run;

View solution in original post

11 REPLIES 11
PeterClemmensen
Tourmaline | Level 20

like this?

 

data have;
input ID$  testdate:ddmmyy10.;
format testdate ddmmyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
   by ID testdate;
run;

data want;
   set have;
   by ID;
   days_between=intck('day',lag1(testdate), testdate);
   if last.id;
run;
Patrick
Opal | Level 21

@m_o

Or as a variant to @PeterClemmensens code

data want;
   set have;
   by ID;
   days_between=dif(testdate);
   if last.id;
run;
m_o
Obsidian | Level 7 m_o
Obsidian | Level 7
Thank you so much for your reply! I ran your code as well and get the same issue. If you could see my reply to draycut, and advise me on how to address those issues, I would really appreciate it! Thank you!
m_o
Obsidian | Level 7 m_o
Obsidian | Level 7
Thank you so much! I have accepted the first responder's solution since he/she responded to the original question, but your solution works great as well! 🙂 Thanks!
m_o
Obsidian | Level 7 m_o
Obsidian | Level 7

Thank you very much for your response!

 

I ran the code. It seems to be working for ID 2 but not for person ID 1.

The difference between 8/1/17 and 6/04/17 is 58 days but the code seems to be giving 88 days. 

I was wondering if you are getting the same thing and if so how would one correct it. 

 

Also, on a related topic, I was wondering if you could advise me on how to first subset the dataset so that the new dataset contains only the two most recent test dates per person.

 

Thank you very much!

Patrick
Opal | Level 21

@m_o

"The difference between 8/1/17 and 6/04/17 is 58 days but the code seems to be giving 88 days."

The code does the right thing. The explanation is that the dates are read into SAS using DDMMYYY and NOT US style MMDDYY. Change the informat in the sample code for reading the raw data and you'll get the expected result.

 

"so that the new dataset contains only the two most recent test dates per person."

Please provide a sample how the desired result should look like especially in regards of the values for column days_between.

m_o
Obsidian | Level 7 m_o
Obsidian | Level 7

I see. Thank you! I would like the final dataset to look as shown below. If can first subset the dataset so that it only includes the two most recent test dates, then the next step of calculating days_between might use less memory, as my dataset is large

 

ID testdate days_between

1  6/4/17  88

1  8/1/17  88

2  7/5/16  31

2 10/1/16  31

 

I was wondering if you could advise me on an additional data step that would allow me to subset the dataset as I have described.

Thank you very much for your help!

Ksharp
Super User
Assuming I understood what you mean.


data have;
input ID$  testdate:mmddyy10.;
format testdate mmddyy10.;
datalines;
1  11/01/16
1  06/04/17
1  08/01/17
2  06/04/16
2  07/05/16
2  10/01/16
;

proc sort data=have;
   by ID descending testdate ;
run;
data want;
 set have;
 by id;
 if first.id then n=0;
 n+1;
 if n<3 then output;
 drop n;
run;


m_o
Obsidian | Level 7 m_o
Obsidian | Level 7
That works great. Thank you so much!!
m_o
Obsidian | Level 7 m_o
Obsidian | Level 7
Thank you again!!

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1434 views
  • 6 likes
  • 4 in conversation