Hi Everyone,
I have data that follows cases over time. What I would like to do is:
1. Calculate the number of days between each session.
2. Calculate The number of datys in the program, so the difference between the first and last session.
I'm not sure what the best way to approach this is, so any ideas / suggestions would be greatly appreciated.
Thanks in advance!
ChildID | PreAdmHisActionDate_HW | FirstSession | LastSession | SHSession |
3 | 29-Jun-09 | 0 | 1 | Most Recent Session |
3 | 22-Dec-08 | 0 | 0 | 2nd Most Recent |
3 | 8-Sep-08 | 0 | 0 | 3rd Most Recent |
3 | 8-Jul-08 | 0 | 0 | 4th Most Recent |
3 | 11-Apr-08 | 0 | 0 | 5th Most Recent |
3 | 19-Feb-08 | 0 | 0 | 6rd Most Recent |
3 | 24-Jan-08 | 1 | 0 | 7th Most Recent |
4 | 10-Dec-10 | 0 | 1 | Most Recent Session |
4 | 23-Sep-10 | 0 | 0 | 2nd Most Recent |
4 | 9-Mar-10 | 0 | 0 | 3rd Most Recent |
4 | 1-Jul-09 | 0 | 0 | 4th Most Recent |
4 | 22-Dec-08 | 0 | 0 | 5th Most Recent |
4 | 7-Oct-08 | 0 | 0 | 6rd Most Recent |
4 | 9-Sep-08 | 0 | 0 | 7th Most Recent |
4 | 8-Jul-08 | 0 | 0 | 8th Most Recent |
4 | 10-Apr-08 | 0 | 0 | 9rd Most Recent |
4 | 25-Jan-08 | 0 | 0 | 10th Most Recent |
4 | 24-Jan-08 | 1 | 0 | 11th Most Recent |
5 | 22-Mar-10 | 0 | 1 | Most Recent Session |
5 | 22-Dec-09 | 0 | 0 | 2nd Most Recent |
5 | 20-Oct-09 | 0 | 0 | 3rd Most Recent |
5 | 16-Jun-09 | 0 | 0 | 4th Most Recent |
5 | 22-Dec-08 | 0 | 0 | 5th Most Recent |
5 | 3-Sep-08 | 0 | 0 | 6rd Most Recent |
5 | 17-Jun-08 | 0 | 0 | 7th Most Recent |
5 | 18-Mar-08 | 0 | 0 | 8th Most Recent |
5 | 27-Feb-08 | 0 | 0 | 9rd Most Recent |
5 | 27-Feb-08 | 1 | 0 | 9rd Most Recent |
9 | 14-Jul-09 | 0 | 1 | Most Recent Session |
9 | 19-Sep-08 | 0 | 0 | 2nd Most Recent |
9 | 8-Jul-08 | 0 | 0 | 3rd Most Recent |
9 | 8-Apr-08 | 0 | 0 | 4th Most Recent |
9 | 12-Feb-08 | 1 | 0 | 5th Most Recent |
10 | 17-Jul-09 | 0 | 1 | Most Recent Session |
10 | 22-Dec-08 | 0 | 0 | 2nd Most Recent |
10 | 7-Oct-08 | 0 | 0 | 3rd Most Recent |
10 | 14-Jul-08 | 0 | 0 | 4th Most Recent |
10 | 17-Apr-08 | 0 | 0 | 5th Most Recent |
10 | 15-Feb-08 | 1 | 0 | 6rd Most Recent |
11 | 30-Jun-09 | 0 | 1 | Most Recent Session |
11 | 22-Dec-08 | 0 | 0 | 2nd Most Recent |
11 | 2-Oct-08 | 0 | 0 | 3rd Most Recent |
11 | 2-Jul-08 | 0 | 0 | 4th Most Recent |
11 | 17-Mar-08 | 0 | 0 | 5th Most Recent |
11 | 11-Feb-08 | 1 | 0 | 6rd Most Recent |
One necessary condition is that your dates should be SAS dates, not character strings. That may already be taken care of, but I just wanted to point it out.
It's a much easier problem to handle if you reverse the order:
proc sort data=have;
by ChildId PreAdmHisActionDate_HW;
run;
Then the calculations are straightforward:
data want;
set have;
by ChildID;
session_diff = dif(PreAdmHisActionDate_HW);
if first.ChildID then do;
total_days=0;
session_diff=.;
end;
total_days + session_diff;
run;
The variable SESSION_DIFF is one of the variables you were asking for. The variable TOTAL_DAYS is a cumulative version of SESSION_DIFF for that ID. So the final value (when LastSession=1) would be the other variable you asking for.
One necessary condition is that your dates should be SAS dates, not character strings. That may already be taken care of, but I just wanted to point it out.
It's a much easier problem to handle if you reverse the order:
proc sort data=have;
by ChildId PreAdmHisActionDate_HW;
run;
Then the calculations are straightforward:
data want;
set have;
by ChildID;
session_diff = dif(PreAdmHisActionDate_HW);
if first.ChildID then do;
total_days=0;
session_diff=.;
end;
total_days + session_diff;
run;
The variable SESSION_DIFF is one of the variables you were asking for. The variable TOTAL_DAYS is a cumulative version of SESSION_DIFF for that ID. So the final value (when LastSession=1) would be the other variable you asking for.
Astounding - That worked perfectly, thanks so much! When you get a moment can you explain to me how the Diff function knew to take the next row and subtract or is that what the diff function just does? subtracts row by row?
That's what DIF is built to do ... subtract the previous value from the current value.
There are tricky aspects that almost always require that the DIF function executes on every observation. That's why the DIF value gets calculated but then re-set to missing on the earliest date for a ChildID.
Got it until that last part. What happens if there is missing data?
The problem is a little different. By executing on every observation, sometimes DIF calculates the difference between the first date for a ChildID and the last date for the previous ChildID. Logically, those values should not be included in your calculations. That's why the program sets them to missing.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.