Two issues, first minor: your ID variable in the Example data needs to be read as character.
Second, and a bit more worrisome is your lag(patient_id_number) . Your example data does not have a "patient_id_number" variable so your condition is never going to be true. Also, LAG after an IF seldom works as expected because of the queue nature of the Lag and Dif functions.
This does what I think you are asking:
data example;
input id $ date date9. ;
format date date9.;
datalines;
JS6749 15mar2019
JS6749 15Jun2020
JS6749 01Jul2020
JS6749 15Jul2020
JS6749 01Aug2020
JS6749 30Jan2021
JS4524 15May2020
JS4524 30May2020
JS4524 01Jun2020
JS4524 10Jun2020
;
run;
data want;
set example;
by notsorted id;
retain newdate;
numdays= dif(date);
if first.id then do;
numdays=.;
newdate= date;
end;
format newdate mmddyy10.;
if numdays>30 then newdate=date;
run;
When you use BY statement then SAS supplies automatic variables that indicate whether the current record is the first or last of the group. These are accessed using FIRST.variable and LAST.variable, they are numeric 1/0 which SAS uses for true/false so you can do things conditionally. The BY statement requires data to be sorted by default, your example data wasn't so the NOTSORTED option works with the example and expects data to be grouped by ID.
RETAIN will keep the values of a variable across the data step boundary. So you don't need all the If Lag values to reset.
@casmcfarland wrote:
I am trying to write code that pulls in data related to a program where kids receive services. If the time between services is 30 days or less than it is a follow-up, but if it is over 30 days it is a re-referral. I am trying to keep only the first date of referral and re-referral dates and not any of the follow-up dates.
Here are my data:
data example; input id date date9. ; datalines; JS6749 15mar2019 JS6749 15Jun2020 JS6749 01Jul2020 JS6749 15Jul2020 JS6749 01Aug2020 JS6749 30Jan2021 JS4524 15May2020 JS4524 30May2020 JS4524 01Jun2020 JS4524 10Jun2020 ; run;
/*code I have so far that does not work to keep the correct dates*/ data fixcode (keep=id date numdays newdate); set example; if id = lag(patient_id_number)then do numdays=date-lag(date); if numdays>30 then newdate=date; else newdate=lag(date); end; if id ne lag(id) then newdate=date; format newdate mmddyy10.; run;
/*What I want the date to look like when I am done*/
ID
date
numdays
newdate
JS6749
15-Mar-2019
3/15/2019
JS6749
15-Jun-2020
458
6/15/2020
JS6749
1-Jul-2020
16
6/15/2020
JS6749
15-Jul-2020
14
6/15/2020
JS6749
1-Aug-2020
17
6/15/2020
JS6749
30-Jan-2021
182
1/30/2021
JS4524
15-May-2020
5/15/2020
JS4524
30-May-2020
15
5/15/2020
JS4524
1-Jun-2020
2
5/15/2020
JS4524
10-Jun-2020
9
5/15/2020
... View more