I am trying to write code that pulls in data related to a program where kids receive services. If the time between services is 30 days or less, then it is a follow-up, but if it is over 30 days it is a re-referral. I am trying to keep only the first referral date and the re-referral dates, and not any of the follow-up dates.
Here are my data:
data example;
input id date date9. ;
datalines;
JS6749 15mar2019
JS6749 15Jun2020
JS6749 01Jul2020
JS6749 15Jul2020
JS6749 01Aug2020
JS6749 30Jan2021
JS4524 15May2020
JS4524 30May2020
JS4524 01Jun2020
JS4524 10Jun2020
;
run;
/*code I have so far that does not work to keep the correct dates*/
data fixcode (keep=id date numdays newdate);
set example;
if id = lag(patient_id_number)then do
numdays=date-lag(date);
if numdays>30 then newdate=date; else newdate=lag(date);
end;
if id ne lag(id) then newdate=date;
format newdate mmddyy10.;
run;
/*What I want the data to look like when I am done*/
ID | date | numdays | newdate |
JS6749 | 15-Mar-2019 | | 3/15/2019 |
JS6749 | 15-Jun-2020 | 458 | 6/15/2020 |
JS6749 | 1-Jul-2020 | 16 | 6/15/2020 |
JS6749 | 15-Jul-2020 | 14 | 6/15/2020 |
JS6749 | 1-Aug-2020 | 17 | 6/15/2020 |
JS6749 | 30-Jan-2021 | 182 | 1/30/2021 |
JS4524 | 15-May-2020 | | 5/15/2020 |
JS4524 | 30-May-2020 | 15 | 5/15/2020 |
JS4524 | 1-Jun-2020 | 2 | 5/15/2020 |
JS4524 | 10-Jun-2020 | 9 | 5/15/2020 |
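Two issues. The first is minor: the ID variable in your example data needs to be read as character (input id $), otherwise values such as JS6749 are invalid numeric data.
Second, and a bit more worrisome, is your lag(patient_id_number). Your example data does not have a "patient_id_number" variable, so that condition cannot work as intended. Also, LAG inside an IF seldom works as expected because of the queue nature of the LAG and DIF functions: each call keeps its own queue, and the queue is updated only when that call actually executes, so a conditional LAG returns the value from the last time the condition was true rather than the value from the previous observation.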
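To make the queue behaviour concrete, here is a minimal sketch on made-up data (the data set name lag_demo and the variables x, prev_always and prev_cond are hypothetical and not part of the original question). It compares an unconditional LAG with one that only executes for even values of x.

```sas
data lag_demo;
  input x;
  /* Unconditional call: its queue is updated on every observation */
  prev_always = lag(x);
  /* Conditional call: executes only when x is even, so its queue
     only ever sees the even values of x */
  if mod(x, 2) = 0 then prev_cond = lag(x);
  datalines;
1
2
3
4
;
run;

proc print data=lag_demo;
run;
```

For x=4, prev_always is 3 (the previous observation), but prev_cond is 2, the value stored the last time the conditional call ran. That is the same trap the original code falls into with LAG inside the IF ... THEN DO block.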
This does what I think you are asking:
data example;
  input id $ date date9. ;
  format date date9.;
datalines;
JS6749 15mar2019
JS6749 15Jun2020
JS6749 01Jul2020
JS6749 15Jul2020
JS6749 01Aug2020
JS6749 30Jan2021
JS4524 15May2020
JS4524 30May2020
JS4524 01Jun2020
JS4524 10Jun2020
;
run;

data want;
  set example;
  by notsorted id;
  retain newdate;
  numdays= dif(date);
  if first.id then do;
    numdays=.;
    newdate= date;
  end;
  format newdate mmddyy10.;
  if numdays>30 then newdate=date;
run;
When you use a BY statement, SAS supplies automatic variables that indicate whether the current record is the first or last of its group. These are accessed as FIRST.variable and LAST.variable; they are numeric 1/0, which SAS treats as true/false, so you can use them directly in conditions. By default the BY statement requires the data to be sorted; your example data is not, so the NOTSORTED option makes it work with the example, as long as the records for each ID are grouped together.
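A small sketch of how those flags behave, using hypothetical data (the data set names visits and flags and the variables is_first and is_last are invented for illustration). FIRST.id and LAST.id are not written to the output data set, so they are copied into ordinary variables here just to make them visible.

```sas
data visits;
  input id $ visit;
  datalines;
B 1
B 2
A 1
;
run;

data flags;
  set visits;
  by notsorted id;     /* grouped by id, but not in sorted order */
  is_first = first.id; /* 1 on the first record of each id group */
  is_last  = last.id;  /* 1 on the last record of each id group  */
run;

proc print data=flags;
run;
```

The two B records get is_first/is_last of 1/0 and 0/1, and the single A record gets 1/1. Without NOTSORTED, this step would stop with an error because B comes before A.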
RETAIN keeps the value of a variable from one iteration of the data step to the next instead of resetting it to missing at the top of each iteration, so you don't need the IF ... LAG logic to carry newdate forward.
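If the goal is to drop the follow-up rows entirely rather than flag them, one possible variation is a subsetting IF on the same logic. This is only a sketch under the assumption that a row should be kept when it is the first record for an ID or when more than 30 days have passed since the previous service; the output data set name referrals is hypothetical.

```sas
data example;
  input id $ date date9. ;
  format date date9.;
datalines;
JS6749 15mar2019
JS6749 15Jun2020
JS6749 01Jul2020
JS6749 15Jul2020
JS6749 01Aug2020
JS6749 30Jan2021
JS4524 15May2020
JS4524 30May2020
JS4524 01Jun2020
JS4524 10Jun2020
;
run;

data referrals (keep=id date numdays);
  set example;
  by notsorted id;
  numdays = dif(date);           /* runs on every row, so the queue stays in step */
  if first.id then numdays = .;  /* no previous service within this id */
  if first.id or numdays > 30;   /* subsetting IF: keep referrals and re-referrals only */
run;
```

With the example data this keeps four rows: 15MAR2019, 15JUN2020 and 30JAN2021 for JS6749, and 15MAY2020 for JS4524.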