I'm having trouble figuring out how to create two new variables that depend on the most recent date. Let's say I have ID, dx (diagnosis), a prescribed med, and a start date.
data have;
input id dx $ med $ startdate :date9.;
format startdate date9.;
datalines;
1 a a 01JAN2020
1 a a 04APR2020
1 a a 21APR2020
1 a a 01JUL2020
2 a a 01FEB2020
2 a b 01JUL2020
2 a a 07JUL2020
3 b b 01JAN2020
3 b a 04APR2020
3 b a 01MAY2020
3 b a 15JUN2020
3 b a 01SEP2020
3 b b 02SEP2020
;
RUN;
For any id, when there is > 60 day gap from the preceding date, given the same dx and med, I want to call this is a "new" prescription. For the same dx and med <60 days from the most recent date I want to call it a "refill".
data want;
input id dx $ med $ startdate :date9. refill $ new $;
format startdate date9.;
datalines;
1 a a 01JAN2020 0 0
1 a a 04APR2020 0 1
1 a a 21APR2020 1 0
1 a a 01JUL2020 0 1
2 a a 01FEB2020 0 0
2 a a 07JUL2020 0 1
2 a b 01JUL2020 0 0
3 b a 04APR2020 0 0
3 b a 01MAY2020 1 0
3 b a 15JUN2020 1 0
3 b a 01SEP2020 0 1
3 b b 01JAN2020 0 0
3 b b 02SEP2020 0 1
;
RUN;
The code I've been trying to use is below.
proc sort data=have out=havesort;
by id dx med startdate;
run;
data want;
set havesort;
by id dx med;
IF FIRST.med THEN DO;
TEMP=startdate;
refill=0;
new=0;
END;
RETAIN TEMP refill new;
IF startdate>TEMP + 60 THEN new+1;
IF TEMP<startdate<=TEMP + 60 then refill+1;
TEMP=startdate;
drop temp;
run;
However, as you can see, this doesn't reset the 60 day counter to the most recent startdate. How can I do that to get what I want?
Let me know if that does not make sense. Thank you always for the help!
data have;
input id dx $ med $ startdate :date9.;
format startdate date9.;
datalines;
1 a a 01JAN2020
1 a a 04APR2020
1 a a 21APR2020
1 a a 01JUL2020
2 a a 01FEB2020
2 a b 01JUL2020
2 a a 07JUL2020
3 b b 01JAN2020
3 b a 04APR2020
3 b a 01MAY2020
3 b a 15JUN2020
3 b a 01SEP2020
3 b b 02SEP2020
;
RUN;
proc sort data=have;by id dx med startdate;run;
data want;
set have;
by id dx med;
refill=0;new=0;
if not first.med and .<dif(startdate)<60 then refill=1;
if not first.med and dif(startdate)>60 then new=1;
run;
proc sort data=have; by id dx; run;
data want;
set have;
by id dx;
length lastdt 4 refill new 3;
format lastdt date9.;
retain lastdt;
if first.dx then lastdt=.;
if startdate-lastdt>60 then new=1;
else if lastdt>. then refill=1;
lastdt=startdate;
run;
proc print data=want; run;
Thanks @quickbluefish ! I hadn't even thought to make a separate date column.
So, this worked with a subtle correction. Because the med type can vary and that is relevant to whether it was a new or refill for that particular med, I used the following to get what I needed:
proc sort data=have out=havesort;
by id dx med;
run;
data want;
set havesort;
by id dx med;
length lastdt 4 refill new 3;
format lastdt date9.;
retain lastdt;
if first.med then lastdt=.;
if startdate-lastdt>60 then new=1;
else if lastdt>. then refill=1;
lastdt=startdate;
run;
proc print data=want;
run;
data have;
input id dx $ med $ startdate :date9.;
format startdate date9.;
datalines;
1 a a 01JAN2020
1 a a 04APR2020
1 a a 21APR2020
1 a a 01JUL2020
2 a a 01FEB2020
2 a b 01JUL2020
2 a a 07JUL2020
3 b b 01JAN2020
3 b a 04APR2020
3 b a 01MAY2020
3 b a 15JUN2020
3 b a 01SEP2020
3 b b 02SEP2020
;
RUN;
proc sort data=have;by id dx med startdate;run;
data want;
set have;
by id dx med;
refill=0;new=0;
if not first.med and .<dif(startdate)<60 then refill=1;
if not first.med and dif(startdate)>60 then new=1;
run;
Thank you @Ksharp . I think this is the cleanest solution! I added "<=60" for the refill =1 and it got me what I wanted.
Question, about the dif function, as I'm not familiar with that... In your code I'm assuming the difference is being taken between the preceding startdate and the current startdate because of the "if not first.med" statement. Is that correct?
if not first.med and .<dif(startdate)<60 then refill=1;
proc sort data=have out=need;
by id dx med startdate;
run;
data want;
set need;
by id dx med;
refill=(first.med=0 and lag(startdate)>=startdate-60);
new=(first.med=0 and lag(startdate)<startdate-60);
run;
This is another good option, thank you! This one doesn't feel as intuitive to me due to the "lag" function. I'll have to get more comfortable with it.
I did find the SAS documentation has a good example here.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.