Hello Community,
I am trying to clean a dataset for a Cox regression analysis that I would greatly appreciate some help with. Basically, I am interested in running a Cox regression analysis to examine factors associated with the discontinuation of a certain medication prior to 6 months (180 days). I have provided an example below of what my dataset looks like: variable “days” represents the cumulative number of days from the index prescription date to the most recent prescription date, while variable “days_int” represents the number of days since the previous prescription.
A patient would be censored if the number of cumulative days since the first prescription date is less than 180 days (“Days”<180) or if a patient had 90 days or more in between prescriptions (“Days_int” ≥ 90) prior to 180 days since their first prescription.
I have also provided below an example of my desired output dataset. Please note that if a patient is censored because of the ≥90 days between prescriptions rule, the value for variable “Days” in the output dataset should be the “days” value in the record prior to the censoring event (see Patient_ID= 2 and Patient_ID=3). The value for “Dose” should also follow this rule for this case.
If a patient has received at least 180 days of the prescription (“Days” ≥180) with less than 90 days in between prescriptions during the first 180 days since the first prescription, the value for Days in the output dataset should be ‘180’ (i.e., to indicate that they received at least 180 days of the prescription). The value for “Dose” in this case should be the value in the last record prior to days ≥ 180. Please let me know if I can provide any further clarification. Any help with how to program this would be greatly appreciated!
Have:
Patient_ID | Dose | Med_start_date | Days | Days_int |
1 | 2 | 01JAN2015 | 0 | . |
1 | 4 | 16JAN2015 | 15 | 15 |
2 | 2 | 01JAN2010 | 0 | . |
2 | 4 | 11APR2010 | 100 | 100 |
3 | 8 | 01JAN2012 | 0 | . |
3 | 16 | 31JAN2012 | 30 | 30 |
3 | 24 | 10MAY2012 | 130 | 100 |
4 | 8 | 01MAR2017 | 0 | . |
4 | 8 | 30APR2017 | 60 | 60 |
4 | 8 | 29JUN2017 | 120 | 60 |
4 | 16 | 28AUG2017 | 180 | 60 |
4 | 16 | 27OCT2017 | 240 | 60 |
5 | 8 | 01FEB2018 | 0 | . |
5 | 8 | 02APR2018 | 60 | 60 |
5 | 16 | 01JUN2018 | 120 | 60 |
5 | 16 | 21JUL2018 | 170 | 50 |
5 | 32 | 20AUG2018 | 200 | 30 |
5 | 32 | 18NOV2018 | 300 | 90 |
6 | 1 | 01JAN2011 | 0 | . |
6 | 2 | 20FEB2011 | 50 | 50 |
6 | 2 | 21APR2011 | 110 | 60 |
6 | 4 | 20JUN2011 | 170 | 60 |
6 | 6 | 18SEP2011 | 260 | 90 |
7 | 8 | 01JAN2019 | 0 | . |
7 | 16 | 28OCT2019 | 300 | 300 |
Want:
Patient_ID | Dose | Days | Censor |
1 | 4 | 15 | 1 |
2 | 2 | 0 | 1 |
3 | 16 | 30 | 1 |
4 | 8 | 180 | 0 |
5 | 16 | 180 | 0 |
6 | 4 | 170 | 1 |
7 | 8 | 0 | 1 |
You have two types of censoring:
1a: You run out of observations before days reaches 180, while days_int never reaches 90. In this case you write out the last observation as is, with CENSOR=1.
1b: You reach days_int>=90 before (or simulataneous with) days reaching 180. In this case, you write out the preceding days and dose, again with censor=1
CENSOR=0 when you reach 180 days without already reaching days_int>=90. In this case write out prior dose but set days=180.
data have;
input Patient_ID Dose Med_start_date :date9. Days Days_int;
format med_start_date date9.;
datalines;
1 2 01JAN2015 0 .
1 4 16JAN2015 15 15
2 2 01JAN2010 0 .
2 4 11APR2010 100 100
3 8 01JAN2012 0 .
3 16 31JAN2012 30 30
3 24 10MAY2012 130 100
4 8 01MAR2017 0 .
4 8 30APR2017 60 60
4 8 29JUN2017 120 60
4 16 28AUG2017 180 60
4 16 27OCT2017 240 60
5 8 01FEB2018 0 .
5 8 02APR2018 60 60
5 16 01JUN2018 120 60
5 16 21JUL2018 170 50
5 32 20AUG2018 200 30
5 32 18NOV2018 300 90
6 1 01JAN2011 0 .
6 2 20FEB2011 50 50
6 2 21APR2011 110 60
6 4 20JUN2011 170 60
6 6 18SEP2011 260 90
7 8 01JAN2019 0 .
7 16 28OCT2019 300 300
run;
data want (keep=patient_id censor days dose);
do until (last.patient_id);
set have;
by patient_id;
_lag_dose=lag(dose);
_lag_days=lag(days);
if censor^=. then continue; /*Already output, skip subsequent iterations*/
if days_int>=90 then do;
censor=1;
days=_lag_days;
dose=_lag_dose;
end;
else if days>=180 then do;
censor=0;
days=180;
dose=_lag_dose;
end;
else if last.patient_id then censor=1;
if censor^=. then output;
end;
run;
You have two types of censoring:
1a: You run out of observations before days reaches 180, while days_int never reaches 90. In this case you write out the last observation as is, with CENSOR=1.
1b: You reach days_int>=90 before (or simulataneous with) days reaching 180. In this case, you write out the preceding days and dose, again with censor=1
CENSOR=0 when you reach 180 days without already reaching days_int>=90. In this case write out prior dose but set days=180.
data have;
input Patient_ID Dose Med_start_date :date9. Days Days_int;
format med_start_date date9.;
datalines;
1 2 01JAN2015 0 .
1 4 16JAN2015 15 15
2 2 01JAN2010 0 .
2 4 11APR2010 100 100
3 8 01JAN2012 0 .
3 16 31JAN2012 30 30
3 24 10MAY2012 130 100
4 8 01MAR2017 0 .
4 8 30APR2017 60 60
4 8 29JUN2017 120 60
4 16 28AUG2017 180 60
4 16 27OCT2017 240 60
5 8 01FEB2018 0 .
5 8 02APR2018 60 60
5 16 01JUN2018 120 60
5 16 21JUL2018 170 50
5 32 20AUG2018 200 30
5 32 18NOV2018 300 90
6 1 01JAN2011 0 .
6 2 20FEB2011 50 50
6 2 21APR2011 110 60
6 4 20JUN2011 170 60
6 6 18SEP2011 260 90
7 8 01JAN2019 0 .
7 16 28OCT2019 300 300
run;
data want (keep=patient_id censor days dose);
do until (last.patient_id);
set have;
by patient_id;
_lag_dose=lag(dose);
_lag_days=lag(days);
if censor^=. then continue; /*Already output, skip subsequent iterations*/
if days_int>=90 then do;
censor=1;
days=_lag_days;
dose=_lag_dose;
end;
else if days>=180 then do;
censor=0;
days=180;
dose=_lag_dose;
end;
else if last.patient_id then censor=1;
if censor^=. then output;
end;
run;
Thank you very much @mkeintz!! This is perfect!
@mkeintzanother issue I'm running into for this analysis of my actual dataset is that sometimes patients will have missing data for the preceding 'Dose' value. Therefore, I would like to implement an imputation procedure where in this case, the first dose that isn't missing prior to censoring is written out instead. I have provided a couple of examples below for the example dataset used previously. Would be able to suggest how to modify the code you previously provided to do this? Thank you very much!
Want:
Patient_ID | Dose | Med_start_date | Days | Days_int |
1 | 2 | 01JAN2015 | 0 | . |
1 | 4 | 16JAN2015 | 15 | 15 |
2 | 2 | 01JAN2010 | 0 | . |
2 | 4 | 11APR2010 | 100 | 100 |
3 | 8 | 01JAN2012 | 0 | . |
3 | 16 | 31JAN2012 | 30 | 30 |
3 | 24 | 10MAY2012 | 130 | 100 |
4 | 8 | 01MAR2017 | 0 | . |
4 | 8 | 30APR2017 | 60 | 60 |
4 | 8 | 29JUN2017 | 120 | 60 |
4 | 16 | 28AUG2017 | 180 | 60 |
4 | 16 | 27OCT2017 | 240 | 60 |
5 | 8 | 01FEB2018 | 0 | . |
5 | 8 | 02APR2018 | 60 | 60 |
5 | 16 | 01JUN2018 | 120 | 60 |
5 | 16 | 21JUL2018 | 170 | 50 |
5 | 32 | 20AUG2018 | 200 | 30 |
5 | 32 | 18NOV2018 | 300 | 90 |
6 | 1 | 01JAN2011 | 0 | . |
6 | 2 | 20FEB2011 | 50 | 50 |
6 | 2 | 21APR2011 | 110 | 60 |
6 | 4 | 20JUN2011 | 170 | 60 |
6 | 6 | 18SEP2011 | 260 | 90 |
7 | 8 | 01JAN2019 | 0 | . |
7 | 16 | 28OCT2019 | 300 | 300 |
8 | 8 | 01JAN2017 | 0 | . |
8 | . | 20FEB2017 | 50 | 50 |
9 | 8 | 01FEB2018 | 0 | . |
9 | 16 | 02APR2018 | 60 | 60 |
9 | . | 01JUN2018 | 120 | 60 |
9 | . | 21JUL2018 | 170 | 50 |
9 | 32 | 20AUG2018 | 200 | 30 |
9 | 32 | 18NOV2018 | 300 | 90 |
Want:
Patient_ID | Dose | Days | Censor |
1 | 4 | 15 | 1 |
2 | 2 | 0 | 1 |
3 | 16 | 30 | 1 |
4 | 8 | 180 | 0 |
5 | 16 | 180 | 0 |
6 | 4 | 170 | 1 |
7 | 8 | 0 | 1 |
8 | 8 | 50 | 1 |
9 | 16 | 180 | 0 |
So you want the most recent non-missing value of dose. Then instead of
_lag_dose=lag(dose);
you can use
_lag_dose=ifn(lag(dose)^=.,lag(dose),_lag_dose);
if first.patient_id=1 then _lag_dose=.;
Think of the IFN function as analogous to the excel if function. It tests the first argument, and returns the second arg if the first is true. Otherwise it returns the third arg.
@mkeintz Thank you very much. It looks like this works perfectly except for the case of patient_id = 8 in my example dataset below. For cases like this, the _lag_dose value is not getting inserted into the 'dose' variable; instead, dose =. where it should be equal to 8 in this case (please see desired output below). Can you please let me know how the code can be modified to account for cases like this as well? I have provided my code below as well. Thanks again for all your help.
data have;
input Patient_ID Dose Med_start_date :date9. Days Days_int;
format med_start_date date9.;
datalines;
1 2 01JAN2015 0 .
1 4 16JAN2015 15 15
2 2 01JAN2010 0 .
2 4 11APR2010 100 100
3 8 01JAN2012 0 .
3 16 31JAN2012 30 30
3 24 10MAY2012 130 100
4 8 01MAR2017 0 .
4 8 30APR2017 60 60
4 8 29JUN2017 120 60
4 16 28AUG2017 180 60
4 16 27OCT2017 240 60
5 8 01FEB2018 0 .
5 8 02APR2018 60 60
5 16 01JUN2018 120 60
5 16 21JUL2018 170 50
5 32 20AUG2018 200 30
5 32 18NOV2018 300 90
6 1 01JAN2011 0 .
6 2 20FEB2011 50 50
6 2 21APR2011 110 60
6 4 20JUN2011 170 60
6 6 18SEP2011 260 90
7 8 01JAN2019 0 .
7 16 28OCT2019 300 300
8 8 01JAN2017 0 .
8 . 20FEB2017 50 50
9 8 01FEB2018 0 .
9 16 02APR2018 60 60
9 . 01JUN2018 120 60
9 . 21JUL2018 170 50
9 32 20AUG2018 200 30
9 32 18NOV2018 300 90
run;
data want (keep=patient_id censor days dose);
do until (last.patient_id);
set have;
by patient_id;
_lag_dose=ifn(lag(dose)^=.,lag(dose),_lag_dose);
if first.patient_id=1 then _lag_dose=.;
_lag_days=lag(days);
if censor^=. then continue;
if days_int>=90 then do;
censor=1;
days=_lag_days;
dose=_lag_dose;
end;
else if days>=180 then do;
censor=0;
days=180;
dose=_lag_dose;
end;
else if last.patient_id then censor=1;
if censor^=. then output;
end;
run;
Want:
Patient_ID | Dose | Days | Censor |
1 | 4 | 15 | 1 |
2 | 2 | 0 | 1 |
3 | 16 | 30 | 1 |
4 | 8 | 180 | 0 |
5 | 16 | 180 | 0 |
6 | 4 | 170 | 1 |
7 | 8 | 0 | 1 |
8 | 8 | 50 | 1 |
9 | 16 | 180 | 0 |
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.