I am working with a long dataset of drug treatment where people can start/stop a drug and have a dose reduced/increased. Everyone in the cohort starts on the drug (drug "A"), which was part of the inclusion criteria, and are followed over 4 or 5 months. I am trying to characterize changes in drug “A” use by creating new variables that flag when/if the drug was discontinued (column 6 in want) and when/if restarted (column 7 in want) and when/if the dose was reduced (column 4 in want) and increased (column 5 in want). The true data set is of course much more complicated, but I really need help determining the formative code. Thank you so much in advance for your help!
Have (note: system missing . in the variable Drug_A_dose means that the drug was not taken, so 500-> . means discontinued and . ->500 or any other dose means restarted)
id | Month | Drug_A_dose |
1 | 1 | 500 |
1 | 2 | 500 |
1 | 3 | . |
1 | 4 | . |
2 | 1 | 500 |
2 | 2 | 500 |
2 | 3 | 500 |
2 | 4 | 250 |
2 | 5 | 250 |
3 | 1 | 500 |
3 | 2 | 250 |
3 | 3 | 250 |
3 | 4 | 500 |
3 | 5 | 100 |
4 | 1 | 500 |
4 | 2 | . |
4 | 3 | . |
4 | 4 | 500 |
Want (columns 4-7):
id | Month | Drug_A_dose | Drug_A_Reduce | Drug_A_Increase | Drug_A_discontinue | Drug_A_restart |
1 | 1 | 500 | . | . | . | . |
1 | 2 | 500 | . | . | . | . |
1 | 3 | . | . | . | 1 | . |
1 | 4 | . | . | . | . | . |
2 | 1 | 500 | . | . | . | . |
2 | 2 | 500 | . | . | . | . |
2 | 3 | 500 | . | . | . | . |
2 | 4 | 250 | 1 | . | . | . |
2 | 5 | 250 | . | . | . | . |
3 | 1 | 500 | . | . | . | . |
3 | 2 | 250 | 1 | . | . | . |
3 | 3 | 250 | . | . | . | . |
3 | 4 | 500 | . | 1 | . | . |
3 | 5 | 100 | 1 | . | . | . |
4 | 1 | 500 | . | . | . | . |
4 | 2 | . | . | . | 1 | . |
4 | 3 | . | . | . | . | . |
4 | 4 | 500 | . | . | . | 1 |
This is a case in which you need to have the lagged value of dose in hand (_last_dose below). Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:
data have;
input id Month Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
set have;
by id;
_last_dose=lag(drug_a_dose);
if first.id=0 then do;
if . < drug_a_dose < _last_dose then drug_a_reduce=1; else
if drug_a_dose > _last_dose > . then drug_a_increase=1; else
if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
end;
run;
But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart). This would probably be better for most types of subsequent analysis procedures. (Note forgotten line of code now where it belongs).
data want;
set have;
by id;
_last_dose=lag(drug_a_dose); /* Forgot to put this in*/
if first.id=0 then do;
if . < drug_a_dose < _last_dose then drug_a_change=1; else
if drug_a_dose > _last_dose > . then drug_a_change=2; else
if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
end;
run;
This is a case in which you need to have the lagged value of dose in hand (_last_dose below). Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:
data have;
input id Month Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
set have;
by id;
_last_dose=lag(drug_a_dose);
if first.id=0 then do;
if . < drug_a_dose < _last_dose then drug_a_reduce=1; else
if drug_a_dose > _last_dose > . then drug_a_increase=1; else
if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
end;
run;
But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart). This would probably be better for most types of subsequent analysis procedures. (Note forgotten line of code now where it belongs).
data want;
set have;
by id;
_last_dose=lag(drug_a_dose); /* Forgot to put this in*/
if first.id=0 then do;
if . < drug_a_dose < _last_dose then drug_a_change=1; else
if drug_a_dose > _last_dose > . then drug_a_change=2; else
if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
end;
run;
Thank you very much! This works perfectly. Great idea to just combine into one variable since the categories are mutually exclusive. Sometimes we need others to point out the obvious 😉
data have;
input id Month Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
;
data want;
set have;
if id=lag(id) and .<dif(Drug_A_dose)<0 then Drug_A_Reduce=1;
if id=lag(id) and dif(Drug_A_dose)>0 then Drug_A_Increase=1;
if id=lag(id) and missing(Drug_A_dose) and not missing(lag(Drug_A_dose)) then Drug_A_Discontinue=1;
if id=lag(id) and not missing(Drug_A_dose) and missing(lag(Drug_A_dose)) then Drug_A_Restart=1;
run;
Thanks so much for providing this code, which also works to answer my question. As I am learning, it is helpful to understand multiple ways to approach a task.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.