Solved: Creating variables that flag changes over time for a medication

sofia_de_garay · Posted 03-06-2025 03:46 PM

I am working with a long dataset of drug treatment where people can start/stop a drug and have a dose reduced/increased. Everyone in the cohort starts on the drug (drug "A"), which was part of the inclusion criteria, and are followed over 4 or 5 months. I am trying to characterize changes in drug “A” use by creating new variables that flag when/if the drug was discontinued (column 6 in want) and when/if restarted (column 7 in want) and when/if the dose was reduced (column 4 in want) and increased (column 5 in want). The true data set is of course much more complicated, but I really need help determining the formative code. Thank you so much in advance for your help!

Have (note: system missing . in the variable Drug_A_dose means that the drug was not taken, so 500-> . means discontinued and . ->500 or any other dose means restarted)

id	Month	Drug_A_dose
1	1	500
1	2	500
1	3	.
1	4	.
2	1	500
2	2	500
2	3	500
2	4	250
2	5	250
3	1	500
3	2	250
3	3	250
3	4	500
3	5	100
4	1	500
4	2	.
4	3	.
4	4	500

Want (columns 4-7):

id	Month	Drug_A_dose	Drug_A_Reduce	Drug_A_Increase	Drug_A_discontinue	Drug_A_restart
1	1	500	.	.	.	.
1	2	500	.	.	.	.
1	3	.	.	.	1	.
1	4	.	.	.	.	.
2	1	500	.	.	.	.
2	2	500	.	.	.	.
2	3	500	.	.	.	.
2	4	250	1	.	.	.
2	5	250	.	.	.	.
3	1	500	.	.	.	.
3	2	250	1	.	.	.
3	3	250	.	.	.	.
3	4	500	.	1	.	.
3	5	100	1	.	.	.
4	1	500	.	.	.	.
4	2	.	.	.	1	.
4	3	.	.	.	.	.
4	4	500	.	.	.	1

mkeintz · Posted 03-06-2025 04:12 PM

This is a case in which you need to have the lagged value of dose in hand (_last_dose below). Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:

data have;
  input id Month  Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
  set have;
  by id;
  
  _last_dose=lag(drug_a_dose);
  
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_reduce=1;      else
    if drug_a_dose > _last_dose > .     then drug_a_increase=1;    else
    if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
    if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
  end;
run;

But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart). This would probably be better for most types of subsequent analysis procedures. (Note forgotten line of code now where it belongs).

data want;
  set have;
  by id;
  _last_dose=lag(drug_a_dose);  /* Forgot to put this in*/
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_change=1; else
    if drug_a_dose > _last_dose > .     then drug_a_change=2; else
    if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
    if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 03-06-2025 04:12 PM

This is a case in which you need to have the lagged value of dose in hand (_last_dose below). Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:

data have;
  input id Month  Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
  set have;
  by id;
  
  _last_dose=lag(drug_a_dose);
  
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_reduce=1;      else
    if drug_a_dose > _last_dose > .     then drug_a_increase=1;    else
    if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
    if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
  end;
run;

But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart). This would probably be better for most types of subsequent analysis procedures. (Note forgotten line of code now where it belongs).

data want;
  set have;
  by id;
  _last_dose=lag(drug_a_dose);  /* Forgot to put this in*/
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_change=1; else
    if drug_a_dose > _last_dose > .     then drug_a_change=2; else
    if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
    if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sofia_de_garay · Posted 03-07-2025 08:56 AM

Thank you very much! This works perfectly. Great idea to just combine into one variable since the categories are mutually exclusive. Sometimes we need others to point out the obvious 😉

Ksharp · Posted 03-06-2025 09:54 PM

data have;
  input id Month  Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
;
data want;
 set have;
 if id=lag(id) and .<dif(Drug_A_dose)<0 then Drug_A_Reduce=1;
 if id=lag(id) and   dif(Drug_A_dose)>0 then Drug_A_Increase=1;
 if id=lag(id) and missing(Drug_A_dose) and not missing(lag(Drug_A_dose)) then Drug_A_Discontinue=1;
 if id=lag(id) and not missing(Drug_A_dose) and  missing(lag(Drug_A_dose)) then Drug_A_Restart=1;
run;

sofia_de_garay · Posted 03-07-2025 08:57 AM

Thanks so much for providing this code, which also works to answer my question. As I am learning, it is helpful to understand multiple ways to approach a task.

Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Registration is open

Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Re: Creating variables that flag changes over time for a medication

Registration is open

SAS Training: Just a Click Away