BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sofia_de_garay
Fluorite | Level 6

I am working with a long dataset of drug treatment where people can start/stop a drug and have a dose reduced/increased. Everyone in the cohort starts on the drug (drug "A"), which was part of the inclusion criteria, and are followed over 4 or 5 months. I am trying to characterize changes in drug “A” use by creating new variables that flag when/if the drug was discontinued (column 6 in want) and when/if restarted (column 7 in want) and when/if the dose was reduced (column 4 in want) and increased (column 5 in want).  The true data set is of course much more complicated, but I really need help determining the formative code. Thank you so much in advance for your help! 

 

Have (note: system missing .  in the variable Drug_A_dose means that the drug was not taken, so 500-> . means discontinued and . ->500 or any other dose means restarted)

id

Month

Drug_A_dose

1

1

500

1

2

500

1

3

.

1

4

.

2

1

500

2

2

500

2

3

500

2

4

250

2

5

250

3

1

500

3

2

250

3

3

250

3

4

500

3

5

100

4

1

500

4

2

.

4

3

.

4

4

500

 

Want (columns 4-7):

id

Month

Drug_A_dose

Drug_A_Reduce

Drug_A_Increase

Drug_A_discontinue

Drug_A_restart

1

1

500

.

.

.

.

1

2

500

.

.

.

.

1

3

.

.

.

1

.

1

4

.

.

.

.

.

2

1

500

.

.

.

.

2

2

500

.

.

.

.

2

3

500

.

.

.

.

2

4

250

1

.

.

.

2

5

250

.

.

.

.

3

1

500

.

.

.

.

3

2

250

1

.

.

.

3

3

250

.

.

.

.

3

4

500

.

1

.

.

3

5

100

1

.

.

.

4

1

500

.

.

.

.

4

2

.

.

.

1

.

4

3

.

.

.

.

.

4

4

500

.

.

.

1

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This is a case in which you need to have the lagged value of dose in hand (_last_dose below).  Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:  

 

data have;
  input id Month  Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
  set have;
  by id;
  
  _last_dose=lag(drug_a_dose);
  
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_reduce=1;      else
    if drug_a_dose > _last_dose > .     then drug_a_increase=1;    else
    if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
    if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
  end;
run;

But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart).  This would probably be better for most types of subsequent analysis procedures.  (Note forgotten line of code now where it belongs).

 

data want;
  set have;
  by id;
  _last_dose=lag(drug_a_dose);  /* Forgot to put this in*/
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_change=1; else
    if drug_a_dose > _last_dose > .     then drug_a_change=2; else
    if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
    if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

This is a case in which you need to have the lagged value of dose in hand (_last_dose below).  Then, for any but the first observation for an ID, you can test the values of current dose and last dose to generate the four dummy variables of interest:  

 

data have;
  input id Month  Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
run;
data want;
  set have;
  by id;
  
  _last_dose=lag(drug_a_dose);
  
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_reduce=1;      else
    if drug_a_dose > _last_dose > .     then drug_a_increase=1;    else
    if _last_dose^=. and drug_a_dose= . then drug_a_discontinue=1; else
    if _last_dose=. and drug_a_dose^= . then drug_a_restart=1;
  end;
run;

But why not make a single result variable, coded from 1 to 4 (1=reduce, 2=increase, 3=discontinue, 4=restart).  This would probably be better for most types of subsequent analysis procedures.  (Note forgotten line of code now where it belongs).

 

data want;
  set have;
  by id;
  _last_dose=lag(drug_a_dose);  /* Forgot to put this in*/
  if first.id=0 then do;
    if . < drug_a_dose < _last_dose     then drug_a_change=1; else
    if drug_a_dose > _last_dose > .     then drug_a_change=2; else
    if _last_dose^=. and drug_a_dose= . then drug_a_change=3; else
    if _last_dose=. and drug_a_dose^= . then drug_a_change=4;
  end;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sofia_de_garay
Fluorite | Level 6

Thank you very much!  This works perfectly. Great idea to just combine into one variable since the categories are mutually exclusive. Sometimes we need others to point out the obvious 😉

Ksharp
Super User
data have;
  input id Month  Drug_A_dose ;
datalines;
1 1 500
1 2 500
1 3 .
1 4 .
2 1 500
2 2 500
2 3 500
2 4 250
2 5 250
3 1 500
3 2 250
3 3 250
3 4 500
3 5 100
4 1 500
4 2 .
4 3 .
4 4 500
;
data want;
 set have;
 if id=lag(id) and .<dif(Drug_A_dose)<0 then Drug_A_Reduce=1;
 if id=lag(id) and   dif(Drug_A_dose)>0 then Drug_A_Increase=1;
 if id=lag(id) and missing(Drug_A_dose) and not missing(lag(Drug_A_dose)) then Drug_A_Discontinue=1;
 if id=lag(id) and not missing(Drug_A_dose) and  missing(lag(Drug_A_dose)) then Drug_A_Restart=1;
run;
sofia_de_garay
Fluorite | Level 6

Thanks so much for providing this code, which also works to answer my question. As I am learning, it is helpful to understand multiple ways to approach a task. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 983 views
  • 4 likes
  • 3 in conversation