Hi, I am trying to create a time-varying covariate that indicates whether someone was taking drug 1 alone (drugcat=0), drug 2 alone (drugcat=1), or drug 1 + drug 2 (drugcat=2). I want to run a Cox proportional hazards regression (proc phreg) that accounts for the fact that the drugs people take change over time. Here is a simplified version of the data I have:
ID | Drug1StartDate1 | Drug1EndDate1 | Drug1StartDate2 | Drug1EndDate2 | Drug2StartDate1 | Drug2EndDate1 | Drug2StartDate2 | Drug2EndDate2 | age | sex | censor
1 | June 1 2011 | July 1 2011 | | | | | | | 50 | 0 | 0
2 | Jan 1 2012 | Dec 31 2012 | | | June 1 2012 | Dec 31 2012 | | | 40 | 1 | 1
3 | Jan 1 2010 | June 15 2010 | July 1 2010 | Dec 1 2010 | Feb 1 2010 | June 30 2010 | Sept 15 2010 | Dec 15 2010 | 60 | 1 | 0
I want the start and stop times, in days, for each period a person spends in each drug category. It’s not so hard when someone falls into only one category (ID=1) or two categories (ID=2), but it gets really tricky when people move back and forth between the categories. For example, ID=3 is on only drug 1 from Jan 1 to Jan 31, then on both drugs from Feb 1 to June 15, then on only drug 2 from June 16 to June 30, then back on only drug 1 from July 1 to Sept 14, then on both drugs from Sept 15 to Dec 1, then on only drug 2 from Dec 2 to Dec 15. Some people have up to 20 start/stop dates, so I can't really muscle through each possible variation.
Here is what I'd like my data to look like:
ID | Start | Stop | Drugcat | age | sex | censor
1 | 0 | 30 | 0 | 50 | 0 | 0
2 | 0 | 152 | 0 | 40 | 1 | 0
2 | 152 | 365 | 2 | 40 | 1 | 1
3 | 0 | 31 | 0 | 60 | 1 | 0
3 | 31 | 167 | 2 | 60 | 1 | 0
3 | 167 | 182 | 1 | 60 | 1 | 0
3 | 182 | 257 | 0 | 60 | 1 | 0
3 | 257 | 334 | 2 | 60 | 1 | 0
3 | 334 | 365 | 1 | 60 | 1 | 0
Ultimately I’ll use this code to run my regression (where drugcat=0 means drug 1 alone, drugcat=1 means drug 2 alone and drugcat=2 means both drugs at the same time):
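proc phreg data=want;
  class drugcat;  /* treat the drug category as categorical */
  /* counting-process form: one row per (start, stop] interval; censor=0 marks censored rows */
  model (start, stop)*censor(0) = drugcat age sex / rl;
run;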
I’ve tried using arrays to get the data I want but it quickly falls apart when someone switches between the drug categories more than once.
Can anyone tell me a better way to get to my goal? Thanks in advance!
Hi - sounds like you need a "counting process" dataset as input for PHREG. Are you still looking for a way to create this? I have a macro for it - let me know.
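In case it helps in the meantime, here is a minimal sketch of one way to build that kind of counting-process dataset without enumerating every switching pattern. It is not the macro mentioned above, just an illustration. The idea is to walk through each person's follow-up one day at a time, work out which drugs they are on that day, and write out a row whenever the category changes. Several things here are assumptions rather than anything confirmed in the question: the input dataset is called have and stores the dates as SAS date values; time zero is each person's earliest start date; the final stop is the last end date minus time zero (which matches the 0 to 30 row for ID 1); days on neither drug are left out; and the event flag is kept only on the last row for each person. The helper names (origin, lastday, final, prevcat, curcat) are made up for this sketch.

```sas
/* Sample data from the question, with dates read as SAS date values */
data have;
  input ID (Drug1StartDate1 Drug1EndDate1 Drug1StartDate2 Drug1EndDate2
            Drug2StartDate1 Drug2EndDate1 Drug2StartDate2 Drug2EndDate2) (:date9.)
        age sex censor;
  format Drug: date9.;
  datalines;
1 01JUN2011 01JUL2011 . . . . . . 50 0 0
2 01JAN2012 31DEC2012 . . 01JUN2012 31DEC2012 . . 40 1 1
3 01JAN2010 15JUN2010 01JUL2010 01DEC2010 01FEB2010 30JUN2010 15SEP2010 15DEC2010 60 1 0
;
run;

data want (keep=ID start stop drugcat age sex censor);
  set have;
  /* Extend these lists if there are more start/stop pairs per drug */
  array s1{*} Drug1StartDate1 Drug1StartDate2;
  array e1{*} Drug1EndDate1   Drug1EndDate2;
  array s2{*} Drug2StartDate1 Drug2StartDate2;
  array e2{*} Drug2EndDate1   Drug2EndDate2;

  origin  = min(of s1{*}, of s2{*});   /* assumed time zero: earliest start */
  lastday = max(of e1{*}, of e2{*});   /* assumed end of follow-up: latest end */
  if nmiss(origin, lastday) then delete;  /* no usable dates for this person */

  final   = censor;   /* remember the person's real event flag */
  censor  = 0;        /* intermediate intervals never carry the event */
  prevcat = .;

  do d = origin to lastday;
    /* Which drugs is the person on today? End dates are treated as inclusive. */
    on1 = 0; on2 = 0;
    do i = 1 to dim(s1);
      if nmiss(s1{i}, e1{i}) = 0 and s1{i} <= d <= e1{i} then on1 = 1;
    end;
    do i = 1 to dim(s2);
      if nmiss(s2{i}, e2{i}) = 0 and s2{i} <= d <= e2{i} then on2 = 1;
    end;

    if on1 and on2 then curcat = 2;
    else if on2    then curcat = 1;
    else if on1    then curcat = 0;
    else                curcat = .;   /* gap: on neither drug */

    /* Category changed: close the previous interval and open a new one */
    if curcat ne prevcat then do;
      if prevcat ne . then do;
        stop = d - origin; drugcat = prevcat; output;
      end;
      start   = d - origin;
      prevcat = curcat;
    end;
  end;

  /* Close the final interval; only this row keeps the original censor value. */
  /* If the category changes on the very last day, this row has start = stop. */
  if prevcat ne . then do;
    stop = lastday - origin; drugcat = prevcat; censor = final; output;
  end;
run;
```

The day-by-day loop is not the fastest approach, but it stays the same no matter how many times someone switches, and it scales to 20 start/stop pairs by adding variables to the four array statements. The exact day counts depend on whether end dates are treated as inclusive, so some stop values may differ by a day from hand-counted ones.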