Hi,
My Cox model includes a time-dependent variable. I tried different ways of defining the variable in PROC PHREG (after the MODEL statement) but got very different results. Please help me!
The first way, I assigned 0 first and 1 in the ELSE branch, and got a hazard ratio of 0.47, which made sense.
if daystorept2-daystochcntc=. or daystorept2-daystochcntc>30 then cntc30chldadj=0; else cntc30chldadj=1;
The second way, I assigned 1 first and 0 in the ELSE branch, and got HR=5.6, which did not make sense at all.
if 0<=daystorept2-daystochcntc <= 30 then cntc30chld=1; else cntc30chld=0;
I checked the frequencies of the two variables and they are exactly the same. So why did I get such different results? Does anyone know? How does SAS process these statements in PROC PHREG?
Thank you!
Hey Sophie,
Take a look at the illustration below. I think your two methods may be grouping some observations differently.
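/* Build test values: 5 missing values, then -2 through 32 */
data have(drop=i);
   val=0;
   do i=1 to 40;
      if i le 5 then do;
         val=.;
         output;
      end;
      else do;
         val=i-8;
         output;
      end;
   end;
run;

/* opt_a mimics method 1: any nonmissing value up to 30 (negatives included) is 1 */
/* opt_b mimics method 2: only values from 0 to 30 are 1 */
proc format;
   value opt_a
      low-30 = 1
      other  = 0
   ;
   value opt_b
      0-30  = 1
      other = 0
   ;
run;

/* Apply both bucketing rules to every value */
data check;
   set have;
   chk1=input(put(val,opt_a.),8.);
   chk2=input(put(val,opt_b.),8.);
run;

/* Keep only the values where the two rules disagree */
data issue_vals;
   set check(where=(chk1 ne chk2));
run;

proc print data=issue_vals;
run;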
Negative values will be in different buckets depending on the logic you use. Does this help?
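To tie this back to the two IF statements from the original question, here is a minimal sketch that runs both of them on a handful of made-up rows. The data set name demo, the helper variable diff, and the input values are all hypothetical and chosen only to cover the missing, negative, in-window, and over-30 cases.

```sas
/* Sketch only: hypothetical rows, not the poster's real data */
data demo;
   input daystorept2 daystochcntc;
   diff = daystorept2 - daystochcntc;   /* missing when either input is missing */

   /* Method 1: 0 for missing or >30, otherwise 1 (so negatives become 1) */
   if diff = . or diff > 30 then cntc30chldadj = 0;
   else cntc30chldadj = 1;

   /* Method 2: 1 only when 0 <= diff <= 30 (so negatives become 0) */
   if 0 <= diff <= 30 then cntc30chld = 1;
   else cntc30chld = 0;
   datalines;
100 .
100 110
100 100
100 85
100 70
100 69
;
run;

proc print data=demo;
run;
```

The two flags agree on every row except the second one, where the difference is -10: method 1 assigns 1 and method 2 assigns 0. The missing row gets 0 under both rules, because a missing value sorts below every number in a SAS comparison, so 0 <= diff is false.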
-unison
Thank you very much for your help. To be honest, I am not really clear on how this relates to my problem. I checked the frequencies using the two different definitions, and they were exactly the same. But the results from PROC PHREG give conflicting information. I am very confused. Could you explain more? Thank you!
To be clear, you did something like this?
proc freq data=option1; tables cntc30chld; run;
proc freq data=option2; tables cntc30chld; run;
When you said you checked the frequencies, I was under the impression you ran them on your data set as a whole rather than on cntc30chld. What I was getting at in the example above is that the two bucketing methods yield different results if you have values below 0.
Can you provide sample data?
-unison
Yes and not a problem.
I edited my reply above to add more context to my example.
So the 2 bucketing methods yielded the same number of observations for cntc30chld=1 and cntc30chld=0?
What do daystorept2 and daystochcntc represent?
Okay, that's helpful. Can you produce a sample data set with any sensitive information removed? It's difficult to reproduce the issue without the input. Please also share any relevant options you are using in PROC PHREG.
Here's a great article that may streamline things:
Okay, with the data you provided, I wasn't able to reproduce the same issue. I adjusted the data to this:
data have(drop=obs);
input obs caseidno daystorept2 inrept2 daystochcntc;
datalines;
1 00010003 114 0 88
2 00011404 183 1 77
3 00011405 183 0 77
4 00021203 110 0 77
5 00021204 110 0 77
6 00021205 110 1 77
7 00021206 183 0 .
8 00021207 183 0 .
9 00021208 183 0 77
10 00021208 110 0 .
11 00021403 3 0 .
12 00021403 183 1 44
13 00021404 3 0 .
14 00021404 183 0 44
15 00021405 3 0 .
16 00021405 183 1 .
17 00021503 183 1 .
18 00021503 71 0 .
19 00021504 183 0 .
20 00021504 71 0 .
21 00021505 1 0 .
22 00021505 183 1 .
23 00021505 20 0 .
24 00031303 4 0 .
25 00031303 183 1 .
26 00031403 111 0 .
28 00031403 11 1 9
29 00031403 50 0 .
30 00031403 183 0 .
31 00031503 183 0 77
32 00031504 183 0 .
33 00031505 183 0 77
34 00031505 135 0 .
35 00031506 183 0 44
36 00031506 135 0 .
37 00051103 183 0 .
38 00051104 183 0 152
39 00051105 183 0 .
40 00051204 183 0 174
41 00051204 183 0 178
42 00051204 183 0 180
43 00051204 21 0 17
44 00051204 183 0 182
;
run;
And then ran:
/* Compute both flags once per row, outside PROC PHREG */
data intermediate;
   set have;
   if daystorept2-daystochcntc=. or daystorept2-daystochcntc>30 then flag1=0;
   else flag1=1;
   if 0<=daystorept2-daystochcntc <= 30 then flag2=1;
   else flag2=0;
run;

/* Fit the model with method 1's flag */
proc phreg data=intermediate covsandwich(aggregate);
   ID caseidno;
   model daystorept2*inrept2(0)=flag1 /ties=breslow;
run;

/* Fit the model with method 2's flag */
proc phreg data=intermediate covsandwich(aggregate);
   ID caseidno;
   model daystorept2*inrept2(0)=flag2 /ties=breslow;
run;
And I was still unable to replicate the issue. Can you try running the second part on the real data and report back as to whether or not this remedies the issue? Note the 'intermediate' step: I pull your cntc30chld logic out of PROC PHREG and into a DATA step.
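For comparison, here is a rough sketch of what the original setup might have looked like, with the IF statements placed after the MODEL statement. The actual PROC PHREG call was never posted, so the options and layout here are assumptions. When programming statements sit inside PROC PHREG, the covariate is recomputed for every subject in the risk set at each event time, and the time variable (daystorept2 here) holds that event time rather than the row's own value. Whenever the event time is earlier than daystochcntc, the difference is negative, so the two definitions can disagree even if a PROC FREQ on the stored values shows identical counts. Moving the logic into a DATA step instead fixes the flag once per row, which may be why the issue does not reproduce that way.

```sas
/* Sketch only: assumes the data set HAVE from the post above;  */
/* the original PROC PHREG options are unknown                   */

/* Method 1 as a time-dependent covariate */
proc phreg data=have;
   model daystorept2*inrept2(0) = cntc30chldadj / ties=breslow;
   /* daystorept2 is the current event time here, not the row's own value */
   if daystorept2-daystochcntc=. or daystorept2-daystochcntc>30 then cntc30chldadj=0;
   else cntc30chldadj=1;   /* negative differences land here */
run;

/* Method 2 as a time-dependent covariate */
proc phreg data=have;
   model daystorept2*inrept2(0) = cntc30chld / ties=breslow;
   if 0<=daystorept2-daystochcntc<=30 then cntc30chld=1;
   else cntc30chld=0;      /* negative differences land here */
run;
```

If these two fits disagree on the real data while the DATA step versions agree, that would point to the negative-difference case as the source of the discrepancy.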
-unison