Hi all,
I know how to estimate a hazard ratio in the ordinary case, but one question has been stuck in my mind:
how do I estimate the hazard ratio within different groups?
This may be too ambiguous, so allow me to explain.
# Scenario 1 -- simple estimation of the hazard ratio (without adjustment for any confounder)
This is straightforward. The following code estimates the hazard ratio for the independent variable MorningBPSurge:
PROC PHREG DATA=one;
MODEL followtime*allcause(0)=MorningBPSurge / RISKLIMITS;
RUN;
# Scenario 2 -- multivariable hazard ratio (with adjustment for confounding)
This is also easy to do and to understand: we just put the confounders we want to adjust for into the model.
For example, to estimate the hazard ratio for the independent variable (MorningBPSurge) while adjusting for two confounders (Age and Sex):
PROC PHREG DATA=one;
MODEL followtime*allcause(0)=MorningBPSurge Age Sex / RISKLIMITS;
RUN;
However, the next case really confuses me.
# Scenario 3 -- hazard ratio for different groups while adjusting for confounders
Let's say we want to know whether the hazard ratio for the independent variable varies between two groups, while still taking confounding into account.
For example, we want to know whether the hazard ratio of MorningBPSurge differs between group A and group B.
Thus, we have
A) Independent variable: MorningBPSurge
B) Group variable: H_BP_Night, 0 if the patient has no hypertension during nighttime; 1 if he/she does.
C) Confounders: Age, Sex
What I usually do is use the BY statement to get the HR within each group:
PROC PHREG DATA=one;
BY H_BP_Night;
MODEL followtime*allcause(0)=MorningBPSurge Age Sex / RISKLIMITS;
RUN;
Then, for the P-value, I add the interaction term "MorningBPSurge * H_BP_Night" and check whether it is significant:
PROC PHREG DATA=one;
MODEL followtime*allcause(0)=MorningBPSurge Age Sex
H_BP_Night MorningBPSurge*H_BP_Night/ RISKLIMITS;
RUN;
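Written out, the interaction model above has the following form. This is only a sketch in illustrative notation: the coefficient names are not taken from SAS output, and it assumes that MorningBPSurge enters as a single term and that H_BP_Night is coded 0/1.

```latex
\[
h(t \mid x) = h_0(t)\,\exp\bigl(\beta_1\,\text{Surge} + \beta_2\,\text{Age} + \beta_3\,\text{Sex}
              + \beta_4\,\text{Night} + \beta_5\,\text{Surge}\times\text{Night}\bigr)
\]
\[
\text{HR}_{\text{Surge}\mid\text{Night}=0} = e^{\beta_1}, \qquad
\text{HR}_{\text{Surge}\mid\text{Night}=1} = e^{\beta_1+\beta_5}, \qquad
\frac{\text{HR}_{\text{Surge}\mid\text{Night}=1}}{\text{HR}_{\text{Surge}\mid\text{Night}=0}} = e^{\beta_5}
\]
```

Under these assumptions, the P-value of the interaction term tests whether \(\beta_5 = 0\), that is, whether the two group-specific hazard ratios are equal.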
Now that the situation is clear, my questions are:
1) Is using BY plus an interaction term correct for what I want?
2) Do I really need to do this in two separate steps, or is there an existing statement that can do it?
Any ideas are very welcome.
Thanks in advance!
I would also encourage you to use the HAZARDRATIO statement. It can easily calculate the estimated hazard ratio at each level of an effect modifier (effect modification = interaction, just another word for the same thing).
One small problem I have with it is that it doesn't always select the reference level correctly when using the GLM parameterization. If the default parameterization is used, then both the main effects and the interaction effect should be in the MODEL statement.
Here is a simple example.
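data mydata;
  do group=1,2;
    do exposure='yes','no';
      do i=1 to 10000;
        /* rate: 0.1 at baseline, x3 in group 2; exposure multiplies it by 1.5 in group 1 and by 2 in group 2 */
        rate=0.1*(3**(group=2))*(1.5**((exposure='yes')*(group=1)))*(2**((exposure='yes')*(group=2)));
        /* exponential event time with mean 1/rate; every subject has an event (no censoring) */
        t=rand('exponential',1/rate);
        output;
      end;
    end;
  end;
run;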
*Estimate the exposure effect within each of the two groups:;
proc phreg data=mydata;
  class group exposure(ref='no');
  model t=exposure*group exposure group;
  hazardratio exposure / at(group=all) diff=ref;
run;
*Same again, but with the GLM parameterization. Here it is enough to specify only the interaction, because it then includes the main effects;
proc phreg data=mydata;
  class group exposure(ref='no') / param=glm;
  model t=exposure*group;
  hazardratio exposure / at(group=all) diff=ref;
run;
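As a check on what the two PROC PHREG runs should recover, the rate formula in the data step implies the true hazard ratios below. They are derived directly from the simulation code; the symbols \(g\) (group) and \(e\) (exposure) and the bracket indicator notation are just shorthand introduced here.

```latex
\[
\lambda(g, e) = 0.1 \cdot 3^{[g=2]} \cdot 1.5^{[e=\text{yes}]\,[g=1]} \cdot 2^{[e=\text{yes}]\,[g=2]}
\]
\[
\text{HR}_{\text{exposure}\mid g=1} = \frac{\lambda(1,\text{yes})}{\lambda(1,\text{no})} = \frac{0.15}{0.1} = 1.5,
\qquad
\text{HR}_{\text{exposure}\mid g=2} = \frac{\lambda(2,\text{yes})}{\lambda(2,\text{no})} = \frac{0.6}{0.3} = 2
\]
```

With 10,000 simulated subjects per cell, the estimated hazard ratios for exposure should come out close to 1.5 in group 1 and close to 2 in group 2.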
If you use the BY statement, you fit a separate model for each value of the BY variable, so the baseline hazard function (and the effect of every other covariate) is allowed to differ between groups. The estimates will therefore not be exactly the same as those from the interaction model, and you also lose a little statistical power.
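To make the difference concrete, here is a sketch in illustrative notation, using the variables from the original question and a group index \(g\) for H_BP_Night. A BY analysis fits one model per group, so every term carries its own \(g\):

```latex
\[
\text{BY:}\quad h_g(t \mid x) = h_{0g}(t)\,\exp\bigl(\beta_{1g}\,\text{Surge} + \beta_{2g}\,\text{Age} + \beta_{3g}\,\text{Sex}\bigr),
\qquad g \in \{0, 1\}
\]
```

A single model with an interaction term instead keeps one baseline hazard shape (shifted proportionally by the H_BP_Night main effect) and one set of Age and Sex coefficients for both groups, and lets only the MorningBPSurge effect vary. Fewer parameters are estimated from the same data, which is where the small gain in power comes from.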
Good luck;-)
Have you looked at using the HAZARDRATIO statement specifically?
As far as I can see, it lets you specify exactly what you're looking for.
Docs:
A whitepaper on several options:
https://support.sas.com/resources/papers/proceedings10/253-2010.pdf
Using BY to get an estimate for each group is incorrect here: it fits a separate model for each level of your BY variable. That doesn't sound like what you're actually interested in, or at least in my experience it wouldn't be the right approach.
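Applied to the variables in the original question, the HAZARDRATIO approach might look like the sketch below. It is untested against the real data and rests on assumptions not stated in the thread: that H_BP_Night is coded 0/1 (so its formatted reference value is '0'), and that MorningBPSurge, Age and Sex are numeric and enter the model as in the original code.

```sas
/* Sketch only: assumes H_BP_Night is coded 0/1; adjust CLASS and REF= to your own coding */
PROC PHREG DATA=one;
  CLASS H_BP_Night (REF='0') / PARAM=REF;
  MODEL followtime*allcause(0) = MorningBPSurge Age Sex
        H_BP_Night MorningBPSurge*H_BP_Night / RISKLIMITS;
  /* Hazard ratio for a one-unit increase in MorningBPSurge at each level of H_BP_Night */
  HAZARDRATIO MorningBPSurge / AT(H_BP_Night=ALL) CL=WALD;
RUN;
```

The test of the MorningBPSurge*H_BP_Night term in the same output then answers whether the two hazard ratios differ, so one PROC PHREG step covers both the group-specific estimates and the P-value.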
Jacob,
Thank you for providing not only the concept but also the SAS code!
After checking the reference that Reeza gave me, I wrote SAS code nearly identical to yours.
HAZARDRATIO is such a powerful statement; it makes getting the estimates much more convenient.
Thank you again for introducing this powerful tool to me!