Hello,
I'm using PROC GENMOD in SAS 9.4 to calculate adjusted prevalence ratios and 95% CIs. In my adjusted models, the parameter estimates and the ESTIMATE statement results give different values, sometimes in opposite directions. I'm a bit confused about why the estimates differ for some, but not all, variables, and which are the "true" estimates that I should be reporting. My code and a brief explanation of the model are below. Thank you!
Code to estimate the prevalence ratio of my binary outcome of disease status (yes/no). My primary exposure of interest is sex (binary), and covariates are age (binary) and race/ethnicity (categorical with 5 levels).
proc genmod data=analysis;
class id age2(ref='1') sex (ref='M') race_ethnicity (ref='1');
model disease (event='1') = age2 sex race_ethnicity / dist=poisson link=log;
repeated subject=id /type=unstr;
estimate "PR for age2" age2 1 -1/exp;
estimate "PR for sex" sex 1 -1/exp;
estimate "PR for NHB vs NHW" race_ethnicity 1 -1 0 0 0/exp;
estimate "PR for H vs NHW" race_ethnicity 1 0 -1 0 0/exp;
estimate "PR for NHA vs NHW" race_ethnicity 1 0 0 -1 0/exp;
estimate "PR for NHO vs NHW" race_ethnicity 1 0 0 0 -1/exp;
run;
Here is the output. Here, the estimates for age2 and sex are the same for the top and bottom tables. But they start to differ for the race_ethnicity results, so much so that the estimates are in the opposite direction. For example, the estimate for race_ethnicity 4 in the top table is -0.0499 but 0.1609 in the bottom table.
Any documentation or other posts that you recommend would also be greatly appreciated! Thank you!
In the top table, you have race_ethnicity as numbers 1 through 5. It seems to me the names used in the text in the bottom table (example: "PR for H vs NHW") don't align with 1 through 5. Can you tell us the alignment? Because a different alignment could make the results in the bottom table align with the results in the top table.
Yes, sorry! race_ethnicity is a categorical variable with the following coding. NHW is the reference group.
1=NHW
2=NHB
3=H (shown as Hisp in the table because of different coding in the example)
4=NHA
5=NHO
Thank you!
In the top table, race_ethnicity = 2 shows an estimate of 0.2162 and this matches the bottom table estimate for NHO vs NHW.
So there is still some misalignment of numbers to labels, as far as I can tell. Perhaps having race_ethnicity (ref='1') affects this somehow. As a wild guess, re-run this code without race_ethnicity (ref='1') and see if the numbers match what you (and I) expect.
Great point about the race_ethnicity=2 estimate.
Do you mean remove race_ethnicity(ref='1') from the class statement completely? If I do that, does it treat it as a continuous variable? Alternatively, if I remove just the "(ref='1')", then it automatically uses race_ethnicity=5 as the reference group.
Yes, remove the (ref='1') and see if the estimates are comparable.
Got it! When the only change that I make is removing the "(ref='1')", the estimates aren't comparable, EXCEPT for the NHO group (like before). So now I'm even more puzzled about how to troubleshoot!
The estimates are comparable.
The bottom-table NHB vs NHW value is parameter estimate 1 minus parameter estimate 2
Hisp vs NHW is parameter estimate 1 minus parameter estimate 3
and so on
So if you go back to the original tables, I think you will find something similar.
Ok, I think I'm starting to understand! So I should use the exponentiated RR (PR) estimates in the second/bottom table, correct? Do you know why the ref='1' option seems to be messing it up?
As an aside, I think I found a mistake in my code: the reference group (which I want to be NHW) should get the "-1" instead of the "1".
proc genmod data=analysis;
class id age2(ref='1') sex (ref='M') race_ethnicity ;
model disease (event='1') = age2 sex race_ethnicity / dist=poisson link=log;
repeated subject=id /type=unstr;
estimate "PR for age2" age2 1 -1/exp;
estimate "PR for sex" sex 1 -1/exp;
estimate "PR for NHB vs NHW" race_ethnicity -1 1 0 0 0/exp;
estimate "PR for H vs NHW" race_ethnicity -1 0 1 0 0/exp;
estimate "PR for NHA vs NHW" race_ethnicity -1 0 0 1 0/exp;
estimate "PR for NHO vs NHW" race_ethnicity -1 0 0 0 1/exp;
run;
When I make that adjustment, the directionality of the estimates makes sense.
Thank you!!
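For readers who would rather not track coefficient positions by hand, one possible alternative (not something used in this thread) is to request LS-means differences against a named control level. This is only a sketch, assuming the same dataset and variable names as the post above and that the formatted value of the NHW level is '1'. With no interactions in the model, these differences should correspond to the same log prevalence ratios that the ESTIMATE statements target.

```sas
proc genmod data=analysis;
  class id age2(ref='1') sex(ref='M') race_ethnicity;
  model disease(event='1') = age2 sex race_ethnicity / dist=poisson link=log;
  repeated subject=id / type=unstr;
  /* Each race_ethnicity level minus the named control level '1' (NHW);   */
  /* EXP exponentiates the differences and CL adds confidence limits.     */
  lsmeans race_ethnicity / diff=control('1') exp cl;
run;
```

Because the control level is named rather than located by position, the result does not depend on where the reference level falls in the parameter estimates table.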
I believe the problem is that you want to make comparisons with your reference level, 1. The REF='1' option makes it the LAST level in the parameter estimates table, but your ESTIMATE statements contrast each level against the FIRST level (referring to your original post). Try moving all the "1" values from the first position to the last position. For example, the first ESTIMATE statement for race_ethnicity would become:
estimate "PR for NHB vs NHW" race_ethnicity -1 0 0 0 1/exp;
This should make each value in the L'Beta column just the negative of the corresponding parameter estimate, since you are then estimating the reference minus level i rather than level i minus the reference, which is what the parameters estimate. To get level i versus the reference, as the label says, reverse the signs (for example, 1 0 0 0 -1).
The values for an effect in the ESTIMATE statement are applied to the parameter estimates in the order in which they appear in the parameter estimates table.
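A minimal sketch of how the race_ethnicity statements might look when REF='1' is kept, assuming (as described above) that GENMOD then orders the levels 2, 3, 4, 5, 1 so that NHW sits in the last position. The dataset and variable names are taken from the original post. The E option is added here only to print the coefficient vector for each statement so the alignment can be checked.

```sas
proc genmod data=analysis;
  /* REF='1' is assumed to move level 1 (NHW) to the LAST position: 2 3 4 5 1 */
  class id age2(ref='1') sex(ref='M') race_ethnicity(ref='1');
  model disease(event='1') = age2 sex race_ethnicity / dist=poisson link=log;
  repeated subject=id / type=unstr;
  /* +1 on the level of interest, -1 on the reference level in the last slot */
  estimate "PR for NHB vs NHW" race_ethnicity 1 0 0 0 -1 / exp e;
  estimate "PR for H vs NHW"   race_ethnicity 0 1 0 0 -1 / exp e;
  estimate "PR for NHA vs NHW" race_ethnicity 0 0 1 0 -1 / exp e;
  estimate "PR for NHO vs NHW" race_ethnicity 0 0 0 1 -1 / exp e;
run;
```

If the ordering assumption holds, each L'Beta value should equal the parameter estimate for that level, which gives a quick way to confirm the coefficients are lined up as intended.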
That fixed it! Thank you!! I didn't realize that's what was occurring in the background when using ref='1'. Thank you for the explanation and your time!