Solved: Re: PROC GENMOD conflicting estimates

telc24 · Posted 11-25-2024 01:54 PM

Hello,

I'm using PROC GENMOD in SAS 9.4 to calculate adjusted prevalence ratios and 95% CIs. In my adjusted models, the parameter estimates and the ESTIMATE statement results differ, and are sometimes in opposite directions. I'm confused about why the estimates differ for some, but not all, variables, and which are the "true" estimates that I should be reporting. My code and a brief explanation of the model are below. Thank you!

The code below estimates the prevalence ratio for my binary outcome, disease status (yes/no). My primary exposure of interest is sex (binary), and the covariates are age (binary) and race/ethnicity (categorical with 5 levels).

proc genmod data=analysis;
  class id age2(ref='1') sex(ref='M') race_ethnicity(ref='1');
  model disease(event='1') = age2 sex race_ethnicity / dist=poisson link=log;
  repeated subject=id / type=unstr;
  estimate "PR for age2" age2 1 -1 / exp;
  estimate "PR for sex" sex 1 -1 / exp;
  estimate "PR for NHB vs NHW" race_ethnicity 1 -1 0 0 0 / exp;
  estimate "PR for H vs NHW" race_ethnicity 1 0 -1 0 0 / exp;
  estimate "PR for NHA vs NHW" race_ethnicity 1 0 0 -1 0 / exp;
  estimate "PR for NHO vs NHW" race_ethnicity 1 0 0 0 -1 / exp;
run;

Here is the output. The estimates for age2 and sex are the same in the top and bottom tables, but the race_ethnicity results differ, so much so that some estimates are in the opposite direction. For example, the estimate for race_ethnicity 4 is -0.0499 in the top table but 0.1609 in the bottom table.

Any documentation or other posts that you recommend would also be greatly appreciated! Thank you!

PaigeMiller · Posted 11-25-2024 02:18 PM

In the top table, you have race_ethnicity as numbers 1 through 5. It seems to me the names used in the text in the bottom table (example: "PR for H vs NHW") don't align with 1 through 5. Can you tell us the alignment? Because a different alignment could make the results in the bottom table align with the results in the top table.

--
Paige Miller

telc24 · Posted 11-25-2024 02:22 PM

Yes, sorry! race_ethnicity is a categorical variable with the following coding. NHW is the reference group.

1=NHW
2=NHB
3=H (shown as Hisp in the table because the example was coded differently)
4=NHA
5=NHO

Thank you!

PaigeMiller · Posted 11-25-2024 02:54 PM

In the top table, race_ethnicity = 2 shows an estimate of 0.2162 and this matches the bottom table estimate for NHO vs NHW.

So there is still some misalignment of numbers to labels, as far as I can tell. Perhaps having race_ethnicity (ref='1') affects this somehow. As a wild guess, re-run this code without race_ethnicity (ref='1') and see if the numbers match what you (and I) expect.

--
Paige Miller

telc24 · Posted 11-25-2024 03:16 PM

Great point about the race_ethnicity=2 estimate.

Do you mean remove race_ethnicity(ref='1') from the class statement completely? If I do that, does it treat it as a continuous variable? Alternatively, if I remove just the "(ref='1')", it automatically uses race_ethnicity=5 as the reference group.

PaigeMiller · Posted 11-25-2024 03:17 PM

Yes, remove the (ref='1') and see if the estimates are comparable.

--
Paige Miller

telc24 · Posted 11-25-2024 03:24 PM

Got it! When the only change I make is to remove the "(ref='1')", the estimates aren't comparable, EXCEPT for the NHO group (like before). So now I'm even more puzzled about how to troubleshoot!

PaigeMiller · Posted 11-25-2024 03:34 PM

The estimates are comparable.

NHB vs NHW is estimate 1 minus estimate 2

Hisp vs NHW is estimate 1 minus estimate 3

and so on

So if you go back to the original tables, I think you will find something similar.

--
Paige Miller

telc24 · Posted 11-25-2024 03:55 PM

Ok I think I'm starting to understand! So I should use the exponentiated RR (PR) estimates in the second/bottom table, correct? Do you know why the ref='1' statement seems to be messing it up?

As an aside, I think I found a mistake in my code: the reference group (which I want to be NHW) should get the "-1" coefficient instead of the "1".

proc genmod data=analysis;
  class id age2(ref='1') sex(ref='M') race_ethnicity;
  model disease(event='1') = age2 sex race_ethnicity / dist=poisson link=log;
  repeated subject=id / type=unstr;
  estimate "PR for age2" age2 1 -1 / exp;
  estimate "PR for sex" sex 1 -1 / exp;
  estimate "PR for NHB vs NHW" race_ethnicity -1 1 0 0 0 / exp;
  estimate "PR for H vs NHW" race_ethnicity -1 0 1 0 0 / exp;
  estimate "PR for NHA vs NHW" race_ethnicity -1 0 0 1 0 / exp;
  estimate "PR for NHO vs NHW" race_ethnicity -1 0 0 0 1 / exp;
run;

When I make that adjustment, the directionality of the estimates makes sense.

Thank you!!

PaigeMiller · Posted 11-25-2024 03:58 PM

"Do you know why the ref='1' statement seems to be messing it up?"

I don't know. I suspect @StatDave or @jiltao might have an explanation.

--
Paige Miller

StatDave · Posted 11-25-2024 05:15 PM

I believe the problem is that you want to make comparisons with your reference level, 1, and the REF='1' option makes it the LAST level in the parameter estimates table, but the ESTIMATE statements in your original post contrast each level against the FIRST level. Try moving all the "1" values from the first position to the last position. For example, the first ESTIMATE statement for race_ethnicity would become:

estimate "PR for NHB vs NHW" race_ethnicity -1 0 0 0 1/exp;

This should make each value in the L'Beta column just the negative of the corresponding parameter estimate, since you are then estimating the reference minus level_i difference, whereas each parameter estimates level_i minus the reference. Reversing the signs (1 0 0 0 -1) gives level_i versus the reference, which matches the label.

The values for an effect in the ESTIMATE statement are applied to the parameter estimates in the order in which they appear in the parameter estimates table.
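
To make the ordering concrete, here is a minimal sketch of the race_ethnicity statements written so that each one estimates a level versus the reference, matching its label. It assumes the data set and variables from the original post, and that REF='1' orders the levels as 2, 3, 4, 5, 1. The E option prints the L matrix, so the positions can be checked against the parameter estimates table rather than taken on trust.

```sas
/* Sketch only: assumes the data set and variables from the original post */
/* and that REF='1' places level 1 (NHW) last, giving the order 2 3 4 5 1 */
proc genmod data=analysis;
  class id age2(ref='1') sex(ref='M') race_ethnicity(ref='1');
  model disease(event='1') = age2 sex race_ethnicity / dist=poisson link=log;
  repeated subject=id / type=unstr;
  /* Coefficient positions (level):            2  3  4  5  1 */
  estimate "PR for NHB vs NHW" race_ethnicity  1  0  0  0 -1 / exp e;
  estimate "PR for H vs NHW"   race_ethnicity  0  1  0  0 -1 / exp e;
  estimate "PR for NHA vs NHW" race_ethnicity  0  0  1  0 -1 / exp e;
  estimate "PR for NHO vs NHW" race_ethnicity  0  0  0  1 -1 / exp e;
run;
```

If the coefficients displayed by E do not line up with the intended levels, the ordering assumption is wrong for your data and the positions need to be adjusted.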

telc24 · Posted 11-25-2024 05:55 PM

That fixed it! Thank you!! I didn't realize that's what was occurring in the background when using ref='1'. Thank you for the explanation and your time!
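
An alternative that avoids counting coefficient positions is to let the LSMEANS statement form the comparisons. This is a sketch under the same assumptions (data set and variable names from the original post, NHW coded as level 1), not a tested replacement for the ESTIMATE statements:

```sas
/* Sketch only: same model as in the original post */
proc genmod data=analysis;
  class id age2(ref='1') sex(ref='M') race_ethnicity(ref='1');
  model disease(event='1') = age2 sex race_ethnicity / dist=poisson link=log;
  repeated subject=id / type=unstr;
  /* DIFF=CONTROL('1') compares each level with level 1 (NHW);        */
  /* EXP exponentiates the differences and CL adds confidence limits. */
  lsmeans race_ethnicity / diff=control('1') exp cl;
run;
```

Because the control level is named by its value, the result does not depend on where REF= places that level in the parameter estimates table.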