Solved: Re: How to get categorical variables RR estimates in PROC GENMOD

N0o9r5a · Posted 08-10-2021 06:51 PM

Hello everyone, I have a question about how to get relative risk (RR) estimates for categorical variables in PROC GENMOD.

My outcome is a binary variable (death), where 1 = event happened and 0 = event did not happen. I have 3 exposure variables (x, y, z) for which I want RR estimates from PROC GENMOD. All 3 are categorical: x has 5 categories, y has 2, and z has 5. There are no missing values in these 4 variables.

I found a PROC GENMOD tutorial at this link: https://stats.idre.ucla.edu/sas/faq/how-can-i-estimate-relative-risk-in-sas-using-proc-genmod-for-co...

The code I used is:

proc genmod data = test;

class x (ref="1")/param=ref;
class y (ref="1")/param=ref;
class z (ref="1")/param=ref;

model death (event="1") = x y z/ dist=binomial link=log;

estimate 'Beta_x' x 1 -1/exp;
estimate 'Beta_y' y 1 -1/exp;
estimate 'Beta_z' z 1 -1/exp;

run;

With the above code, I get the output shown in the first screenshot I attached.

As you can see, no estimates are shown for the individual categories of x and z.

I searched for other solutions and found this thread: https://communities.sas.com/t5/Statistical-Procedures/RR-estimates-using-proc-Genmod-for-categorical...

which links to this page: https://support.sas.com/kb/23/003.html

So I tried using LSMEANS. Below is the code I modified to get RR estimates for the categorical variables.

proc genmod data = test;

class x (ref="1")/param=ref;
class y (ref="1")/param=ref;
class z (ref="1")/param=ref;

model death (event="1") = x y z/ dist=binomial link=log;

lsmeans x z / diff exp cl; /* the only line I added compared with the code above */

estimate 'Beta_x' x 1 -1/exp;
estimate 'Beta_y' y 1 -1/exp;
estimate 'Beta_z' z 1 -1/exp;

run;

However, no new results appeared with this code: SAS produced the same results as in the first screenshot I attached, along with the following Note and Warning:

I am not sure whether I used LSMEANS correctly.

Also, what does "1 -1" mean in "estimate 'Beta_x' x 1 -1/exp"? I have seen tutorials that use "1 -1", "1", or "0 1 -1", but I don't really understand what these values do. If there are tutorials I should read before writing this code, I would love to know about them!

I appreciate your help!

StatDave · Posted 08-10-2021 11:48 PM

The coefficients in the ESTIMATE (or CONTRAST) statement form a vector of values, appropriate for the quantity that you want to estimate or test, that multiplies the vector of model parameter estimates. As such, there must be exactly as many coefficients for an effect in the model (like your X, Y, or Z) as there are parameter estimates for that effect in the Parameter Estimates table. As you can see, there are 4, 1, and 4 parameter estimates for X, Y, and Z respectively, so your two coefficients are not correct for any of them. If you really want to understand this, see this note. However, as stated there, the ESTIMATE and CONTRAST statements should be avoided when simpler statements that don't require you to determine the coefficients correctly can be used; the LSMEANS statement is one such statement.

In your case, I strongly advise that you avoid the log-binomial model that you are attempting. Instead, fit a simple logistic model using PROC LOGISTIC and then use the NLMeans macro (as shown in Note 23003, which you referred to) to estimate and test the relative risks. If you want to use the NULL= option, you will need to download updated versions of the NLMeans and NLEST macros.

If you are determined to use the log-binomial model then, as also shown in Note 23003, remove the ESTIMATE statements and specify PARAM=GLM rather than PARAM=REF (or simply omit the PARAM= option) in the CLASS statement(s), as the Warnings suggest. The LSMEANS statement will then give you the relative risk estimates.
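As a worked illustration of how the coefficient vector L is used (a sketch under assumptions, not taken from the note): assume x is coded 1 to 5 and declared with `class x (ref="1")/param=ref;` as in the original code, so its four parameters correspond to levels 2 through 5 in the order shown in the Parameter Estimates table. With y and z held fixed, the log risk and the ESTIMATE quantities are:

```latex
\begin{aligned}
\log \Pr(\text{death}=1) &= \beta_0 + \beta_{x2} I(x=2) + \beta_{x3} I(x=3) + \beta_{x4} I(x=4) + \beta_{x5} I(x=5) + (\text{y and z terms}) \\
\text{ESTIMATE value} &= L^\top \beta, \qquad \text{value reported by EXP} = \exp(L^\top \beta) \\
L_x = (1, 0, 0, 0) &\;\Rightarrow\; L^\top \beta = \beta_{x2}, \qquad \mathrm{RR}_{2 \text{ vs } 1} = \exp(\beta_{x2}) \\
L_x = (-1, 1, 0, 0) &\;\Rightarrow\; L^\top \beta = \beta_{x3} - \beta_{x2}, \qquad \mathrm{RR}_{3 \text{ vs } 2} = \exp(\beta_{x3} - \beta_{x2})
\end{aligned}
```

In SAS syntax these two comparisons would be written `estimate 'x 2 vs 1' x 1 0 0 0 / exp;` and `estimate 'x 3 vs 2' x -1 1 0 0 / exp;`, while y, which has a single parameter, takes a single coefficient: `estimate 'y 2 vs 1' y 1 / exp;`. By the same count, `x 1 -1` supplies only two of the four coefficients that x needs, and `y 1 -1` supplies two where y has only one.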
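A minimal sketch of the log-binomial fit with the changes described in the answer (ESTIMATE statements removed, no PARAM= option so the default GLM coding is used). The DATA step creates hypothetical simulated data only so the block runs on its own; the seed, sample size, and risk values are arbitrary assumptions, and in practice you would use your own TEST data set. The output data set name RR_DIFFS is also only illustrative.

```sas
/* Hypothetical simulated data so the example runs on its own.
   Replace with your own data set. Baseline risk and risk ratios
   are arbitrary and keep every probability well below 1. */
data test;
   call streaminit(2021);
   do id = 1 to 5000;
      x = ceil(5 * rand('uniform'));   /* 5 categories: 1-5 */
      y = ceil(2 * rand('uniform'));   /* 2 categories: 1-2 */
      z = ceil(5 * rand('uniform'));   /* 5 categories: 1-5 */
      p = 0.05 * (1 + 0.1*(x - 1)) * (1 + 0.2*(y - 1)) * (1 + 0.1*(z - 1));
      death = rand('bernoulli', p);    /* 1 = event, 0 = no event */
      output;
   end;
   keep x y z death;
run;

/* Log-binomial model: no PARAM= option, so the default GLM coding is used */
proc genmod data = test;
   class x y z;
   model death (event="1") = x y z / dist=binomial link=log;
   /* DIFF compares all pairs of levels on the log scale; EXP exponentiates
      each difference (a relative risk) and CL adds confidence limits */
   lsmeans x y z / diff exp cl;
   ods output Diffs=rr_diffs;   /* save the comparisons (illustrative name) */
run;
```

Because the model uses a log link, each difference of LS-means is a log risk ratio, so each exponentiated difference is the RR for the first listed level relative to the second.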


N0o9r5a · Posted 08-16-2021 05:25 PM

Hello,

Thank you for the solution! It was a huge help! I haven't learned how to use the macros yet, so I think sticking with PROC GENMOD is more straightforward for me.

However, I have run into another odd situation. The code works fine after making the changes you suggested, but once I add more than 6 or 7 variables to the same model, the error "The mean parameter is either invalid or at a limit of its range for some observations" appears and the correct results are not displayed. At first I thought it was because some categories had too few observations, but the same error appears even when a category has more than 2000 observations. Do you know why this happens?

I appreciate your help!

StatDave · Posted 08-16-2021 05:43 PM

That error is very common with the log-binomial model and occurs because the log link doesn't ensure that predicted values are between 0 and 1, as the binomial distribution requires. That is why I suggest avoiding this model. This error is discussed in some detail in the note on relative risks that you referred to earlier; see the section on the log-binomial model.
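The reasoning behind the error can be written out with standard GLM algebra, where the linear predictor for observation i is the sum of the parameter estimates that apply to that observation's covariate values:

```latex
\begin{aligned}
\text{log link:}\quad & \log p_i = \eta_i = x_i^\top \beta \;\Rightarrow\; p_i = e^{\eta_i}, \text{ which is a valid probability only if } \eta_i \le 0 \\
\text{logit link:}\quad & \log \frac{p_i}{1 - p_i} = \eta_i \;\Rightarrow\; p_i = \frac{1}{1 + e^{-\eta_i}} \in (0, 1) \text{ for every real } \eta_i
\end{aligned}
```

A log-binomial fit therefore has to keep the linear predictor non-positive for every observation. When the iterations push some fitted risk to 1 (plausibly more likely as effects are added and some covariate combinations get high predicted risk), GENMOD stops with the "invalid or at a limit of its range" message, so large category sizes alone need not prevent it. The logit link used by PROC LOGISTIC has no such constraint.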