Solved: Re: Log Binomial Regression - problem with Estimate in Proc Genmod

akimme · Posted 05-30-2023 10:41 AM

I'm trying to run a log binomial regression on some cross-sectional data. I've been having some problems with Proc Genmod; it can do so many things and I'm not sure how to get it to do what I need it to. I've been using this guide but I'm still lost.

Right now the Estimate statement is giving me errors that I'm not sure how to fix. It's been telling me that the categorical variable in that statement (hosp) first needs to appear in the model statement, which it does. Can anyone clarify what’s going wrong here? Thank you!

Proc GenMod data=GC descending;
   Class hosp / param=ref ref=first;
   Model hosp = Idthinklessn Hithinklessn Iddonttalkn HIdonttalkn
                connectn ratherknown normaln explainn / Dist=bin Link=log;
   Estimate 'minority stress and disease severity' hosp 1 / exp;
Run;

Other info:

All minority stress variables (Idthinklessn Hithinklessn Iddonttalkn HIdonttalkn connectn ratherknown normaln explainn) use a numerical scale. Hosp is the only categorical variable.

Error message when run as-is:

ERROR: Effects used in the ESTIMATE statement must have appeared previously in the MODEL statement.

Other tinkering with the Estimate statement produced:

ERROR 22-322: Syntax error, expecting one of the following: a numeric constant, a datetime constant, (, *, +, -.

ERROR 202-322: The option or parameter is not recognized and will be ignored.

I think I need something other than 1/exp but what and why?

StatDave · Posted 05-30-2023 11:18 AM

See this note on estimating the relative risk, particularly the section on the log binomial model. As shown there, the variable specified in the ESTIMATE statement should be the predictor that you want to compute the relative risk for, not the response variable. If your predictors are binary, then it is easier to specify them in the CLASS statement, which will then let you use the LSMEANS statement instead of the more difficult ESTIMATE statement. Or, you might want to avoid the problems with the log binomial model and the need to create the proper ESTIMATE statement by using the NLMeans macro approach that is shown in the earlier part of the note.

akimme · Posted 05-31-2023 06:52 AM

Unfortunately, all but one predictor is on a five-point scale, so it looks like I'm stuck with ESTIMATE. Can I use ESTIMATE for all of them, or should I use it and LSMEANS together?

Could you show me an example of what it would look like to use ESTIMATE for multiple predictors? I've only found examples with a single variable to be estimated.

My advisor has recommended log binomial regression because of the study design (cross-sectional), so I'm reluctant to change it. It seems to be a less common method, though, since I'm not finding much on it. Are there any other good overviews you know of that explain the parts of the code in detail?

StatDave · Posted 05-31-2023 12:01 PM

You just need to use the LSMEANS statement with the DIFF and EXP options as shown in the note I referred to. To use the LSMEANS statement, you need to specify any predictors that you want relative risk estimates for in the CLASS statement. In the CLASS statement, you need to specify PARAM=GLM, not PARAM=REF. You do not need the ESTIMATE statement unless you want a relative risk estimate for a 1-unit (or other-sized) increase in a continuous (non-CLASS) predictor. The example in the note uses the LSMEANS statement for the CLASS predictor, A, and the ESTIMATE statement for the continuous predictor, X. You say you have predictors using a 5-point scale, so for each one you need to have a single variable with 5 possible values, and you need to include it in the CLASS statement.
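A short derivation shows why an ESTIMATE coefficient of 1 on a continuous predictor, once exponentiated, is a relative risk. This is a sketch that assumes the log link and a continuous predictor x with coefficient β_x that is not involved in any interaction. The remaining terms are collected as z'γ, which is notation introduced here for illustration:

$$\log p(x) = \beta_0 + \beta_x x + \mathbf{z}^{\top}\boldsymbol{\gamma}$$

$$\log p(x+c) - \log p(x) = c\,\beta_x
\quad\Longrightarrow\quad
\mathrm{RR} = \frac{p(x+c)}{p(x)} = e^{c\,\beta_x}$$

A coefficient of 1 in ESTIMATE therefore targets β_x, the log RR for a 1-unit increase. A coefficient of 10 targets 10β_x, the log RR for a 10-unit increase. Because z'γ cancels, the RR is the same at every value of the other predictors.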

The following example uses the neuralgia data in the example titled "Logistic Modeling with Categorical Predictors" in the LOGISTIC documentation. The RANK procedure is used just to change the two continuous predictors into 3-level categorical predictors. The Exponentiated columns in the LSMEANS tables show the estimated probabilities at each level of each predictor. The Exponentiated columns in the Differences of LSMEANS tables show the relative risk estimates comparing each pair of levels, as indicated in the first two columns of each table.

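/* Bin each continuous predictor into 3 groups (ranks 0, 1, 2).    */
/* With no RANKS statement, the ranks replace the original values. */
proc rank data=neuralgia out=tmp groups=3;
   var duration age;
run;
proc genmod data=tmp;
   class Treatment Sex Age Duration / param=glm;
   model Pain = Treatment Sex Age Duration / dist=bin link=log;
   /* EXP: probabilities for the LSMEANS, relative risks for the DIFFs */
   lsmeans Treatment Sex Age Duration / diff exp cl;
run;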
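The Exponentiated columns follow from the log link, because LSMEANS are computed on the log scale. Let η̂_i stand for the LSMEAN of level i of a CLASS predictor (notation introduced here), with the other effects averaged as LSMEANS does. Then:

$$\hat p_i = e^{\hat\eta_i}, \qquad
\widehat{\mathrm{RR}}_{ij} = e^{\hat\eta_i - \hat\eta_j} = \frac{e^{\hat\eta_i}}{e^{\hat\eta_j}} = \frac{\hat p_i}{\hat p_j}$$

For example, the exponentiated Treatment A vs P difference is the ratio of the estimated probabilities for A and P. That ratio is the relative risk.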

akimme · Posted 06-01-2023 03:40 PM

Ah, okay, thank you so much for spelling that out for me! One of the issues I had was that I put the outcome in the CLASS statement because it was categorical too.

I haven't added the non-minority stress predictors (e.g., age) yet, but some of those are continuous. I could use RANK on them too, I suppose. Otherwise, do you have any examples that use more continuous variables?

StatDave · Posted 06-01-2023 04:26 PM

For binary or categorical response models fit by procedures like LOGISTIC, GENMOD, GEE, GLIMMIX, GAMPL, and so on, the response variable never needs to appear in the CLASS statement and is best not entered there. As I mentioned, for a continuous predictor, you need to use the ESTIMATE statement. Again, the note I originally referred you to shows an example of estimating the relative risk for a 1-unit increase in the continuous variable, X. In that example, X interacts with A, but if it weren't involved in any interaction, then the relative risk estimate would be obtained with:

estimate "RR (X+1)/X" x 1;
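To handle several continuous predictors, write one ESTIMATE statement per predictor. The coefficient sets the size of the increase. Below is a minimal sketch, not the thread's own code. The data set sim and the variables stress_a, stress_b, and age are hypothetical and are simulated in the DATA step so that the program runs on its own. As in the case described above, the sketch assumes that no predictor is involved in an interaction.

```sas
/* Hypothetical simulated data (NOT the GC data): binary response  */
/* hosp (1 = event), two 5-point predictors, and age in years.     */
data sim;
   call streaminit(2023);
   do id = 1 to 500;
      stress_a = ceil(5*rand('uniform'));        /* 1 to 5   */
      stress_b = ceil(5*rand('uniform'));        /* 1 to 5   */
      age      = 18 + floor(63*rand('uniform')); /* 18 to 80 */
      p    = exp(-3 + 0.15*stress_a + 0.10*stress_b + 0.01*age);
      hosp = rand('bernoulli', p);
      output;
   end;
   drop p;
run;

/* Log binomial model. DESCENDING models P(hosp=1). The response */
/* is not listed in a CLASS statement.                          */
proc genmod data=sim descending;
   model hosp = stress_a stress_b age / dist=bin link=log;
   /* One ESTIMATE per continuous predictor. EXP requests          */
   /* exp(estimate), which with the log link is the adjusted RR.   */
   estimate "RR stress_a +1" stress_a 1  / exp;
   estimate "RR stress_b +1" stress_b 1  / exp;
   estimate "RR age +10"     age      10 / exp; /* 10-year increase */
run;
```

With real data, the same pattern would use one line per predictor, such as Idthinklessn or connectn. Entering the 5-point items as continuous terms assumes that every 1-point step multiplies the risk by the same RR. Putting them in CLASS and using LSMEANS, as suggested earlier in the thread, drops that assumption.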
