I'm trying to run a log binomial regression on some cross-sectional data. I've been having some problems with Proc Genmod; it can do so many things and I'm not sure how to get it to do what I need it to. I've been using this guide but I'm still lost.
Right now the Estimate statement is giving me errors that I'm not sure how to fix. It's been telling me that the categorical variable in that statement (hosp) first needs to appear in the model statement, which it does. Can anyone clarify what’s going wrong here? Thank you!
Proc GenMod data=GC descending;
Class hosp /param=ref ref=first;
Model hosp = Idthinklessn Hithinklessn Iddonttalkn
HIdonttalkn connectn ratherknown normaln explainn / Dist=bin Link=log;
Estimate 'minority stress and disease severity' hosp 1/exp;
Run;
Other info:
All minority stress variables (Idthinklessn Hithinklessn Iddonttalkn HIdonttalkn connectn ratherknown normaln explainn) use a numerical scale. Hosp is the only categorical variable.
Error message when run as-is:
For binary or categorical response models fit by procedures like LOGISTIC, GENMOD, GEE, GLIMMIX, GAMPL, and so on, the response variable never needs to appear in the CLASS statement and is best not entered there. As I mentioned, for a continuous predictor, you need to use the ESTIMATE statement. Again, the note I originally referred you to shows an example of estimating the relative risk for a 1 unit increase in the continuous variable, X. In that example, X interacts with A, but if it weren't involved in any interaction, then the relative risk estimate would be obtained with:
estimate "RR (X+1)/X" x 1;
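For anyone wondering why a coefficient of 1 on X gives a relative risk, here is a short derivation. It assumes a log-link binomial model in which X is not involved in any interaction; the symbols \beta_0 and \beta_1 are generic notation, not taken from the note. Any other predictors held fixed cancel in the ratio the same way the intercept does.

```latex
\begin{align*}
\log \Pr(Y=1 \mid X=x) &= \beta_0 + \beta_1 x \\[4pt]
\mathrm{RR}
  = \frac{\Pr(Y=1 \mid X=x+1)}{\Pr(Y=1 \mid X=x)}
  &= \frac{e^{\beta_0 + \beta_1 (x+1)}}{e^{\beta_0 + \beta_1 x}}
  = e^{\beta_1}
\end{align*}
```

So the ESTIMATE statement returns \beta_1 on the log scale, and adding the EXP option to it (as in the original code, `/ exp`) reports the exponentiated value, which is the relative risk for a 1-unit increase in X.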
See this note on estimating the relative risk, particularly the section on the log binomial model. As shown there, the variable specified in the ESTIMATE statement should be the predictor that you want to compute the relative risk for, not the response variable. If your predictors are binary, then it is easier to specify them in the CLASS statement, which will then let you use the LSMEANS statement instead of the more difficult ESTIMATE statement. Or, you might want to avoid the problems with the log binomial model and the need to create the proper ESTIMATE statement by using the NLMeans macro approach that is shown in the earlier part of the note.
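Applied to the code in the question, a minimal sketch might look like the following. It assumes hosp is coded so that DESCENDING picks the event of interest, and that the minority stress predictors are treated as continuous (not listed in CLASS). The ESTIMATE labels are made up for illustration, and only two of the eight predictors are shown.

```sas
/* Sketch only: the response (hosp) is not listed in a CLASS statement. */
proc genmod data=GC descending;
   model hosp = Idthinklessn Hithinklessn Iddonttalkn HIdonttalkn
                connectn ratherknown normaln explainn
                / dist=bin link=log;
   /* One ESTIMATE per continuous predictor of interest. EXP adds the
      exponentiated estimate, i.e. the relative risk per 1-unit increase. */
   estimate 'RR per unit: Idthinklessn' Idthinklessn 1 / exp;
   estimate 'RR per unit: connectn'     connectn 1 / exp;
run;
```

Each ESTIMATE statement names a predictor from the MODEL statement, never the response, which is what the original error was complaining about.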
You just need to use the LSMEANS statement with the DIFF and EXP options as shown in the note I referred to. To use the LSMEANS statement, you need to specify any predictors that you want relative risk estimates for in the CLASS statement. In the CLASS statement, you need to specify PARAM=GLM, not PARAM=REF. You do not need the ESTIMATE statement unless you want a relative risk estimate for a 1- (or other sized) unit increase in a continuous (non-CLASS) predictor. The example in the note uses the LSMEANS statement for the CLASS predictor, A, and the ESTIMATE statement for the continuous predictor, X. You say you have predictors using a 5-point scale, so for each one you need to have a single variable with 5 possible values and you need to include it in the CLASS statement.
The following example uses the neuralgia data in the example titled "Logistic Modeling with Categorical Predictors" in the LOGISTIC documentation. The RANK procedure is used just to change the two continuous predictors into 3-level categorical predictors. The Exponentiated columns in the LSMEANS tables show the estimated probabilities at each level of each predictor. The Exponentiated columns in the Differences of LSMEANS tables show the relative risk estimates comparing each pair of levels, as indicated in the first two columns of each table.
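/* Replace Duration and Age with 3-level group indicators (0, 1, 2) */
proc rank data=neuralgia out=tmp groups=3;
   var duration age;
run;
/* Log binomial model; PARAM=GLM is required for LSMEANS */
proc genmod data=tmp;
   class Treatment Sex Age Duration / param=glm;
   model Pain = Treatment Sex Age Duration / dist=bin link=log;
   /* DIFF + EXP give relative risks for each pair of levels; CL adds limits */
   lsmeans Treatment Sex Age Duration / diff exp cl;
run;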