Hi,
I ran a simple multinomial logit model using proc GLIMMIX. The model predicts occupation in 4 categories (high, medium, low, unemployed) from the normalized difference between supply and demand and a region fixed effect. The result is quite strange: the output includes estimates for the reference category of the region parameters. I also notice that the p-values from proc GLIMMIX look odd; all of them for the category "HIGH" are higher than 0.99.
When I replicate the same model with proc LOGISTIC, everything looks fine, with consistent parameters and p-values. How can I fix this? I need to use proc GLIMMIX since I will eventually add some random effects.
proc glimmix data=lab.merge INITGLM;
class occ_reduced(ref='LOW') region(ref='SK') ;
model occ_reduced = diffH diffM diffL region / solution DIST=MULTINOMIAL link=glogit;
where labour=1;
weight weight;
run;
proc logistic data=lab.merge;
class region(ref='SK') / param = ref;
model occ_reduced(ref='LOW') = diffH diffM diffL region / link = glogit;
weight weight /norm;
where labour=1;
run;
See this note. The standard errors differ markedly, probably because you used the NORM option in the WEIGHT statement in PROC LOGISTIC, which normalizes the weight variable values so that they sum to the sample size. You would need to create a new weight variable containing normalized weights for use in GLIMMIX. As mentioned in the note, if these weights are survey weights, then using the WEIGHT statement does not yield a proper analysis of survey data. For that you should use PROC SURVEYLOGISTIC.
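A minimal sketch of building such a normalized weight, assuming the same WHERE subset as the models and that no observations are dropped for missing values in the model variables. The output data set work.merge_norm and the variable norm_weight are hypothetical names chosen for illustration:

```sas
/* Rescale the weights so they sum to the number of observations   */
/* with a nonmissing weight in the labour=1 subset.                */
/* PROC SQL remerges the summary statistics onto every row         */
/* (the log shows a note about remerging).                         */
proc sql;
  create table work.merge_norm as
  select *,
         weight * count(weight) / sum(weight) as norm_weight
  from lab.merge
  where labour = 1;
quit;
```

The GLIMMIX step would then read data=work.merge_norm and use "weight norm_weight;" in place of "weight weight;". If some rows are excluded from the model because of missing covariates, the count used here would no longer match the sample size that PROC LOGISTIC normalizes to, so it is worth comparing the number of observations used in both procedures.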
Thanks for your answer. I tried without any weight statement for both models, but the result is the same, with an unexpected estimate for the reference category using proc glimmix and weird p-values in other covariates. Strangely, when I change the reference category, the problem disappears.
The GLIMMIX documentation goes through several caveats regarding the ordering of the categories for a nominal multinomial analysis. In the end, I think their approach would look like this when translated to your situation:
proc glimmix data=lab.merge INITGLM;
class occ_reduced region(ref='SK') ;
model occ_reduced(order= freq ref='LOW') = diffH diffM diffL region / solution DIST=MULTINOMIAL link=glogit;
where labour=1;
weight weight;
run;
Quoting the documentation:
"In generalized logit models (for multinomial data with unordered categories), one response category is chosen as the reference category in the formulation of the generalized logits. By default, the linear predictor in the reference category is set to 0, and the reference category corresponds to the entry in the "Response Profile" table with the highest Ordered Value. You can affect the assignment of Ordered Values with the DESCENDING and ORDER= options in the MODEL statement. You can choose a different reference category with the REF= option. The choice of the reference category for generalized logit models affects the results. It is sometimes recommended that you choose the category with the highest frequency as the reference (see, for example, Brown and Prescott 1999, p. 160). You can achieve this with the GLIMMIX procedure by combining the ORDER= and REF= options..."
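To see which category has the highest frequency before choosing a reference, something like the following could be run first. It is only a sketch: it lists unweighted counts for the same subset, which is an assumption about what matters for your ordering, and the result should still be compared with the "Response Profile" table that GLIMMIX prints.

```sas
/* List the response levels in descending order of observation count */
/* for the subset used in the model (unweighted counts).             */
proc freq data=lab.merge order=freq;
  where labour = 1;
  tables occ_reduced;
run;
```

The first level listed is the most frequent one, which is the category the quoted recommendation would pick as the reference.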
I don't know if that will help or not, but my experience with generalized logit models is that critical clues to its function in GLIMMIX aren't stressed enough in the examples.
SteveDenham