Re: Multi-level Modeling with an Ordinal Dependent Variable

KMun · Posted 04-28-2020 05:35 PM

Deleted from wrong section and moved. Apologies.

Good afternoon. I am running two sets of analyses: one at the individual level only, and a second with multi-level models. My dependent variable is a five-point Likert-scale item. I am struggling with the coding (and the analytic strategy).

The book and notes I typically consult for this are locked in my work office, which we are no longer allowed into. Do the standard MLM procedures (MIXED, GLM, GLIMMIX, GENMOD) default to linear models, which are not appropriate for ordinal dependent variables? If so, what is the best approach for handling the ordinal variable?

I have been reading about, and have citations for, running linear models with an ordinal variable in multi-level models, since multi-level fixed-effects versions of ordinal logistic regression models are "notoriously difficult to interpret" and lead to biased estimates. If I were running only the multi-level models, I could get away with this. However, I still need to run similar individual-level models so that I can compare results, and I don't think this would justify running OLS for the individual-level models.

To complicate things, the proportional odds assumption is violated. Any help is appreciated, as I have committed these things to memory, apparently poorly, and am a bit stuck without my notes.

SteveDenham · Posted 04-29-2020 07:55 AM

Well, GLIMMIX can handle a multinomial distribution, with a choice of a cumulative logit or a general logit link.

So, you can change multilevel to individual level in GLIMMIX by removing the RANDOM statement.
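A minimal sketch of that idea, assuming a hypothetical data set mydata with an ordinal response y, covariates x1 and x2, and a grouping variable cluster (none of these names come from the thread). It has not been run against real data:

```sas
/* Multilevel version: cumulative logit link with a random intercept per cluster */
proc glimmix data=mydata method=quad;
   class cluster;
   model y = x1 x2 / solution dist=multinomial link=cumlogit;
   random intercept / subject=cluster;
run;

/* Individual-level version: the same MODEL statement with RANDOM removed.
   METHOD=QUAD is left off here because no random effects remain. */
proc glimmix data=mydata;
   model y = x1 x2 / solution dist=multinomial link=cumlogit;
run;
```

Keeping the MODEL statement identical in both runs makes the fixed-effect estimates easier to compare across the two levels of analysis.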

I am curious about your statement about the proportional odds, though. Could you give more info on that?

SteveDenham

KMun · Posted 04-29-2020 02:03 PM

Thanks.

When I ran a standard ordered logistic regression model:

proc logistic descending;
model var1 = var2 var3 var4;
run;

The proportional odds assumption in the model was violated, suggesting that the steps from strongly agree to agree, agree to neutral, neutral to disagree, and disagree to strongly disagree are not properly ordered (I know this is phrased awkwardly) and that this may not be the appropriate model after all. I tried collapsing the variable into three response values (agree, neutral, disagree) and got the same result.

So, now I am left with two problems.

1. How do I estimate a multi-level model for an ordinal dependent variable?

2. Should I even be running ordered logistic regression models?

SteveDenham · Posted 04-29-2020 03:04 PM

OK, that makes sense. So in PROC GLIMMIX (for multilevel modeling) you would specify a generalized logit link rather than a cumulative logit. What the test of proportional odds implies is that the "distance" between the levels isn't constant, and as a result your ordinal assumption may not be justified. Rather than strongly agree, agree, neutral, disagree and strongly disagree, you may as well have response levels called car, house, tv, sofa, and bed. But that should not completely hamstring your analysis; it is just going to be less powerful.

In the PROC GLIMMIX documentation, look at Example 45.11 Maximum Likelihood in Proportional Odds Model with Random Effects. In that example, a cumulative probit link is specified (a cumulative link, which builds in the proportional odds assumption). Try running that example with link=glogit, and adding group=Shape to the RANDOM statement options. Here is what I ran:

proc glimmix data=footshape method=quad;
   class sire shape(ref=first);
   model Shape = yr b1 b2 b3 / s link=glogit dist=multinomial;
   random int / sub=sire s cl group=shape;
   ods output Solutionr=solr;
   freq count;
run;

This ran remarkably quickly. The output gives Type 3 tests for the fixed effects, and solution vectors for both the fixed and random effects. The fixed effect solutions (intercepts and slopes) treat the first shape category as the reference, so that odds ratios can be calculated based on the covariate values.

I hope this approach could work for your question.

SteveDenham

KMun · Posted 04-29-2020 03:49 PM

Thank you again. I am having some difficulty applying this model to my study, but getting there.

It looks like this is modeling two class variables? I am only nesting data in geographic regions, so I presume class region (ref=first); would be sufficient. Region as the subject instead of sire makes sense. I am getting stuck at the GROUP= option.

If it helps, this is what I was running prior:

proc mixed method = ml covtest ic;
class region;
model gvrfgap = var1 var2 var3 var4 var5 var1_region_aggregate / solution ddfm=bw notest;
weight newweight;
random int var2 var3/subject = region G TYPE = VC;
run;

Also, it is good that the Glimmix procedure works around the proportional odds assumption issue. Do you have any thoughts on what I should run as the non-HLM model to compare the HLM results to, with the proportional odds assumption being violated at the individual level?

KMun · Posted 04-29-2020 04:45 PM

Wait, never mind. I realized shape is the dependent variable and belongs in the GROUP= option. A bad oversight on my part. Apologies. I am still having difficulties, if you have time.

proc glimmix data=work.essregion method=quad;
class region gvrfgap(ref=first);
model gvrfgap = var1 var2 var3 var4 var5 region_var5/ s link=glogit dist=multinomial;
random int var1 var2/ sub=region s cl group=gvrfgap;
ods output Solutionr=solr;
freq count;
run;

Error message

ERROR: Invalid or missing data.
WARNING: Output 'Solutionr' was not created. Make sure that the output object name, label, or
path is spelled correctly. Also, verify that the appropriate procedure options are
used to produce the requested output object. For example, verify that the NOPRINT
option is not used.

SteveDenham · Posted 04-30-2020 07:39 AM

Only guessing, as I don't know what your data set looks like. The two most likely candidates are count (which may not be in your dataset) and region_var5 (which may not be a continuous variable). Could you provide the first 10 rows of your data?

SteveDenham

KMun · Posted 04-30-2020 04:46 PM

Thank you again for your time. It is greatly appreciated. I am having trouble integrating the data into this form, but I will keep trying.

Count is not in my dataset. region_var5 is aggregate levels of religious service attendance by region. It was created by nesting a survey response (how often do you attend religious services: 0 = never to 7 = every day) in NUTS regions. Range = 1.43-4.76.

SteveDenham · Posted 05-01-2020 07:44 AM

If count is not in your dataset, then adding the FREQ count statement will lead to strange behavior. Try running without the FREQ statement.
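As a sketch, the call posted earlier in the thread with only the FREQ statement removed would look like the following. The data set and variable names are the ones from that post. This has not been run against the actual data, so other causes of the "Invalid or missing data" error cannot be ruled out:

```sas
/* Same model as posted above; the FREQ statement is dropped because
   the variable count does not exist in work.essregion */
proc glimmix data=work.essregion method=quad;
   class region gvrfgap(ref=first);
   model gvrfgap = var1 var2 var3 var4 var5 region_var5 / s link=glogit dist=multinomial;
   random int var1 var2 / sub=region s cl group=gvrfgap;
   ods output Solutionr=solr;
run;
```

The footshape example needs FREQ count because that data set is stored as one row per response pattern with a count column; with one row per survey respondent, no FREQ statement is needed.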

SteveDenham
