keckk
Fluorite | Level 6
Hi statisticians and glimmix users!

This is the first time I have used a glimmix model with the dist=multinomial option to fit an ordinal response (0,1,2):
proc glimmix data=x METHOD=LAPLACE;
class farm genotype animal;
model score = genotype sweek genotype*sweek /
dist=multinomial link=cumlogit s oddsratio (diff=all);
random int / subject=farm s;
random int sweek / subject=animal s;
run;

I have no experience with multinomial logistic regression, and the few examples I studied did not address model fit or violations of model assumptions. How can I find out whether the cumulative logit link has done its job properly?

...a non-statistician user is asking.
Thanks for any hint or help.
Dale
Pyrite | Level 9
I'm afraid that the GLIMMIX procedure does not have much facility for testing the proportional odds assumption, if that is what you are after. About the best that you could do would be to construct all binary splits on your response variable and fit the GLIMMIX code which you have to each candidate binary split. You could then construct graphs in which a predictor variable is plotted on the X axis and the Y axis is the estimated logit. For each binary split, you would plot the resulting logits for each level of a categorical predictor variable or for two values of a continuous predictor variable. (For a continuous predictor variable, these can be any two values that you choose so long as you employ the same two values for every binary split.) Further detail about construction of these graphics is given below.

In your case, suppose that score has four ordered levels with values 1, 2, 3, and 4. You would create three binary splits of the data: 1 vs >1, 2 vs >2, and 3 vs >3. Construct three new variables:

  score1 = (score>1);
  score2 = (score>2);
  score3 = (score>3);

and fit your glimmix code to each of these three new response variables. I would add the LSMEANS statements

  lsmeans genotype / at sweek=;
  lsmeans genotype / at sweek=;


I presume that genotype has three levels. Let's suppose that those three levels are AA, AT, and TT. From these LSMEANS statements, you can construct a data set as shown below:

  Response   LSM stmt   Genotype   LSM value

  score1     1          AA
  score1     1          AT
  score1     1          TT
  score2     1          AA
  score2     1          AT
  score2     1          TT
  score3     1          AA
  score3     1          AT
  score3     1          TT

  score1     2          AA
  score1     2          AT
  score1     2          TT
  score2     2          AA
  score2     2          AT
  score2     2          TT
  score3     2          AA
  score3     2          AT
  score3     2          TT


For the LSM statement 1, create a graphic which has an X-axis with values AA, AT, and TT. Plot the score1 LSM values for AA, AT, and TT with a line joining those three values. Do the same for score2 and score3 LSM values (using different linetypes or line colors so that you can distinguish the logit traces from each model.) You will have three lines which should be approximately parallel if the proportional odds assumption is true. You will want to do the same for LSM statement 2, creating a separate graph for that set of results.

You can also condition on a genotype and use the LSM statement (representing your time effect) as the X-axis variable. That is, select the six LSM values with genotype AA and plot the logits against time (LSM statement) for the pair of score1 values, joining these with a line. Do the same for score2 and score3 values. You will generate a plot for genotype AA, another plot for genotype AT, and a third for genotype TT. Nonparallel lines in any of these plots indicate violation of the proportional odds assumption.

Of course, one does not expect exact parallel lines, so some judgement comes into play here.
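
Why parallel traces are the thing to look for can be written out. The following is only a sketch of the fixed-effects part: the symbols $\alpha_j$, $\beta$, and $\beta_j$ are notation introduced here (not GLIMMIX output names), the random effects are omitted, and the sign convention depends on which response level the procedure models.

```latex
% Separate binary-split models, one per cutpoint j of a J-level ordinal response Y:
\[
\operatorname{logit} P(Y > j \mid x) = \alpha_j + x^{\top}\beta_j ,
\qquad j = 1, \dots, J-1 .
\]
% The cumulative logit (proportional odds) model is the special case
\[
\beta_1 = \beta_2 = \dots = \beta_{J-1} = \beta .
\]
% For two predictor settings x_a and x_b, the rise of trace j between them is
\[
\operatorname{logit} P(Y > j \mid x_a) - \operatorname{logit} P(Y > j \mid x_b)
  = (x_a - x_b)^{\top}\beta_j ,
\]
% which is the same for every j exactly when the traces are parallel.
```

Under proportional odds the traces may sit at different heights (different $\alpha_j$) but must rise and fall together, which is what the plots are checking.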

If you can do away with METHOD=LAPLACE for approximating the integration of the random effects, then you could do the following for a more formal test:

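/* Stack one copy of each observation per binary split (score assumed to take values 1-4) */
data new_x;
  set x;
  record=_n_;   /* identifies the original observation */
  new_score = (score>1); logit_spec=1; output;
  new_score = (score>2); logit_spec=2; output;
  new_score = (score>3); logit_spec=3; output;
run;

/* Fit to the stacked data set new_x (not x), which holds new_score and logit_spec */
proc glimmix data=new_x;
class logit_spec farm genotype animal;
/* link=glogit is what was proposed; GLIMMIX accepts it only with dist=multinomial, */
/* so link=logit is needed with dist=binary                                        */
model new_score = logit_spec genotype sweek genotype*sweek
            logit_spec*genotype logit_spec*sweek logit_spec*genotype*sweek
            / dist=binary link=glogit s oddsratio (diff=all);
random int / subject=farm s;
random int sweek / subject=animal s;
/* R-side covariance among the stacked copies of the same original record */
random logit_spec / subject=record residual type=un;
run;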


If the interactions of logit_spec with the other predictors are significant, that suggests that the parallel lines assumption is not valid. But this might not be an easy model to fit. Moreover, tests of this type are known to be liberal, so that you might declare that the lines are not parallel too often.
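
Written out, the stacked model gives each split its own deviation from a common slope, and the interaction test asks whether those deviations are all zero. The notation ($\alpha_j$, $\beta$, $\delta_j$) is introduced here for illustration, the choice of the last split as the reference level is an assumption, and the random effects are left out of the sketch.

```latex
% Stacked binary-split model with logit_spec = j indexing the split:
\[
\operatorname{logit} P(Y > j \mid x) = \alpha_j + x^{\top}(\beta + \delta_j),
\qquad \delta_{J-1} = 0 \ \text{(reference split)} .
\]
% The logit_spec-by-predictor interactions estimate the \delta_j, so the test is
\[
H_0 : \delta_1 = \dots = \delta_{J-2} = 0 \quad \text{(parallel lines)} .
\]
```

If the procedure models the other response level, every sign flips but the hypothesis is unchanged.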
keckk
Fluorite | Level 6
Many thanks for your detailed response !

First I reached for the easier way and tried the latter approach in your contribution, but the model does not run with the generalized logit link (which obviously can only be used with the multinomial distribution). With the logit link, the model does not converge, as you had anticipated.

When trying to fit my glimmix code to the two binary splits of the data (score is a three-category response 0,1,2), I face the problem that LSMEANS computations are not supported for multinomial models in SAS 9.2.

Do I understand correctly that I may NOT perform inference in that model if the parallel lines assumption is NOT valid?
What else can I use to plot the estimated cumlogits against the predictor value (genotype)?

Instead of lsmeans statements, I assumed I could use odds ratios to express the estimated risk of one genotype for the score (being > 0 or > 1) relative to the risk of another genotype (genotype has four levels: 1, 2, 3 and 4), and get associated p-values to test whether the odds of score differ significantly between all pairs of genotypes (diff=all).
But I can only get associated p-values for three genotypes compared with a reference genotype (which by default is the last/highest level of genotype).

Is there a way to get associated p-values for all pairwise odds ratios?
Dale
Pyrite | Level 9
I don't have time for much of a response here, but when you expand the data so as to model the multiple binary splits, you can change your distribution from multinomial to binary and the link from cumlogit to glogit. That should allow you to estimate the least squares means.

If the parallel lines assumption is violated, then the multinomial model with cumulative logits is wrong and you should not be making inferences from that model.
keckk
Fluorite | Level 6
Thanks for your reply!

Unfortunately, the glogit link function does not work with the binary distribution...
I get the following ERROR message: The generalized logit can only be used with the multinomial distribution

Is the default logit link mathematically appropriate for the binary splits as well?
Dale
Pyrite | Level 9
A binary response with logit link is a two-level multinomial response with generalized logit link. That is, the binary response with logit link is a special case of the multinomial response with generalized logit link. It should be noted that the binary response with logit link is also a special case of a multinomial response with cumulative logit link. For a binary response, there is no distinction between generalized logit and cumulative logit link functions.

Because there is no distinction between generalized logit and cumulative logit link functions, you might think that it would be possible to specify that the response follows a binary distribution with generalized logit link function. But, I guess that is not the case. When the response is binary, the GLIMMIX procedure wants you to use the logit link without specifying whether it is a generalized logit or cumulative logit.
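
The equivalence is easy to see once the two links are written out for a two-level response. In this sketch, $\pi_1$ and $\pi_2 = 1 - \pi_1$ are notation introduced here for the two category probabilities, and taking level 2 as the reference category is an assumption.

```latex
% Cumulative logit with two levels has a single cutpoint:
\[
\log \frac{P(Y \le 1)}{P(Y > 1)} = \log \frac{\pi_1}{\pi_2} .
\]
% Generalized logit with level 2 as the reference has a single contrast:
\[
\log \frac{P(Y = 1)}{P(Y = 2)} = \log \frac{\pi_1}{\pi_2} .
\]
% Both reduce to the ordinary binary logit:
\[
\log \frac{\pi_1}{1 - \pi_1} = \operatorname{logit}(\pi_1) .
\]
```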
keckk
Fluorite | Level 6
Ok, I think I get the picture.
I am just beginning with (multinomial, ordinal) logistic regression, with Hosmer and Lemeshow's at my disposal - which unfortunately does not have much about logistic regression models for the analysis of correlated data and particularly not with SAS procedures 😞
keckk
Fluorite | Level 6
I would like to continue this thread with a question.
After testing the proportional odds assumption as proposed by Dale and finding that the plots indicate it is not valid (nonparallel lines in the plot for one of the genotypes), I want to ask: what is the point of modeling an ordinal response (with three levels: 0,1,2) if I can fit the model to each binary split of the response variable (0 versus >0, 1 versus >1)?

In the cumulative logit model with the multinomial response, "the glimmix procedure is modeling the probabilities of levels of the response having lower ordered values in the response profile table". So, for example, if the odds ratio of genotype 1 to genotype 2 is significant, these two genotypes have different probabilities of having a score of 0 or 1 (< 2). Correct? If not, what is meant by "having lower ordered values"? Wouldn't it be the same as in the binary split model 0 or 1 versus >1?

Your comments are appreciated very much.
