Solved: Re: Post-Hoc Tests for Multinomial Data Using Glimmix

JaneGreenfield · Posted 08-16-2015 05:50 AM

Hello,

I am attempting to replicate and further a 4 (socio-economic status) x 6 (question type) study. The DV (question type) is measured with a 12 item questionnaire (6 categories containing 2 questions each). Participants in each category can score between 0 and 2 (i.e., two questions). In my study as well as the aforementioned study, most participants score 0 across all 6 categories. The data therefore does not satisfy the normality assumption for the ANOVA used in the aforementioned study, as the data is skewed to the right and zero-inflated. My study tests an extra variable ‘gender’ theorized to affect the relationship explored in the aforementioned study. That is, my study design is 2 (gender) x 3 (socio-economic status) x 6 (question type). An reliable source suggested an ordered logistic regression in a mixed-modeling framework such as GLIMMIX.

The code for my final model is presented below. Model one was unconditional with no predictor, model two had socioeconomic status (SES) as a predictor, and the final model has SES and gender as predictors.

Proc glimmix data=work.ses method=laplace noclprint;

Class question;

Model response=ses gender see*gender / dist=multinomial link=clogit solution cl oddsratio(diff=first label);

Random ses / subject=question type=vc solution cl;

Covetest / wald;

Run;

* Response = answers to the questions. Question = quoits type

So far I have gotten suitable results, model two is a better fit than model one, and model three is a better fit than model two. In the final model, fixed effects for SES p< .05, and gender p< .001. The problem is I can’t figure out how to get post-hoc tests on the probabilities for the particular effects of the four levels of SES on the six question types

(as determined by the question responses on a scale ranging from 0 - 2), the two levels of gender on question type, and the interaction between SES and gender on question type

I have tried Lsmeans but it doesn’t work with multinomial data, I have tried splice and splicediff, as well as contrast (bycat and chisq). I am new to SAS; this particular problem brought me here (SPSS is my comfort zone).

Any advice will be much appreciated.

Regards,

Jane Greenfield

SteveDenham · Posted 08-19-2015 10:50 AM

Remarkably good for code, especially if you are just getting started. I'll get to some minor changes later.

Anyway, I am a firm believer in Type III tests. The Solution for Fixed Effects table isn't truly global, as at least one level of every class variable and the resultant interaction variables are set to zero. In my book, it is a "nice to have" thing.

Now, as far as code--you could combine the two analyses into one, and use the simple effect comparisons (slicediff) to get everything in one run. Also, you could add the ilink and oddsratio options to the lsmeans statement, to get something like:

Proc glimmix data=work.ses2 method=laplace noclprint;

Class gender ses question;

Model bi_res=gender ses question gender*ses gender*question/ dist=binomial

link=logit solution cl oddsratio(diff=first label);

Random intercept / subject= question type=vc solution cl;

/* This is the one area I am not sure about from your design. This approach considers the questions to be uncorrelated measures. If that is the case, and since you have them as fixed effects as well, you may be able to drop this. If however you believe them to be correlated in some way, you may want to change to type=un or something of the sort */

Covtest / wald;

lsmeans gender*ses /slice=ses slicediff=ses adjust=bon ilink oddsratio;

lsmeans gender*question /slice=question slicediff=question adjust=bon ilink oddsratio;

ods select slicediffs;

run;

I think the Bonferroni adjustment is probably too conservative. Consider replacing adjust=bon with adjust=simulate adjdfe=row, for the method that best preserves both power and family wise error rates.

Steve Denham

View solution in original post

SteveDenham · Posted 08-17-2015 08:46 AM

Have you tried using the ESTIMATE statement, using the same sort of code you used to implement the CONTRAST statement? The ESTIMATE statement allows for multiple comparisons, and thus allows for post-hoc adjustment for multiple comparisons.

This is untried, as I haven't tried doing this with a multinomial response model. It works well with the other distributions,

Take a look at Example 44.13 in the SAS/STAT 13.2 documentation--Response Surface Comparisons with Multiplicity Adjustments. Down towards the end is some example code where the ESTIMATE statement is used for post hoc comparisons, using the simulate adjustment of Edwards and Berry.

Steve Denham

JaneGreenfield · Posted 08-19-2015 08:46 AM

Hi Steve,

Thanks for your response. I didn’t want to reply in haste; I thought I should take some time to read more and figure out more about SAS and GLIMMIX.

In my original post I was kind of hoping that my GLIMMIX code will be torn apart because I am very new to this and don’t know all the options to be confident enough in the options I have chosen.

To simplify my case, I have made my DV binomial, so instead of a 0,1, or 2 score, it is a ‘score ‘ (2 and 1 =1) or no ‘score’ (0). This will enable me to use the LSMEANS and SPLICEDIFF, which I have done successfully. However, there are a few things I am really confused about and hoping you can help clear up.

I was hoping to use the Type 3 fixed effects as a global test of my hypothesis and LSMEANS comparisons to test the specific relationships between my variables. From what I have gathered in the discussion forum (albeit in a study design different from mine), it is best to use the Solutions for Fixed Effects. Will this apply in my case?

With my new code (I have to include the other IVs in the class statement to be able to get LSMEANS for them), the Solutions for Fixed Effects is giving me a more detailed output at the variable level. In my training (far from SAS/GLIMMIX), for my study design you first look for a global effect, only on the condition that there is a significant global effect do you look for specific effects. Is the Type 3 Test this global effect and Solutions for Fixed Effects the specific effects? And the LSMEANS is used to further test other specific comparisons?

In the Solution for Fixed Effects, I am only getting results for variable level -1, I had thought this was a problem but apparently this is standard? If I do use the Solutions for Fixed Effect I am really confused on how to interpret it . For instance, SES in my data has four levels, so I have results for levels 1 to 3, what do I say about level 4? Are the figures on the row of SES level 3 comparisons between 3 and 4, and so forth?

Finally, how important is it that I should report a table of my model building process? I ask this question because with the addition of all my IVs in the class statement, I no longer have neat single variable estimate and SE for each IV in the Solutions for Fixed Effects table, but have estimates for all the levels. I wanted this table to have the values for the global effects so I can write about the specific effects, given that, at least in APA standard, you either present in a table or write about it, but not both. `

I apologize for all this questions, I have tried figuring it out myself but with the time constraint and pressure, the more I search the more confused I get. When this is over I will start learning about SAS from the fundamentals. I have included my final code below.

Proc glimmix data=work.ses2 method=laplace noclprint;

Class gender ses question;

Model bi_res=gender ses question gender*question/ dist=binomial

link=logit solution cl oddsratio(diff=first label);

Random intercept / type=vc solution cl;

Covtest / wald;

lsmeans gender*question /slice=question slicediff=question adjust=bon;

ods select slicediffs;

run;

The above lsmeans is to investigate the difference between questions according to gender and the lsmean below is to investigate the difference between ses according to gender. There are ten different tests, therefore I will be using a p value of 0.05/10 to determine statistical significance.

Proc glimmix data=work.ses2 method=laplace noclprint;

Class gender ses question;

Model bi_res=gender ses question gender*ses/ dist=binomial

link=logit solution cl oddsratio(diff=first label);

Random intercept / type=vc solution cl;

Covtest / wald;

lsmeans gender*ses /slice=ses slicediff=ses adjust=bon;

ods select slicediffs;

run;

Thanks in advance.

Jane greenfield.

SteveDenham · Posted 08-19-2015 10:50 AM

Remarkably good for code, especially if you are just getting started. I'll get to some minor changes later.

Anyway, I am a firm believer in Type III tests. The Solution for Fixed Effects table isn't truly global, as at least one level of every class variable and the resultant interaction variables are set to zero. In my book, it is a "nice to have" thing.

Now, as far as code--you could combine the two analyses into one, and use the simple effect comparisons (slicediff) to get everything in one run. Also, you could add the ilink and oddsratio options to the lsmeans statement, to get something like:

Proc glimmix data=work.ses2 method=laplace noclprint;

Class gender ses question;

Model bi_res=gender ses question gender*ses gender*question/ dist=binomial

link=logit solution cl oddsratio(diff=first label);

Random intercept / subject= question type=vc solution cl;

/* This is the one area I am not sure about from your design. This approach considers the questions to be uncorrelated measures. If that is the case, and since you have them as fixed effects as well, you may be able to drop this. If however you believe them to be correlated in some way, you may want to change to type=un or something of the sort */

Covtest / wald;

lsmeans gender*ses /slice=ses slicediff=ses adjust=bon ilink oddsratio;

lsmeans gender*question /slice=question slicediff=question adjust=bon ilink oddsratio;

ods select slicediffs;

run;

I think the Bonferroni adjustment is probably too conservative. Consider replacing adjust=bon with adjust=simulate adjdfe=row, for the method that best preserves both power and family wise error rates.

Steve Denham

JaneGreenfield · Posted 08-19-2015 05:10 PM

Hi Steve,

Thanks for your time and help.

I have updated my code according to your suggestions. The levels in the ‘question’ variable are correlated, therefore I used type=un. When I run the code however there are no values for ‘question’ in the Type III tests. The values appear when I remove “subject=question” from the random statement. Are there any serious ramifications for removing this?

Best regards,

Jane Greenfield.

sld · Posted 08-21-2015 12:32 AM

If:

participant is the replicating factor

each participant is associated with one level of gender and one level of ses

each participant responds to all six question types

the response is binary

then consider this model. It uses a different subject in the random statement which might be more successful.

proc glimmix data=work.ses2 method=laplace;

class gender ses question participant;

model bi_res= gender ses question

gender*ses gender*question ses*question

gender*ses*question

/ dist=binomial link=logit solution cl oddsratio(diff=first label);

*random intercept / subject=participant(gender ses);

random question / subject=participant(gender ses) type=chol;

covtest diagr;

lsmeans gender*question / ilink plot=meanplot(join cl sliceby=gender)

slice=(question gender) slicediff=(question gender)

adjust=simulate(seed=12345);

lsmeans ses*question / ilink plot=meanplot(join cl sliceby=ses)

slice=(question ses) slicediff=(question ses)

adjust=simulate(seed=12345);

lsmeans gender*ses*question / ilink plot=meanplot(join cl sliceby=gender

plotby=question);

run;

The second random statement accommodates correlated questions. The first random statement is commented out because it is subsumed in the second random statement. But if the covtest result indicates a lack of covariance among questions, then you could use the first random statement and omit the second random statement.

The second random statement could use type=un if type=chol does not work well. You need enough participants to successfully estimate an unstructured covariance matrix.

The second random statement might benefit from the addition of the residual option, if the current syntax doesn’t work well. Theoretically not, but sometimes practically so in my experience.

Pairwise mean comparisons are very useful to compare levels of main effects, and might be useful to sort out two-way interactions, but are much less useful for three-way interactions (so I haven't included pairwise comparisons for the 3-way interaction means). I usually examine the interaction plots and determine what (if any) post-hoc contrasts (typically multiple degrees of freedom contrasts) make sense in the context of the particular study.

Clearly I haven’t tested this code so I can’t guarantee anything. You can let me know!

Susan

SteveDenham · Posted 08-21-2015 09:06 AM

Sweet code, Susan.

Steve Denham

sld · Posted 08-21-2015 10:04 AM

Thank you, Steve. I incorporated your suggestions, and hope that Jane notes that she can add what I didn't (like the oddsratio option); I don't mean to imply by omission that they are not valid or useful.

I'm remembering another thread in which you addressed conditional versus marginal estimates, and along that line I'll add that

random question / subject=participant(gender ses) type=chol;

produces conditional estimates, while

random question / subject=participant(gender ses) type=chol residual;

produces marginal estimates. As you've said elsewhere and previously, both are valid. Walt Stroup prefers conditional, I usually go that route if it works well.

Susan

SteveDenham · Posted 08-21-2015 10:47 AM

I also prefer the conditional estimates, when I can get them. Mostly to get IC's that are meaningful for model or covariance structure comparisons.

But marginal estimates, while biased, are more likely to converge and lots of people in other fields think of GEE's which are marginal models. The advantage of actually having a converged model is probably why Larry Madden ( @lvm) has been such a strong proponent for that approach. Applying Box's maxim: All models are wrong, but some models are useful. A model that converges is certainly more useful than one that has algorithm problems, and my experience with multinomial distributions is that if there is a sure way to run into odd algorithm behavior, it involves multinomials. And that is true whether it is SAS or Stata or R or S-Plus or MATLAB.

Steve Denham

JaneGreenfield · Posted 08-21-2015 11:06 AM

Hi Susan (and Steve),

Thank you so much for your help. I ran the code successfully, albeit with the omission of the second random statement because when I included it all my F and t values in the Type III and parameter estimates table were at infinity.

One more thing , if the data is multinomial, apart from dist=mult and link=clogit, how would you write the rest of the code? Particularly to get simple effect comparisons.

With gratitude,

Jane Greenfield.

sld · Posted 08-26-2015 05:02 PM

Jane,

This link points to pages in the text Generalized Linear Mixed Models... by Walt Stroup.

http://tinyurl.com/q44spl5

Example 13.2 addresses a generalized logit using GLIMMIX and the ESTIMATE statement, which has suggested in one of his comments above. Example 13.1 on pp 401-404 addresses the cumulative logit. I haven't been able to trick Google books into showing those pages. But if you go to the CRC Press page for Generalized Linear Mixed Models: Modern Concepts, Methods and Applications - CRC Press Book and click Downloads/Updates, unzip, and look in the "All Examples by Chapter" folder and Ch_13 subfolder, you'll find code (including the data) using GLIMMIX in the "Data_Set_13_1_with_individual_datalines.sas" file for both the clogit and glogit models.

I haven't worked much with multinomial models (especially mixed) but here's my stab at post-hoc pairwise comparisons among TRTs (simple effect comparisons), starting with Walt's example:

proc glimmix data=indiv_data_multinom;

class blk trt;

model rating=trt/dist=multinomial solution oddsratio;

random intercept / subject=blk solution;

/* Predicted values for each rating in each TRT,

the inverse-link value for c=2 is 1-(c0 + c1) */

estimate 'c=0, t=0' intercept 1 0 trt 1 0 ,

'c=1, t=0' intercept 0 1 trt 1 0,

'c=0, t=1' intercept 1 0 trt 0 1 0,

'c=1, t=1' intercept 0 1 trt 0 1 0,

'c=0, t=2' intercept 1 0 trt 0 0 1 0,

'c=1, t=2' intercept 0 1 trt 0 0 1 0,

'c=0, t=3' intercept 1 0 trt 0 0 0 1 0,

'c=1, t=3' intercept 0 1 trt 0 0 0 1 0,

'c=0, t=4' intercept 1 0 trt 0 0 0 0 1 0,

'c=1, t=4' intercept 0 1 trt 0 0 0 0 1 0,

'c=0, t=5' intercept 1 0 trt 0 0 0 0 0 1,

'c=1, t=5' intercept 0 1 trt 0 0 0 0 0 1 / ilink bycat;

/* Pairwise comparisons among TRTs with adjusted p-values */

estimate

't=0 == t=1' trt 1 -1 0 0 0 0,

't=0 == t=2' trt 1 0 -1 0 0 0,

't=0 == t=3' trt 1 0 0 -1 0 0,

't=0 == t=4' trt 1 0 0 0 -1 0,

't=0 == t=5' trt 1 0 0 0 0 -1,

't=1 == t=2' trt 0 1 -1 0 0 0,

't=1 == t=3' trt 0 1 0 -1 0 0,

't=1 == t=4' trt 0 1 0 0 -1 0,

't=1 == t=5' trt 0 1 0 0 0 -1,

't=2 == t=3' trt 0 0 1 -1 0 0,

't=2 == t=4' trt 0 0 1 0 -1 0,

't=2 == t=5' trt 0 0 1 0 0 -1,

't=3 == t=4' trt 0 0 0 1 -1 0,

't=3 == t=5' trt 0 0 0 1 0 -1,

't=4 == t=5' trt 0 0 0 0 1 -1 / adjust=simulate(seed=12345) adjdfe=row;

run;

With a lot more thought and a lot more coding, you could work up pertinent contrasts involving two factors (e.g., ses and question). The task would be easier in the absence of significant interactions

HTH,

Susan

JaneGreenfield · Posted 08-28-2015 11:12 AM

Thank you very much Susan.

Jane.

Catch up on SAS Innovate 2026