Glimmix for multinomial data

soaresrd · Posted 10-13-2015 05:31 PM

Hi Steve Denham, I saw that you has been helping a lot people with question about PROC GLIMMIX, so I would like to check with you if you can help me.

I have the variable MOV that consisted of 5 scores (1: animal stood still for the entire assessment period; 2: animal stood still for most of the assessment period; 3: animal reacted calmly at least half of the assessment period; 4: animal reacted intensely more than half of the assessment period; 5: the animal jumped, raised its limbs off the ground).

So, I would like to see if the animals changed their scores over time (day 1, day 2, day3, day 4, and day5) by each breed (breed 1, 2 and 3). Do you know how can I do this model?

My suggestion is:

proc glimmix data=mydata;

class animal breed day;

model mov (order=data) = breed day breed*day / dist=multinomial;

random day / subject = animal;

run;

Also I would like to see something like LSMEANS and compare this by TUKEY, but MULTINOMIAL distribution do not accept LSMEANS.

Cheers

SoaresRD

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

From: SteveDenham
Date: 10-07-2015 02:11 PM

Start with this:

proc glimmix data=mydata method=laplace;
class animal breed day;
model mov (order=data) = breed day breed*day / dist=multinomial solution;
random day / subject = animal type=ar(1);
run;

The shift to METHOD=LAPLACE is a logical way to get conditional values (conditioned on the day structure). Since your data are equally spaced, an autoregressive error structure makes the most sense. And the solution for the fixed effects is needed, as you are going to have to construct ESTIMATE statements for the odds ratios of interest, and that is the easiest way I can think of today to get the right terms to enter into the ESTIMATE statements. Try running this model to check for convergence, etc. and post the results back to the forum. @lvm and @sld are real good sources for questions like this as well.

Steve Denham

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Hi Steve, thanks for your answer.

I'm not sure if the model is working, hence it showed a dot (.) instead the P-value in for "breed" and "breed*day".

I will let you know more about my data. First, day is not equally spaced (days 1 to 5, actually have different interval between the days), so I suppose that the best structure can be ARh(1) or CS. Is that right? Second, I want to know if the animal (individual) changed it MOV scores accros the days, because for me is an important information if the animal started the feedlot period with MOV score = 1 (calm animal inside the squeeze chute) on day 1, but at the end of the feedlot period (day 172) it is change for MOV score 5 (agitated animal), just like the example below.

animal breed day MOV

1 a 1 1

1 a 2 1

1 a 3 2

1 a 4 3

1 a 5 5

I have another variable (FS) that is continuous and normal distribution that I use PROC MIXED with repeated statement to analyse. So, then I have the LSMEANS of each day (and by each breed of animals) with comparison of means using post- hoc Tukey test.If I do the same with MOV variable, I will have the outcome with the LSMEANS of MOV, like 2.5 on day 1 and 2.8 on day 5 of MOV score, which it doesn't mean anything for me, and for sure, an animal that changed from 2.5 to 2.8 score between day 1 and day 5 is not siggnificant by Tukey test.

So, I was thinking about the model that you sent me and I would change for:

proc glimmix data=mydata method=laplace;

class animal breed day;

model mov (order=data) = breed day breed*day / dist=multinomial solution;

random BREED*day / subject = animal type=ar(1);

LSMEAN BREED*DAY / PDIFF=ALL (this is how I do using MIXED to get the comparision accross the days, but what about Glimmix?)

run;

Well, I'm appreciate your help, and let me know if you have any idea how to do this.

SoaresRD

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

From: SteveDenham
Date: 10-12-2015 08:09 AM

Conceptually, analyzing a multinomial distribution means that there is no estimator of a mean, so this is something you have to get used to. If you feel your categories are relatively equally spaced, then an average might be meaningful, so analyzing as a continuous variable remains an option. But--if they are truly categorical, you are looking at the odds of a score of x+1 or higher vs a score of x or less. That's just how a multinomial analysis works.

As far as time spacing, do you have consistent times for animals across treatments? If so, and they are unequally spaced, a CS, CSH or spatial power structure would probably be appropriate. Autoregressive structures (AR or ARH) really only apply to equally spaced in time measures. Note that, at least for the spatial power structure, you should use the actual time values, rather than the coded 1 through 5 values.

As I have said before, be sure to post this in the Statistics forum--there are good people there with excellent answers.

Steve Denham

sld · Posted 10-15-2015 05:12 PM

@soaresrd,

As @SteveDenham points out, in a multinomial model there is no single estimate of a “mean” for a given breed on a given day. Rather you get estimates of (m-1) logits, where m is the number of outcomes; with 5 levels of MOV, there would be 4 logits. The logits can be defined in several different ways; by default, your model is using cumulative logits (i.e., the proportional odds model). For ordinal multinomial responses, other possibilities are partial proportional odds, nonproportional odds, adjacent categories, and continuation ratio. The interpretation of the logits depends upon the definition. The choice of logit depends on objectives as well as data characteristics. (For example, not all data sets will meet the assumptions of the proportional odds model.)

Results can be presented/interpreted in terms of logits. Alternatively, it is possible to re-transform to obtain a set of probabilities—i.e., {Pr(mov=1), Pr(mov=2), ..., Pr(mov=5)}—for each breed on each day. In GLIMMIX, this is easily accomplished using the OUTPUT statement using ilink and noblup options.

Consider this code:

proc glimmix data=mydata method=laplace;

nloptions gconv=0;

class breed day animal;

model mov = breed day breed *time / dist=multinomial link=clogit solution;

random day / subject= animal(breed) type=<whatever is appropriate>;

output out=ds2 pred(noblup)=predpa stderr(noblup)=stderrmupa

pred(ilink noblup)=predmupa stderr(ilink noblup)=stderrmupa;

run;

Using subject=animal(breed) rather than subject=animal may well solve your problem of missing p-values for breed and breed*day.

In my opinion, multinomial models take substantial effort to comprehend. To get you started, here is a link to a relatively short tutorial by Alan Agresti

http://www.stat.ufl.edu/~aa/ordinal/agresti_ordinal_tutorial

Hope this helps,

Susan

SteveDenham · Posted 10-19-2015 01:45 PM

Understatement of the week: "multinomial models take substantial effort to comprehend" (Thanks, @sld)

First, you need to understand odds.

Then, odds ratios.

Then, odds ratios at differing levels of class variables.

Then, odds ratios at differing levels of class variables at correlated time points.

Then, confidence bounds on odds ratios at differing levels of class variables at correlated time points.

At which point one begins to ask why the old lady swallowed the fly in the first place. (reference to an old American folk song).

Steve Denham

Glimmix for multinomial data

Re: Glimmix for multinomial data

Re: Glimmix for multinomial data

Catch up on SAS Innovate 2026

Glimmix for multinomial data

Re: Glimmix for multinomial data

Re: Glimmix for multinomial data

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away