Re: Estimate probabilities with multinomial in GLIMMIX

ASJ1 · Posted 11-03-2021 09:58 AM

I have a multinomial unordered outcome variable with 3 levels, and 5 factors (1 between and 4 within subjects). I use the RANDOM statement of the GLIMMIX procedure to take into account all the measurements on the same participant. I use the LSMEANS statement from PLM Procedure to get the probabilities for the relevent variables. However, I only get probabilities for 2 of the outcome categories (the % for the reference are not displayed).

I have tried to change the reference category so that I can get the probabilities for the reference category. However, when I do that, all the results are slightly different (covariance parameters, type 3 tests, lsmeans, std error of lsmeans, 95% ci), etc.

In models without random effects, this is easy to do, and changing the reference category gives you the probabilities of 3rd category, without changing anything else in the results.

I've tried different things (fixing covariance parameters with parms, different estimation methods, lsmestimate statement), to no avail.

Here is the code I use:

PROC GLIMMIX DATA = don ;
CLASS id group x1 x2 x3 x4 y;
MODEL y(REF = "3") = group|x1| x2|x3| x4 @3 /LINK = GLOGIT DIST = MULT;
RANDOM id(group) /GROUP = y;
STORE OUT = m_pdh;
RUN;

PROC PLM RESTORE = m_pdh;
LSMEANS x1 /ilink CL;
RUN;

StatDave · Posted 11-03-2021 11:40 AM

This can be done using contrasts in the NLMeans macro. See details of using this macro at that link. The following uses the data in the example titled "Nominal Response Data: Generalized Logits Model" in the PROC LOGISTIC documentation. Another example that computes relative risk estimates can be seen in the latter part of this note. These statements estimate the School 1 probabilities in each of the three Styles.

proc glimmix data=school;
   freq Count;
   class School Program(ref=first) style;
   model Style(order=data)=School Program / dist=mult link=glogit;
   store log;
   run;
proc plm restore=log;
   lsmeans school / ilink e plots=none;
   ods output coef=c;
   run;
data cont;
  length label $40;
  infile datalines missover;
  input label k1-k9;
  set=1;
  datalines;
P(self,S1)    1  0  0    0  0  0   0  0  0
P(team,S1)    0  0  0    1  0  0   0  0  0
P(class,S1)   0  0  0    0  0  0   1  0  0
;
%nlmeans(instore=log, link=glogit, coef=c, contrasts=cont)

ASJ1 · Posted 11-08-2021 04:57 PM

Thanks for the solution @StatDave

I was skeptical it would work in my case because your example does not include a RANDOM statement, but it does work even with it. Yeah!

I was able to get the results with my data & model for a main effect. However, it does not seem to be working when I try to get it for an interaction term. The macro gives an error: ERROR: Error generating code

Furthermore, the solution still has the problem that I get different results depending on which outcome I choose as reference in the model statement for the outcome, but that is another issue.

StatDave · Posted 11-08-2021 05:16 PM

Explain what you mean using the example I provided. In that example, if you only add the School*Program interaction and change ORDER=DATA to REF="self" to change the reference level, it runs fine and the results are the same as the original code. Now, if you specify the interaction term in the LSMEANS statement instead of just School, then you would now have many more LSMEANS and therefore the coefficients in the DATA CONT step would have to change to accommodate that.

ASJ1 · Posted 11-09-2021 12:49 PM

There are 2 issues here.

The first is not related to your solution, but to the reason of the problem.

Using your example, with a random statement, if you change the ref from self to team for example, the type 3 results are different, as well as the results from lsmeans (the proportion and standard error for class which appear in both outputs, and the proportions do not add up to 1). However, when running multinomial regression without a random effect, the results are identical. This is the source of the problem. Even fixing covariance parameters or changing estimation method did not work. I find it difficult to interpret, when the results differs depending on the choice of the reference, while it does not affect the results in absence of a random effect.

proc glimmix data=school;

freq Count;

class School Program(ref=first) style;

model Style(ref = "self")= program / dist=mult link=glogit ;

random school /group = style;

store log;

run;

proc plm restore=log;

lsmeans program / ilink e plots=none;

run;

proc glimmix data=school;

freq Count;

class School Program(ref=first) style;

model Style(ref = "team")= program / dist=mult link=glogit ;

random school /group = style;

run;

It is because of this problem that your solution was interesting, to get the % for the reference group that can add up to 100%. Your solution worked fine with my dataset & model with main effects, but not when I try to get it for an interaction (with an interaction term in plm, even if I had more columns to the cont dataset). It takes a really long time to run and after I get an error. ERROR: Error generating code.

However, with your example, I’m able to reproduce the lsmeans results with %nlmeans.

proc glimmix data=school;

freq Count;

class School Program(ref=first) style;

model Style(order=data)=School| program / dist=mult link=glogit solution;

store log;

run;

proc plm restore=log;

lsmeans school*program / ilink e plots=none;

ods output coef=c;

run;

data cont;

length label $40;

infile datalines missover;

input label k1-k18;

set=1;

datalines;

P(self,S1,regular) 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

P(team,S2,afternoon) 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0

;run;

%nlmeans(instore=log, link=glogit, coef=c, contrasts=cont)

So it is possible to make it work, I just need to figure out what goes wrong with my own data & model.

EDIT: The ERROR: Error generating code. happens when in %NLEstimates in the section " proc nlmixed data = _tParm;". This was resolved by simplifying the model (I had lots of double & triple interactions). If I keep only some of them, the macro runs fine. I suppose that otherwise they are to many parameters to estimate. So, %nlmeans can work nicely to get proportions for the reference category, for main effets as well as interactions, but only for "small" models. Writing down the data "cont" can also become quite long for models with lots of terms.

Estimate probabilities with multinomial in GLIMMIX