Hello,
I am using PROC GLIMMIX to model a multinomial outcome with 3 unordered categories. The fixed-effect independent variable is continuous and is measured at 3 or more time points for each subject. I have included a random intercept in the model to account for the repeated measures within each subject. I have found that the Type III test of fixed effects F value changes depending on which reference value I use for the outcome. I'm not clear on why this happens.
Thanks for your help.
Regina
What link are you using for the multinomial? If it is the default cumulative logit, the specification of reference group will make a substantial difference in slope estimates. If it is a generalized logit, then I have to admit I am surprised at what is happening. Can you share your code?
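For what it's worth, here is a sketch of the algebra behind that check (with pi_j denoting P(out = j)). Under a generalized logit, switching the reference category is an exact reparameterization of the same model, so in a fixed-effects-only fit the Type III test should not change:

```latex
% Generalized logit with reference category r:
\log\frac{\pi_j}{\pi_r} = \alpha_j + \beta_j x, \qquad j \neq r.
% Switching the reference to category s just differences the parameters:
\log\frac{\pi_j}{\pi_s}
  = \log\frac{\pi_j}{\pi_r} - \log\frac{\pi_s}{\pi_r}
  = (\alpha_j - \alpha_s) + (\beta_j - \beta_s)\,x.
```

In particular, the slope for j vs. s under ref = s is the negative of the slope for s vs. j under ref = j, so the corresponding odds ratios should be exact reciprocals.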
Steve Denham
Thank you for taking a look.
I am using a glogit link.
Here is the code I am using:
title 'multinomial regression for labs';
proc glimmix data = vdata noclprint method = rmpl;
class id out;
model out (ref = '1') = lnlab / dist = multinomial link = glogit solution ;
random intercept /subject = id type = un group = out;
run;
* check the output from above *;
proc glimmix data = vdata noclprint method = rmpl;
class id out;
model out (ref = '2') = lnlab / dist = multinomial link = glogit solution ;
random intercept/subject = id type = un group = out;
run;
Regina
How many levels for the variable 'out'? Do the estimates for lnlab look like there is a reciprocal relationship, as there might be if there are only two levels for 'out'? Can you produce odds ratios under the two specifications (using an ESTIMATE statement) and see if they are inversely related?
I apologize for asking more questions than providing answers, but it's Monday morning and I'm not ready to assume anything yet.
Steve Denham
There are 3 levels for the out variable.
From the first model, the odds ratio comparing level 2 to level 1 is 1.558.
From the second model, the odds ratio for comparing level 1 to level 2 is 0.627. They are not exact inverses of each other.
Regina
Well, shoot. They are close, but not close enough.
More diagnostics: Can you give the solution vector for each of these parameterizations, and just so I can play a hunch, also with ref='3'?
For each of the parameterizations, also take a look at the objective function in the iteration history. Are the parameterizations taking the same number of iterations? Do they start at the same point? Do they end up at the same value? I am guessing: maybe for the first, yes for the second, and no for the third question. That implies that representational errors are accumulating during the iterations, and you end up with different values.
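If it helps, the iteration history can be captured to a data set for a side-by-side comparison. A sketch, assuming the model code you posted (IterHistory is the ODS table name for that display, and iter_ref1 is a hypothetical output data set name):

```sas
/* Capture the iteration history from the ref='1' parameterization */
ods output IterHistory=iter_ref1;
proc glimmix data=vdata noclprint method=rmpl;
  class id out;
  model out(ref='1') = lnlab / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
run;

/* Repeat with ref='2' and ref='3', then compare the captured tables */
proc print data=iter_ref1;
run;
```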
Steve Denham
Here are the solutions from the 3 models:
'1 as reference ';
Solutions for Fixed Effects

Effect     out    Estimate    Std Error    DF    t Value    Pr > |t|
Intercept   2      -2.5473      1.2101    200     -2.11      0.0365
Intercept   3       0.3947      0.9824    200      0.40      0.6883
lnlab       2       0.4435      0.1562    278      2.84      0.0049
lnlab       3       0.1751      0.1286    278      1.36      0.1744
'2 as reference'
Solutions for Fixed Effects

Effect     out    Estimate    Std Error    DF    t Value    Pr > |t|
Intercept   1       2.8082      1.3076    200      2.15      0.0330
Intercept   3       3.8814      0.9604    200      4.04      <.0001
lnlab       1      -0.4670      0.1665    278     -2.80      0.0054
lnlab       3      -0.3887      0.1202    278     -3.23      0.0014
'3 as reference';
Solutions for Fixed Effects

Effect     out    Estimate    Std Error    DF    t Value    Pr > |t|
Intercept   1      -1.3388      1.1632    200     -1.15      0.2512
Intercept   2      -3.4725      0.9893    200     -3.51      0.0006
lnlab       1     -0.04545      0.1497    278     -0.30      0.7616
lnlab       2       0.3355      0.1244    278      2.70      0.0074
Regarding the objective function, the number of iterations is different for each model (14, 10, and 11).
The starting point and end point for the objective function are different for each model.
Regina
I have to admit I am missing something, and it's probably something simple. How about a contingency table of the data, with lnlab divided at the mean (not the median)? Perhaps there is the equivalent of Simpson's paradox going on, such that one of the categories of out is heavily weighted on one side or the other of the mean.
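A sketch of that split, assuming the data set and variable names from your code (the flag variable LNLAB_MN and data set names MN and VDATA2 are hypothetical):

```sas
/* Compute the overall mean of lnlab */
proc means data=vdata noprint;
  var lnlab;
  output out=mn mean=lnlab_mean;
run;

/* Flag each record as below or at/above the mean */
data vdata2;
  if _n_ = 1 then set mn(keep=lnlab_mean);  /* value is retained across rows */
  set vdata;
  length lnlab_Mn $7;
  lnlab_Mn = ifc(lnlab < lnlab_mean, '< Mean', '>= Mean');
run;

/* Cross-tabulate the split against the outcome */
proc freq data=vdata2;
  tables lnlab_Mn * out;
run;
```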
Pinging JacobSimonsen. Do you have anything that might explain this?
Steve Denham
One "simple" thing I noticed is that the OP has listed the response variable on the CLASS statement. I've never seen this before and am not sure what it does. It might be affecting the levelization.
I listed the response on the class statement since I got the following message in the log when I didn't include it in the class statement:
NOTE: The group variable in the generalized logit model does not appear in the CLASS statement. This might produce unexpected results if the variable is not ordered in the input data.
Also, for the multinomial model the GLIMMIX procedure requires the GROUP= option with the outcome variable on the RANDOM statement. This was the error listed in the log when I did not include the option:
ERROR: Nominal models require that the response variable is a group effect on RANDOM statements. You need to add 'GROUP=out'.
Thanks for looking at this.
Regina
I'm also concerned about the GROUP= option on the RANDOM statement. Again, the response variable is being used, which seems suspect.
Here is the contingency table of lab mean by out:
The FREQ Procedure
Table of lnlab_Mn by out
lnlab_Mn            out
Frequency|
Percent  |
Row Pct  |
Col Pct  |       1|       2|       3|  Total
---------+--------+--------+--------+
< Mean   |     42 |     30 |    164 |    236
         |   8.71 |   6.22 |  34.02 |  48.96
         |  17.80 |  12.71 |  69.49 |
         |  76.36 |  23.44 |  54.85 |
---------+--------+--------+--------+
>= Mean  |     13 |     98 |    135 |    246
         |   2.70 |  20.33 |  28.01 |  51.04
         |   5.28 |  39.84 |  54.88 |
         |  23.64 |  76.56 |  45.15 |
---------+--------+--------+--------+
Total          55      128      299      482
            11.41    26.56    62.03   100.00
Regina
OK, that looks good--now how does it look by time point?
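Something like this would do it, assuming the dichotomized flag lnlab_Mn has already been added to the data (in a data set here called VDATA2) and that the time variable is DAY:

```sas
/* Sort so the tables can be produced per time point */
proc sort data=vdata2;
  by day;
run;

/* One out-by-split table per day; keep frequencies and row percents */
proc freq data=vdata2;
  by day;
  tables out * lnlab_Mn / nocol nopercent;
run;
```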
I am very tempted to suggest using a PROC I have never felt comfortable with--PROC CATMOD. It does allow for repeated measures of categorical data.
Steve Denham
Here are the frequencies by time point:
day=0
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 1 33.3333
1 >= Mean 2 66.6667
2 < Mean 1 7.6923
2 >= Mean 12 92.3077
3 < Mean 8 44.4444
3 >= Mean 10 55.5556
day=1
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 5 62.5000
1 >= Mean 3 37.5000
2 < Mean 5 17.8571
2 >= Mean 23 82.1429
3 < Mean 25 50.0000
3 >= Mean 25 50.0000
day=2
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 7 70
1 >= Mean 3 30
2 < Mean 6 24
2 >= Mean 19 76
3 < Mean 28 56
3 >= Mean 22 44
day=3
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 7 87.5000
1 >= Mean 1 12.5000
2 < Mean 5 23.8095
2 >= Mean 16 76.1905
3 < Mean 32 60.3774
3 >= Mean 21 39.6226
day=4
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 7 87.5000
1 >= Mean 1 12.5000
2 < Mean 3 20.0000
2 >= Mean 12 80.0000
3 < Mean 26 59.0909
3 >= Mean 18 40.9091
day=5
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 5 71.4286
1 >= Mean 2 28.5714
2 < Mean 5 41.6667
2 >= Mean 7 58.3333
3 < Mean 19 55.8824
3 >= Mean 15 44.1176
day=6
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 5 83.3333
1 >= Mean 1 16.6667
2 < Mean 3 30.0000
2 >= Mean 7 70.0000
3 < Mean 15 51.7241
3 >= Mean 14 48.2759
day=7
out    lnlab_Mn    Frequency    Row Percent
1 < Mean 5 100.000
1 >= Mean 0 0.000
2 < Mean 2 50.000
2 >= Mean 2 50.000
3 < Mean 11 52.381
3 >= Mean 10 47.619
Regina
I think it comes down to the fact that you have a fair amount of missing values, so the marginal odds ratios over day are strongly influenced by the slopes at each day. For instance, for out=3, at Day=0 you have 18 observations, while at Day=1 and Day=2 there are almost 3 times as many (50 at each). For out=2, the dropout rate from Day 1 to Day 7 is nearly 86%. So, even though you are not explicitly modeling the repeated nature, it is affecting what is going on. By aggregating over time, the trend due to lnlab becomes confounded with the day of measurement.
Since you are using pseudolikelihood, adding a repeated effect (although it has to go in as a G side effect) can be done as below:
proc glimmix data = vdata noclprint method = rmpl;
class id out day;
model out (ref = '1') = lnlab*day / dist = multinomial link = glogit solution ;
random intercept /subject = id type = un group = out;
random day/ type=ar(1) subject=id;
run;
Two things to check here: first, will it even run, and second, is it still sensitive to the specification of the reference group? Note that the repeated nature is modeled as an autoregressive process (AR(1)), which assumes equal spacing in time. If Day represents a "visit" (to borrow from clinical biostats) such that the measurements are not equally spaced in time, you may wish to consider other structures.
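For unequally spaced days, one alternative is the spatial power structure, which generalizes AR(1) to arbitrary spacing. A sketch, assuming a continuous copy of DAY (the names DAYC and VDATAC are hypothetical; SP(POW) needs a numeric variable that is not on the CLASS statement):

```sas
/* Make a continuous copy of day for the SP(POW) coordinate */
data vdatac;
  set vdata;
  dayc = day;
run;

proc glimmix data=vdatac noclprint method=rmpl;
  class id out day;
  model out(ref='1') = lnlab*day / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
  /* correlation decays with the actual distance between days */
  random day / type=sp(pow)(dayc) subject=id;
run;
```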
Steve Denham