zipcat
Calcite | Level 5

Hello,

I am using PROC GLIMMIX to model a multinomial outcome with 3 unordered categories. The fixed-effect independent variable is continuous and is measured at 3 or more time points for each subject. I have included a random intercept in the model to account for the repeated measures within each subject. I have found that the Type III test of fixed effects F value changes depending on which reference value I use for the outcome. I'm not clear on why this happens.

Thanks for your help.

Regina

14 REPLIES
SteveDenham
Jade | Level 19

What link are you using for the multinomial?  If it is the default cumulative logit, the specification of reference group will make a substantial difference in slope estimates.  If it is a generalized logit, then I have to admit I am surprised at what is happening.  Can you share your code?
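To be concrete, the two specifications differ only in the LINK= option. A minimal sketch with placeholder names y and x:

/* default for multinomial: cumulative logit, one common slope over ordered categories */
model y = x / dist=multinomial link=cumlogit;

/* generalized logit: a separate intercept and slope for each non-reference category */
model y(ref='1') = x / dist=multinomial link=glogit;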

Steve Denham

zipcat
Calcite | Level 5

Thank you for taking a look.

I am using a glogit link. 

Here is the code I am using:


title 'multinomial regression for labs';

proc glimmix data=vdata noclprint method=rmpl;
  class id out;
  model out(ref='1') = lnlab / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
run;

* check the output from above *;

proc glimmix data=vdata noclprint method=rmpl;
  class id out;
  model out(ref='2') = lnlab / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
run;

Regina

SteveDenham
Jade | Level 19

How many levels for the variable 'out'? Do the estimates for lnlab look like there is a reciprocal relationship, as there might be if there are only two levels for 'out'? Can you produce odds ratios under the two specifications (using an ESTIMATE statement) and see if they are inversely related?
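If it helps, the ODDSRATIO option on the MODEL statement may be the quickest way to get them. A sketch only; I have not tested it against a multinomial glogit fit, so check the results against the exponentiated slopes from the Solutions table:

proc glimmix data=vdata noclprint method=rmpl;
  class id out;
  /* ODDSRATIO requests odds ratio estimates for the fixed effects */
  model out(ref='1') = lnlab / dist=multinomial link=glogit solution oddsratio;
  random intercept / subject=id type=un group=out;
run;

The hand check is exp(estimate) for each lnlab slope in the Solutions for Fixed Effects table.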

I apologize for asking more questions than providing answers, but it's Monday morning and I'm not ready to assume anything yet.

Steve Denham

zipcat
Calcite | Level 5


There are 3 levels for the out variable.

From the first model, the odds ratio comparing level 2 to level 1 is 1.558.

From the second model, the odds ratio comparing level 1 to level 2 is 0.627. They are not exact inverses of each other (1/1.558 ≈ 0.642, not 0.627).

Regina

SteveDenham
Jade | Level 19

Well, shoot.  They are close, but not close enough.

More diagnostics:  Can you give the solution vector for each of these parameterizations, and just so I can play a hunch, also with ref='3'?

For each of the parameterizations, also take a look at the objective function in the iteration history.  Are the parameterizations taking the same number of iterations?  Do they start at the same point?  Do they end up at the same value?  I am guessing: maybe for the first, yes for the second, and no for the third question.  That implies that representational errors are accumulating during the iterations, and you end up with different values.
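If it helps to line the runs up side by side, ODS can capture the iteration history. A sketch, assuming IterHistory is the ODS table name GLIMMIX uses for that table (the output dataset names are made up):

ods output IterHistory=ith_ref1;
proc glimmix data=vdata noclprint method=rmpl;
  class id out;
  model out(ref='1') = lnlab / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
run;

* repeat with ref='2' and ref='3' into ith_ref2 and ith_ref3, then compare *;
proc print data=ith_ref1; run;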

Steve Denham

zipcat
Calcite | Level 5


Here are the solutions from the 3 models:

                           ref = '1'

                  Solutions for Fixed Effects

                                    Standard
    Effect       out    Estimate       Error       DF    t Value    Pr > |t|

    Intercept    2       -2.5473      1.2101      200      -2.11      0.0365
    Intercept    3        0.3947      0.9824      200       0.40      0.6883
    lnlab        2        0.4435      0.1562      278       2.84      0.0049
    lnlab        3        0.1751      0.1286      278       1.36      0.1744

                           ref = '2'

                  Solutions for Fixed Effects

                                    Standard
    Effect       out    Estimate       Error       DF    t Value    Pr > |t|

    Intercept    1        2.8082      1.3076      200       2.15      0.0330
    Intercept    3        3.8814      0.9604      200       4.04      <.0001
    lnlab        1       -0.4670      0.1665      278      -2.80      0.0054
    lnlab        3       -0.3887      0.1202      278      -3.23      0.0014

                           ref = '3'

                  Solutions for Fixed Effects

                                    Standard
    Effect       out    Estimate       Error       DF    t Value    Pr > |t|

    Intercept    1       -1.3388      1.1632      200      -1.15      0.2512
    Intercept    2       -3.4725      0.9893      200      -3.51      0.0006
    lnlab        1      -0.04545      0.1497      278      -0.30      0.7616
    lnlab        2        0.3355      0.1244      278       2.70      0.0074

Regarding the objective function: the number of iterations differs across the three models (14, 10, and 11), and the starting and ending values of the objective function differ as well.

Regina

SteveDenham
Jade | Level 19

I have to admit I am missing something, and it's probably something simple. How about a contingency table of the data, with lnlab divided at the mean (not the median)? Perhaps there is the equivalent of Simpson's paradox going on, such that one of the categories of out is heavily weighted on one side or the other of the mean.
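Building the mean split might look like this. A sketch; lnlab_Mn, lnlab_mean, and vdata2 are names I am making up:

/* overall mean of lnlab */
proc means data=vdata noprint;
  var lnlab;
  output out=mn mean=lnlab_mean;
run;

/* flag each record as below or at/above the mean */
data vdata2;
  if _n_ = 1 then set mn(keep=lnlab_mean);
  set vdata;
  length lnlab_Mn $8;
  lnlab_Mn = ifc(lnlab < lnlab_mean, '< Mean', '>= Mean');
run;

proc freq data=vdata2;
  tables lnlab_Mn * out;
run;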

Pinging JacobSimonsen.  Do you have anything that might explain this?

Steve Denham

Rick_SAS
SAS Super FREQ

One "simple" thing I noticed is that the OP has listed the response variable on the CLASS statement. I've never seen this before and am not sure what it does. It might be affecting the levelization.

zipcat
Calcite | Level 5

I listed the response on the class statement since I got the following message in the log when I didn't include it in the class statement:

NOTE: The group variable in the generalized logit model does not appear in the
      CLASS statement. This might produce unexpected results if the variable
      is not ordered in the input data.

Also, for multinomial models the GLIMMIX procedure requires the GROUP= option with the outcome variable on the RANDOM statement. This was the error in the log when I left that option off:

ERROR: Nominal models require that the response variable is a group effect on RANDOM statements. You need to add 'GROUP=out'.

Thanks for looking at this.

Regina

Rick_SAS
SAS Super FREQ

I'm also concerned about the GROUP= option on the RANDOM statement. Again, the response variable is being used, which seems suspect.

zipcat
Calcite | Level 5

Here is the contingency table of lab mean by out:


   

                     The FREQ Procedure

                 Table of lnlab_Mn by out

      lnlab_Mn     out

      Frequency|
      Percent  |
      Row Pct  |
      Col Pct  |        1|        2|        3|  Total
      ---------+---------+---------+---------+
      < Mean   |      42 |      30 |     164 |    236
               |    8.71 |    6.22 |   34.02 |  48.96
               |   17.80 |   12.71 |   69.49 |
               |   76.36 |   23.44 |   54.85 |
      ---------+---------+---------+---------+
      >= Mean  |      13 |      98 |     135 |    246
               |    2.70 |   20.33 |   28.01 |  51.04
               |    5.28 |   39.84 |   54.88 |
               |   23.64 |   76.56 |   45.15 |
      ---------+---------+---------+---------+
      Total          55       128       299       482
                  11.41     26.56     62.03    100.00

Regina

SteveDenham
Jade | Level 19

OK, that looks good--now how does it look by time point?
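Assuming the mean-split flag from before and a day variable, the per-day tables are just a BY-group PROC FREQ (a sketch with the same made-up names):

proc sort data=vdata2;
  by day;
run;

proc freq data=vdata2;
  by day;
  tables out * lnlab_Mn / nocol nopercent;
run;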

I am very tempted to suggest using a PROC I have never felt comfortable with--PROC CATMOD.  It does allow for repeated measures of categorical data.

Steve Denham

zipcat
Calcite | Level 5

Here are the frequencies by time point:

day=0

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           1       33.3333
1     >= Mean          2       66.6667
2     < Mean           1        7.6923
2     >= Mean         12       92.3077
3     < Mean           8       44.4444
3     >= Mean         10       55.5556


day=1

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           5       62.5000
1     >= Mean          3       37.5000
2     < Mean           5       17.8571
2     >= Mean         23       82.1429
3     < Mean          25       50.0000
3     >= Mean         25       50.0000


day=2

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           7          70
1     >= Mean          3          30
2     < Mean           6          24
2     >= Mean         19          76
3     < Mean          28          56
3     >= Mean         22          44


day=3

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           7       87.5000
1     >= Mean          1       12.5000
2     < Mean           5       23.8095
2     >= Mean         16       76.1905
3     < Mean          32       60.3774
3     >= Mean         21       39.6226


day=4

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           7       87.5000
1     >= Mean          1       12.5000
2     < Mean           3       20.0000
2     >= Mean         12       80.0000
3     < Mean          26       59.0909
3     >= Mean         18       40.9091


day=5

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           5       71.4286
1     >= Mean          2       28.5714
2     < Mean           5       41.6667
2     >= Mean          7       58.3333
3     < Mean          19       55.8824
3     >= Mean         15       44.1176


day=6

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           5       83.3333
1     >= Mean          1       16.6667
2     < Mean           3       30.0000
2     >= Mean          7       70.0000
3     < Mean          15       51.7241
3     >= Mean         14       48.2759



day=7

                                  Row
out    lnlab_Mn    Frequency    Percent

1     < Mean           5       100.000
1     >= Mean          0         0.000
2     < Mean           2        50.000
2     >= Mean          2        50.000
3     < Mean          11        52.381
3     >= Mean         10        47.619

Regina

SteveDenham
Jade | Level 19

I think it comes down to the fact that you have a fair amount of missing values, so the marginal odds ratios over day are strongly influenced by the slopes at each day. For instance, for out=3, at Day=0 you have 18 observations, while at Day=1 and Day=2 there are almost 3 times as many (in the fifty-ish range). For out=2, the dropout rate from Day 1 to Day 7 is nearly 86%. So even though you are not explicitly modeling the repeated nature, it is affecting what is going on. By aggregating over time, the trend due to lnlab becomes confounded with the day of measurement.

Since you are using pseudolikelihood, a repeated effect (although it has to go in as a G-side effect) can be added as below:

proc glimmix data=vdata noclprint method=rmpl;
  class id out day;
  model out(ref='1') = lnlab*day / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
  random day / type=ar(1) subject=id;
run;

Two parts to check here: first, will it even run; second, is it still sensitive to the specification of the reference group? Note that the repeated nature is modeled as an autoregressive process (AR(1)), which assumes equal spacing in time. If Day represents a "visit" (to borrow from clinical biostats), so that the measurements are not equally spaced in time, you may wish to consider other structures; one possibility is sketched below.
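For unequally spaced days, a spatial power structure is the usual generalization of AR(1). A minimal sketch, assuming a hypothetical continuous copy of day called daynum that is kept out of the CLASS statement:

proc glimmix data=vdata noclprint method=rmpl;
  class id out day;
  model out(ref='1') = lnlab*day / dist=multinomial link=glogit solution;
  random intercept / subject=id type=un group=out;
  /* SP(POW) reduces to AR(1) when the daynum values are equally spaced */
  random day / type=sp(pow)(daynum) subject=id;
run;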

Steve Denham
