Moving multinomial logistic from GLIMMIX to GEE

Kastchei · Posted 05-29-2020 10:51 AM

Hi there,

I have a multinomial model with repeated measures that I was trying to fit with PROC GLIMMIX. The outcome is actually ordinal (0 = none, 1 = low, 2 = medium, 3 = high), but it does not seem to meet the proportional odds assumption for a cumulative logit model, so I switched to a generalized logit model instead. The problem is that the switch has made the model too large to run. I get this error:

ERROR: Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system. Consider changing your model.

Here is the syntax:

proc glimmix data = cogFunc._07_Model;
    class ID outcomeClass (ref = 'None/minimal') exposureC (ref = '1 = Low');
    model outcomeClass = exposureC / distribution = multinomial link = gLogit solution cl intercept;
    random intercept / subject = ID group = outcomeClass;
    nLOptions maxIter = 5000;
run;

As you can see, it's already a simple model, so I don't see any way to simplify it further. GLIMMIX will not fit R-side effects for multinomial models, hence the G-side random intercepts. GENMOD does not fit generalized logit models, only cumulative logit models. I also tried numeric variables instead of character variables, as suggested in another thread, but it did not help.

proc glimmix data = cogFunc._07_Model;
    class ID outcomeClassN (ref = '0') exposureN (ref = '1');
    model outcomeClassN = exposureN / distribution = multinomial link = gLogit solution cl intercept;
    random intercept / subject = ID group = outcomeClassN;
    nLOptions maxIter = 5000;
run;

I have also tried other estimation methods (METHOD=LAPLACE, METHOD=QUAD), but the procedure seems to stop well before any actual estimation occurs. It seems to get as far as the Dimensions table and just throw its hands up! I have all my SAS memory options maxed out, and I have 24 GB of RAM.

I think the problem is that I have over 16k subjects. Here's the partial output I receive.

The GLIMMIX Procedure

Model Information
Data Set	COGFUNC._07_MODEL
Response Variable	outcomeClass
Response Distribution	Multinomial (nominal)
Link Function	Generalized Logit
Variance Function	Default
Variance Matrix Blocked By	ID
Estimation Technique	Residual PL
Degrees of Freedom Method	Containment

Class Level Information
Class	Levels	Values
ID	16380	not printed
outcomeClass	4	High Low Medium None/minimal
exposureC	4	2 = Intermediate 3 = High 4 = Very high 1 = Low

Number of Observations Read	58575
Number of Observations Used	58575

Response Profile
Ordered Value	outcomeClass	Total Frequency
1	High	4103
2	Low	23846
3	Medium	7283
4	None/minimal	23343
In modeling category probabilities, outcomeClass='None/minimal' serves as the reference category.

Dimensions
G-side Cov. Parameters	4
Columns in X	15
Columns in Z per Subject	4
Subjects (Blocks in V)	16380
Max Obs per Subject	9

In another thread about this error, the suggestion was to use PROC GEE instead of GLIMMIX so that the subject-level random intercepts would not have to be estimated. I am not familiar with PROC GEE. The repeated measures are not taken at constant intervals, the intervals differ between subjects, and the number of measurements varies by subject: e.g., subject 1 could have 3 measurements at days 47, 496, and 10345, whereas subject 2 could have 4 measurements at days 365, 849, 3495, and 9231. Does this code look reasonable?

proc gee data = cogFunc._07_Model descending;
    class ID outcomeClass exposureC (ref = '1 = Low');
    model outcomeClass = exposureC / dist = multinomial link = gLogit type3;
    repeated subject = ID;
run;

PROC GEE will not fit any working correlation structure other than TYPE=IND in the REPEATED statement for this model, so I guess I'm stuck with that. It also tells me that it cannot compute Type III statistics, so I don't get an overall test of the exposure factor. Are there any options I should use to get more information out of PROC GEE? Is this even the right procedure for my case?

Thanks in advance for any help.

Warm regards,

Michael

StatDave · Posted 05-29-2020 11:11 AM

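Another approach you could use is a non-modeling one: the CMH option in PROC FREQ. With ID as the stratification variable, the Cochran-Mantel-Haenszel statistics test the association between exposure and outcome while controlling for subject, so this should provide an overall test of your exposure variable. For example: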

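proc freq data = cogFunc._07_Model;
    /* ID is the stratification variable. Scores follow the sorted level order, so use   */
    /* the numeric outcome (0-3): character 'High Low Medium None/minimal' sorts wrongly */
    table ID*exposureC*outcomeClassN / noprint cmh;
run;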

You will want the NOPRINT option to avoid printing a table for every ID level. The results will show three tests. Since your response is ordinal, you will probably want one of the first two, depending on the nature of your exposure variable: if it is ordinal, use the first statistic (Nonzero Correlation); if it is nominal, use the second (Row Mean Scores Differ). If your exposure is binary, the first and second will be the same.

Kastchei · Posted 05-29-2020 11:32 AM

Thanks, Stat Dave! I had forgotten about CMH. I will eventually need a full model, because exposure is just one of several variables to adjust for (e.g., demographics, some medical conditions, number of medications taken). However, I can certainly use CMH when examining each variable against the outcome one at a time.

MichaelL_SAS · Posted 05-29-2020 08:48 PM

One small comment: I believe PROC GEE does support Type III tests for generalized logit models using the Wald statistic, and for ordinal response models it supports Type III tests using either the generalized score statistic or the Wald statistic. To request the Wald tests, specify the WALD option in the MODEL statement.

StatDave · Posted 05-30-2020 12:03 PM

To be clear, to get Type 3 tests you need to specify both the TYPE3 and WALD options in the MODEL statement.
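Putting the two replies together, the PROC GEE call from the original post might look like the sketch below. It has not been run against the poster's data. Two assumptions are flagged in the comments: that dropping DESCENDING leaves 'None/minimal' (the last level in the default ascending order) as the reference category, matching the GLIMMIX fit, and that only the independence working correlation is available for this model, as the original post reports.

```sas
/* Sketch only: the original PROC GEE call with Type 3 Wald tests requested.  */
/* ASSUMPTION: without DESCENDING, the last ordered level ('None/minimal')    */
/* is the reference category for the generalized logits, as in GLIMMIX.       */
proc gee data = cogFunc._07_Model;
    class ID outcomeClass exposureC (ref = '1 = Low');
    /* TYPE3 together with WALD requests Wald-based Type 3 tests, */
    /* which give an overall test of exposureC                    */
    model outcomeClass = exposureC / dist = multinomial link = gLogit type3 wald;
    /* Independence working correlation, per the original post;      */
    /* the empirical (sandwich) standard errors still account for     */
    /* correlation among repeated measurements within each ID         */
    repeated subject = ID / type = ind;
run;
```

With this change, the Type 3 table should supply the overall test of the exposure factor that the original post was missing.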
