blokeman
Fluorite | Level 6

SAS newbie here.


My research problem involves an unordered quaternary (four-category) outcome. There are two random effects, which are crossed, i.e. not nested in one another. In addition to the usual MLEs for the fixed effects and estimated SDs for the random effects, I also need to extract the estimated random effect for each level of both random factors, because those estimates are of interest in themselves. My data file is attached to this message.


At the moment, I cannot get even a highly simplified model to fit:

 

PROC IMPORT OUT=MyLib.Mydata
            DATAFILE="E:\mydata.txt"
            DBMS=TAB REPLACE;
    GETNAMES=YES;
    DATAROW=2;
RUN;

proc glimmix data=mylib.mydata method=Laplace inititer=9999;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS / solution dist=multinomial link=glogit;
    random intercept / subject=trigger group=dep_var solution;
    random intercept / subject=CompLemma group=dep_var solution;
run;

 

There are 20 more explanatory variables to add if/when this can be made to work. Presently, however, model fitting halts with the following error message:

 

The initial estimates did not yield a valid objective function.


This page suggested increasing the INITITER setting to work around this problem, which is why I have set it to 9999. As you can see, though, it isn't helping.


If I remove either one of the random effects, the model does converge (although with CompLemma as the only random effect, it complains that the G matrix is not positive definite). However, leaving a random effect out is not an option, because both of them are of major importance to my research problem.


I'm running Windows 7, and my SAS version is 9.4.

SteveDenham
Jade | Level 19

This is kind of "iffy" but may work.  Try running without the GROUP= option and save the covariance parameter estimates.  Then re-run with those values supplied as starting values in a PARMS statement.  This should put the model/algorithm in a better starting place.  Note that there are a lot of if's and should's in my reply.  The paper you reference also talks about separation in multinomial models, especially when one category is rare in comparison to the others; PROC FREQ is useful for spotting rare levels of the response.  If none of these approaches yields anything usable, then it is unlikely that you can get exactly what you are looking for.  You may be able to get something by collapsing/combining levels, but that might not be what you want.
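
Something along these lines, perhaps. This is untested, and the ODS table name and the PARMSDATA= option are from memory; also, since the full model has one covariance parameter per GROUP= level, you may need to replicate each saved value accordingly:

/* Step 1: fit without GROUP= and capture the covariance
   parameter estimates (untested sketch). */
ods output CovParms=cp;
proc glimmix data=mylib.mydata method=Laplace;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS / solution dist=multinomial link=glogit;
    random intercept / subject=trigger solution;
    random intercept / subject=CompLemma solution;
run;

/* Step 2: re-run the full model, feeding those estimates in as
   starting values. PARMSDATA= reads the Estimate column; you may
   need a DATA step first to repeat each value once per GROUP=
   level. */
proc glimmix data=mylib.mydata method=Laplace inititer=9999;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS / solution dist=multinomial link=glogit;
    random intercept / subject=trigger group=dep_var solution;
    random intercept / subject=CompLemma group=dep_var solution;
    parms / parmsdata=cp;
run;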

 

SteveDenham

blokeman
Fluorite | Level 6

Thank you for weighing in, Steve, and apologies for the delay.

 

If I interpreted your message correctly, the first step of your approach would be to run the same code as in my initial message, except with GROUP=dep_var omitted from the two RANDOM statements. When I do that, SAS complains thusly:

 

ERROR: Nominal models require that the response variable is a group effect on RANDOM
       statements. You need to add 'GROUP=dep_var'.

 

Am I doing something wrong?

SteveDenham
Jade | Level 19

No, you are not.  Just as I replied to @PaigeMiller regarding dep_var in the CLASS statement, this is another quirk of nominal multinomial analysis in GLIMMIX.  I don't see anything about it in the documentation (I could be looking in the wrong place), but it surely makes sense. My suggestion was not useful, and I apologize.

 

So, now we come to the issue of the random variables CompLemma and trigger.  What can you say about these?  How many levels are there in each?  I note that they are not nested in any manner, but that is about all I know.

You said that with CompLemma as the only random effect, you get a message that the G matrix is not positive definite.  That hints at something within that variable being wonky.  Try your original code with the NOBOUND option in the PROC GLIMMIX statement.  It may be one of those situations that, in an OLS setting, would lead to a "negative variance component", which would suggest the CompLemma effect is strongly (negatively) correlated with trigger or with the residual error.  That in turn could mean you have a level or levels that could be consolidated.

Look at some PROC FREQ output for these variables.  If you have levels with small N (and "small" is going to be subjective), consider consolidating those levels, provided that makes sense for your research questions.  We just need more information about these variables and why they need to be considered.
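
The mechanical changes would look like this (untested):

/* Original model with the NOBOUND option added, so that variance
   components are allowed to go negative during optimization. */
proc glimmix data=mylib.mydata method=Laplace inititer=9999 nobound;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS / solution dist=multinomial link=glogit;
    random intercept / subject=trigger group=dep_var solution;
    random intercept / subject=CompLemma group=dep_var solution;
run;

/* How thin are the thinnest levels? ORDER=FREQ sorts each table
   so the rare levels end up at the bottom. */
proc freq data=mylib.mydata order=freq;
    tables trigger CompLemma / nocum;
run;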

 

One final approach might be to run a model with CompLemma and trigger as fixed effects.  Perhaps the solution vector would give some estimates that could be used as starting values in a PARMS statement.  I don't know???
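
For concreteness, that last idea would be something like the following. Untested, and with 61 + 345 levels it may be slow, with some effects possibly non-estimable:

/* trigger and CompLemma moved out of the RANDOM statements and
   into the fixed-effects part of the MODEL statement. */
proc glimmix data=mylib.mydata;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS trigger CompLemma / solution
          dist=multinomial link=glogit;
run;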

 

SteveDenham

blokeman
Fluorite | Level 6

Thank you for the further input. I tried running my original code with NOBOUND added as the final option of the PROC GLIMMIX statement, and it returned the same message as before, i.e. "The initial estimates did not yield a valid objective function."

 

The variable trigger has 61 levels, while CompLemma has 345. Both follow an unbalanced (Zipfian) distribution: a small number of high-frequency levels account for the lion's share of the dataset, while the rest contribute much less, with many levels having single-digit sample sizes. This is linguistic data, where such skewed distributions are unfortunately the norm.

 

Both random effects are lexical items occurring at key syntactic positions of the grammatical phenomenon under scrutiny. The model includes categorical fixed-effect parameters for shared characteristics of triggers and CompLemmas -- for example, there's a categorical fixed effect for whether the trigger is a verb, noun, adjective or conjunction, and there's a numeric fixed effect representing the number of syllables in CompLemma. In both cases, however, there's every reason to suspect that the lexical identity of the trigger/CompLemma may exert an effect over and above its fixed-effect characteristics.

 

I am averse to introducing dummy variables for individual levels of trigger or CompLemma, because a number of research questions regarding these variables ideally require all levels of both variables to be treated the same way, rather than estimating some levels as random effects (undergoing shrinkage) and others as fixed effects (avoiding shrinkage). For example, one question of interest is the overall magnitude of inter-level variance within trigger and CompLemma, and this cannot be properly estimated unless each random effect includes all of its categories.

 

When you talk about "consolidating" levels, do you mean collecting low-frequency levels of the random effects into trashcan categories? Indeed, some scholars in my field do follow such a practice. I was hoping to show them up by avoiding such an information-lossy procedure, though. 🙂 If so, I take it the recoding would look something like the sketch below.
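
(In the sketch, the cutoff of 5 and the new names complemma_counts and CompLemma2 are made up for illustration, and CompLemma is assumed to be character.)

/* Hypothetical recoding: pool CompLemma levels with fewer than
   5 observations into a single OTHER trashcan category. */
proc sql;
    create table complemma_counts as
    select CompLemma, count(*) as n
    from mylib.mydata
    group by CompLemma;

    create table mylib.mydata2 as
    select a.*,
           case when b.n < 5 then "OTHER"
                else a.CompLemma
           end as CompLemma2 length=50   /* adjust length as needed */
    from mylib.mydata as a
         left join complemma_counts as b
         on a.CompLemma = b.CompLemma;
quit;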

SteveDenham
Jade | Level 19

I really sympathize with your desire to "show up" the others in your field who consolidate low-count categories.  That would be great.  However, the first thing that occurs to me is that they may have run into the same issue and took that route as the only way to get an analytic solution.

 

I guess the only thing I can suggest at this point is to see if there is any way to turn this into an ordinal multinomial. Unfortunately, I don't have enough knowledge of your field to even begin to suggest how that might be done.  Tell us something about the outcome categories of dep_var.  Could they be assigned some sort of ordinal value, based on some utility function?  If so, then you could move from the glogit link, and many of the issues you have seen, to a cumulative logit (LINK=CUMLOGIT).  The random effects would stay, but at least it would be something different.
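
Untested, but assuming an ordering for dep_var can be justified, the model would be along these lines. Note that GROUP=dep_var is only required for nominal models, so it drops out here:

/* Cumulative logit version. Make sure the levels of dep_var sort
   in the intended order; check the "Response Profile" table. */
proc glimmix data=mylib.mydata method=Laplace;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS / solution dist=multinomial link=cumlogit;
    random intercept / subject=trigger solution;
    random intercept / subject=CompLemma solution;
run;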

 

SteveDenham

blokeman
Fluorite | Level 6

Many thanks for the input again, Steve. Unfortunately, the four outcome categories do not seem to lend themselves to ordinal modeling. Even though I can think of one or two ways to order them that make sense from a theoretical perspective, the "latent utility function" contradicts such theories, i.e. the effects are not monotonic in the resulting utility function.

 

The present situation is that I'm actually able to fit these models, with both random effects and without conflating levels, using Bayesian software. The downside compared to something like GLIMMIX, however, is that A) model fitting is excruciatingly slow, and B) the simple and straightforward likelihood-based inference methods (for comparing nested models) are no longer available, since the likelihood is entangled with the prior distributions. It takes about two days to fit all the models I need, and that's before I even start comparing them.

 

So it appears that the choice I presently face is between enduring the slowness of the Bayesian route or resigning myself to conflating categories of low-frequency levels of the random effects with frequentist software.

SteveDenham
Jade | Level 19
I vote Bayesian. Didn’t know that was an option. I’ll ask some things about it tomorrow.

SteveDenham
SteveDenham
Jade | Level 19

I am going to assume that you are using the brms package in R.  Good, but it takes a long time, especially with as many levels as you have.  I suspect the warm-up/burn-in stage suffers from the same issue you are seeing with GLIMMIX.  However, the Bayesian approach takes the supplied starting values and generates new realizations at each iteration, which makes me think that maybe, just maybe, PROC MCMC in SAS might work a bit faster.  See Fang Chen's SGF paper at https://support.sas.com/resources/papers/proceedings16/SAS5601-2016.pdf for more on this.
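
To make the idea concrete, here is a rough, untested sketch of what a glogit multinomial with two crossed random intercepts might look like in PROC MCMC. It assumes dep_var is coded 1 to 4 (with 4 as the reference category) and leaves the fixed effects out for brevity; the priors, chain length, and all new names are placeholder choices:

proc mcmc data=mylib.mydata nmc=100000 thin=10 seed=20160321
          outpost=postout;
    array p[4];        /* category probabilities             */
    array icept[3];    /* one intercept per non-ref category */
    parms icept: 0;
    parms s2_trg1 1 s2_trg2 1 s2_trg3 1;    /* trigger variances   */
    parms s2_cmp1 1 s2_cmp2 1 s2_cmp3 1;    /* CompLemma variances */
    prior icept: ~ normal(0, var=1e4);
    prior s2_: ~ igamma(0.01, scale=0.01);
    /* One random intercept per non-reference category for each
       grouping variable, mirroring GROUP=dep_var in GLIMMIX. */
    random u1 ~ normal(0, var=s2_trg1) subject=trigger;
    random u2 ~ normal(0, var=s2_trg2) subject=trigger;
    random u3 ~ normal(0, var=s2_trg3) subject=trigger;
    random v1 ~ normal(0, var=s2_cmp1) subject=CompLemma;
    random v2 ~ normal(0, var=s2_cmp2) subject=CompLemma;
    random v3 ~ normal(0, var=s2_cmp3) subject=CompLemma;
    /* Generalized-logit linear predictors; add fixed effects here. */
    eta1 = icept[1] + u1 + v1;
    eta2 = icept[2] + u2 + v2;
    eta3 = icept[3] + u3 + v3;
    denom = 1 + exp(eta1) + exp(eta2) + exp(eta3);
    p[1] = exp(eta1) / denom;
    p[2] = exp(eta2) / denom;
    p[3] = exp(eta3) / denom;
    p[4] = 1 / denom;
    model dep_var ~ table(p);
run;

If I remember right, you would add MONITOR= to the RANDOM statements to get the per-level random-effect samples into the OUTPOST= data set, which is the part you said you need.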

 

SteveDenham

blokeman
Fluorite | Level 6

You really know your stuff, sir. Impressive job guessing not only the software but also the package (which is not the only option available for that software).

 

If I have to be Bayesian then I'll probably stick to the alternate software given that I'm already familiar with its ins and outs.

 

But I'll keep GLIMMIX with conflated random-effect levels in mind as a backup option.

 

Many thanks for all your input.

PaigeMiller
Diamond | Level 26

@blokeman wrote:

 

proc glimmix data=mylib.mydata method=Laplace inititer=9999;
    class dep_var TrgPoS trigger CompLemma / ref=first;
    model dep_var = TrgPoS / solution dist=multinomial link=glogit;
    random intercept / subject=trigger group=dep_var solution;
    random intercept / subject=CompLemma group=dep_var solution;
run;

 

There are 20 more explanatory variables to add if/when this can be made to work. Presently, however, model fitting halts with the following error message:

 

The initial estimates did not yield a valid objective function.

 


I would remove dep_var from the CLASS statement, as it does not belong there. This may not help, but then again it might!

--
Paige Miller
SteveDenham
Jade | Level 19

Nowhere in the documentation for GLIMMIX will you find the secret that, for that PROC (and so far as I can tell, only that PROC), the dependent variable MUST appear in the CLASS statement when the distribution is multinomial.  You only find out that this is required when you remove it from the CLASS statement.

 

SteveDenham
