I have a longitudinal three-wave panel data set (repeated measures within subjects) with a binary outcome Y. In a proc glimmix model (SAS 9.4) with one level-1 predictor (time = 0, 1, 2) and a random intercept, estimated with a likelihood method (quad or laplace), the predicted probability of Y at time = 0 based on the intercept, exp(b0) / (1 + exp(b0)), is very different from the probability of Y in the raw data (.18 from the model vs. .25 from the raw data). What's surprising is that when I use simple logistic regression (ignoring the clustering within subjects), or the same multilevel model in proc glimmix with a pseudo-likelihood method of estimation (such as MSPL), the predicted probability of Y at time = 0 based on the intercept is much closer to the raw-data probability.
Estimates of the fixed effect of time (the odds ratio) are similar across all modeling methods and consistent with the change in probability of Y in the raw data across waves.
Any thoughts on why? Any suggested tweaks in the glimmix setup with likelihood estimation that might make the predicted probability of Y at time = 0 closer to the raw probability?
Thanks in advance for your thoughts!
This may simply be what the likelihood estimation methods give you in this case. I'm not sure there is anything you can do about it, unless these methods run into numerical issues with your data/model. If you need confirmation, please send in your data and program.
Thanks,
Jill
The issue is most likely that a generalized linear mixed model (GLMM) does not relate to a linear mixed model (LMM) in the same way that an ordinary, non-mixed GLM relates to a linear model (LM).
When using mixed models, interest most often lies in the marginal (population-averaged) estimates rather than the conditional (cluster/subject-specific) ones. Your references to "the [overall] probability in the raw data" suggest that this is the case here as well.
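In a linear mixed model, and for certain link functions, these two interpretations coincide. For a logit-linked GLMM that is no longer true: integrating (roughly, averaging) the predicted probability over the random-effect distribution is not the same as evaluating it with the random effect(s) held at zero, even though the random effects are normal and therefore symmetric on the link scale. Such a GLMM gives you conditional, not marginal, estimates. For a probability below 0.5, as here, averaging over the random intercept pulls the marginal probability toward 0.5, which is consistent with the .18 (model) vs. .25 (raw data) pattern you see. Because the "average cluster" (random effect = 0) need not reflect the overall population average, such discrepancies are common when the parameters are interpreted as if they were marginal (random effects integrated out) rather than conditional (random effects held at the "average cluster").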
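A small numerical illustration of the point above, written in Python purely as a sketch (it is not GLIMMIX output). It takes the conditional intercept implied by the .18 from the question, assumes a hypothetical random-intercept SD of 1.6, and averages the inverse logit over a standard normal random effect with a simple Riemann sum. The SD value and the function names are assumptions for illustration only.

```python
import math

def inv_logit(x):
    """Logistic function: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-x))

def marginal_prob(eta, sigma, n=4001, zmax=8.0):
    """Average inv_logit(eta + sigma*z) over z ~ N(0, 1).

    Uses a simple Riemann sum on [-zmax, zmax]; the normal density is
    negligible outside that range.
    """
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += density * h * inv_logit(eta + sigma * z)
    return total

# Conditional intercept implied by the .18 quoted in the question.
b0 = math.log(0.18 / 0.82)
sigma = 1.6  # HYPOTHETICAL random-intercept SD, for illustration only

p_conditional = inv_logit(b0)          # "average subject": random effect held at 0
p_marginal = marginal_prob(b0, sigma)  # random effect integrated out
print(f"conditional: {p_conditional:.3f}  marginal: {p_marginal:.3f}")
```

With sigma = 0 the two probabilities coincide; larger values of sigma widen the gap, with the marginal probability moving toward 0.5 while the conditional one stays at .18.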
I've written up some more background on this issue, with more references, e.g. here. Unfortunately, I know of no out-of-the-box solution in SAS: you'll have to do the (numerical) integration by hand. A key reference here is Hedeker et al. (2017). Perhaps a more straightforward approach would be to use GEE (e.g., PROC GENMOD with a REPEATED statement), which yields marginal (and only marginal) estimates and can also handle non-independence.
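The by-hand integration can be sketched the same way for the time effect, in the spirit of the marginalization discussed by Hedeker et al. (this is not a reproduction of their code). Given conditional estimates b0 and b1 for time and a random-intercept SD, it integrates out the random intercept at each wave and converts the resulting population-averaged probabilities back to log-odds, so the marginal log odds ratio relative to time 0 can be read off. All numeric inputs and function names are hypothetical.

```python
import math

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n=4001, zmax=8.0):
    """E[inv_logit(eta + sigma*Z)] with Z ~ N(0, 1), via a Riemann sum."""
    h = 2.0 * zmax / (n - 1)
    total = 0.0
    for i in range(n):
        z = -zmax + i * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += density * h * inv_logit(eta + sigma * z)
    return total

def marginalize_time(b0, b1, sigma, times=(0, 1, 2)):
    """Marginal probability, log-odds and log-OR (vs. the first time) per wave."""
    probs = {t: marginal_prob(b0 + b1 * t, sigma) for t in times}
    base = logit(probs[times[0]])
    return {t: (p, logit(p), logit(p) - base) for t, p in probs.items()}

# HYPOTHETICAL conditional estimates, as might come from a random-intercept GLMM.
b0, b1, sigma = -1.5, 0.4, 1.6
for t, (p, log_odds, log_or) in marginalize_time(b0, b1, sigma).items():
    print(f"time={t}: marginal p={p:.3f}, log-odds={log_odds:.3f}, "
          f"log-OR vs time 0={log_or:.3f} (conditional: {b1 * t:.3f})")
```

With sigma = 0 the marginal and conditional log-ORs coincide; with a nonzero random-intercept variance the marginal time effect is attenuated toward zero relative to b1.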