My latest opinion: R side stuff should only be used for fitting models with "separable" errors, such as Gaussian/normal. If the dependent variable has a distribution where the mean and variance are functionally related (Poisson, negative binomial, gamma), the repeated nature should be modeled as a G side matrix. Do you guys have any thoughts on this?
Steve Denham
Why? PROC GLIMMIX parameterizes both the G-side and the R-side using the same statement, RANDOM, with the exception that the R-side requires either the keyword _RESIDUAL_ or the option RESIDUAL. Of the distributions that PROC GLIMMIX can estimate, the normal distribution, the lognormal distribution, and the t-distribution have "separable" errors as you have defined them, where the mean and the variance are NOT functionally related; in all other distributions, the variance depends on the mean. Since you can have a G-side matrix, an R-side matrix, or both, it is unclear to me why you would want to restrict some of the distributions to one "side" and others to the other "side".
The variance of the observations, given the random effects, equals A**(0.5)*R*A**(0.5), where A is a diagonal matrix of the variance functions of the specified distribution. For the distributions with "separable" errors, A is the identity matrix. By default, the R matrix is the scaled identity matrix, which for distributions with "separable" errors has a scale value of 1.00. So, unless you specify a different variance-covariance structure for this R-side matrix, the distributions with "separable" errors (except for the t-distribution) will have variances equal to the identity matrix.
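To make the syntax concrete, here is a minimal sketch (data set and variable names are invented, not from anyone's actual analysis) of the two ways the same RANDOM statement gets used; the only syntactic difference is the _RESIDUAL_ keyword or the RESIDUAL option:

proc glimmix data=mydata method=laplace;
   class trt subj;
   model y = trt / dist=poisson link=log;
   random intercept / subject=subj;                  /* G-side: random effect enters inside the link */
run;

proc glimmix data=mydata;
   class trt subj time;
   model y = trt time / dist=poisson link=log;
   random time / residual subject=subj type=ar(1);   /* RESIDUAL option: this covariance goes into the R matrix */
run;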
Good points, Matt. My thought is that the use of R-side structures for the binomial, Poisson, negative binomial, and gamma distributions leads to marginal estimation models, rather than models conditional on the random effects. Walt Stroup gives an excellent example of how this makes a difference for binomial responses in his latest book. The estimates in the conditional model fit the mode and are unbiased estimates of the unobserved proportions, whereas the estimates in the marginal model fit the mean but are biased (toward 0.5) estimates of the unobserved proportions.
Most of my repeated measures these days are counts, but looking at the various distributions, I get worried (a lot) about which kind of model (conditional vs. marginal) I am going to end up fitting. The great thing about "normal" variables is that the two models yield identical results. But not for those "non-separable" distributions.
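In GLIMMIX terms, the contrast I am worried about looks roughly like this (a sketch only; the data set and variable names are invented):

/* Conditional (G-side) model: estimates target the conditional binomial probability */
proc glimmix data=prop method=laplace;
   class trt subj time;
   model events/trials = trt time trt*time / dist=binomial link=logit;
   random intercept / subject=subj;
run;

/* Marginal (R-side, GEE-type) model: estimates target the population-averaged mean */
proc glimmix data=prop empirical;
   class trt subj time;
   model events/trials = trt time trt*time / dist=binomial link=logit;
   random _residual_ / subject=subj type=cs;
run;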
Steve Denham
I haven't read Dr. Stroup's latest book, but I wouldn't necessarily characterize results from marginal models as "biased" relative to results from conditional models. These different models provide answers to different questions: Marginal models provide overall averages for the population ("population-averaged"), while conditional models provide "subject-specific" averages for specific random groups within a population. For the "nonseparable" distributions, the average of all the conditional model averages usually does not equal the marginal model average. However, this latter inequality is not what I'd define as a "bias" in a statistical sense. If you specify the type of model most pertinent to answering your questions, then you should not need to make the distinction by type of distribution between G-side and R-side effects in PROC GLIMMIX. [See the following reference, which, among others, better describes the distinction between conditional models and marginal models: Zeger SL, Liang K-Y, Albert PS. Models for longitudinal data: A generalized estimating equation approach. Biometrics 1988 Dec;44(4):1049-1060.]
As a great example of the problem, look at the cover picture of Walt's book on Amazon--it hit me between the eyes. It has the GEE target and the GLMM (Y|b) targets shown with frequency plotted against the proportion.
Input from the SAS developers would be really nice here, especially Schabenberger and Wolfinger. I realize this may not be the place, but they are the architects behind GLIMMIX, and I would love to know their thoughts.
Steve Denham
Most of this conversation is out of my expertise, but I think 1zmm's assessment helps me. When I attended a seminar (JSM, 2008) by Young and Littell, my notes say the following:
•Marginal Model
◦Focus is inference about the population
◦It is a generalized linear model with R-side random effects
•Conditional Model
◦Focus is on inferences about individuals
◦It is a generalized linear mixed model with G-side and R-side random effects
◦G-side effects operate INSIDE the link function
Then I wrote the following, which has always stuck with me:
"Marginal distribution always exists. Does joint distribution with specified correlation structure and marginal distribution really exist? Models for nonnormal data with G-side effects might be vacuous!"
Wow. That quote stands almost diametrically opposed to Stroup's statements (Section 3.5.7, pp. 109 ff.):
"As a final comment on the marginal GLMM, recall that one of our defining criteria for a statistical model says that, ideally, it should describe a plausible mechanism giving rise to the observed data. By this standard, marginal GLMMs fail." (my emphasis added)
He goes on: "Once we define the model in terms of the working covariance, we no longer have a true probability distribution. Instead, we have a quasi-likelihood, which we will define formally in Chapter 4. There is no known probability mechanism that could give rise to data as described by the marginal model. Among the models we have considered here, the conditional GLMM is the only one that actually describes a process that is even possible. Any legitimate alternative would still have a two-step process--vary the binomial probability according to some distribution and then generate binomial observations given the binomial probability conditional on that location. Like it or not, the generating mechanism describes a conditional model. The marginal GLMM does not describe a process: it simply allows marginal means to be estimated when they are deemed to be the objective." (Again, my emphasis, and a tip of the hat to Matt's points above).
This dichotomy has been weighing on me ever since I came across it.
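Just to make that two-step generating mechanism concrete for myself, here is a toy simulation sketch (all names and numbers are made up): draw a subject-level probability from a distribution, then generate binomial observations conditional on it.

/* Step 1: draw a G-side random effect and a conditional binomial probability;
   Step 2: generate binomial observations given that probability */
data sim;
   call streaminit(2013);
   do subj = 1 to 100;
      b = rand('normal', 0, 1);          /* subject-level random effect */
      p = logistic(-1 + b);              /* conditional binomial probability */
      do time = 1 to 4;
         y = rand('binomial', p, 10);    /* 10 trials per occasion */
         output;
      end;
   end;
run;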
Steve Denham
Well, if it is opposite then there is a strong probability that I (a novice) misquoted the seminar speakers (the experts) and accidentally reversed the point that they were making!
Steve, I will email the relevant pages of Young & Littell's handouts to your yahoo address so that you can read what THEY wrote. I think that is better than my interpretation of what they might have said.
There are numerous issues involved in Steve’s original message. Some of this is philosophical and some is technical. If you are modeling R-side (co-)variation with a GLMM, you may be performing a strictly quasi-likelihood analysis, whether you realize it or not. For instance, if the conditional distribution (i.e., conditional on the G-side random effects) does not have a free scale parameter (binomial and Poisson, for instance), then any R-side modeling is incorporating a multiplicative scale parameter that would not be there for these distributions. As stated on page 128 of Stroup (2012), “No actual probability distribution exists with this structure, but in many cases it adequately models the distribution of the count [or proportion] data, and the quasi-likelihood is perfectly well defined for ... estimation and inference purposes.” So, this is one way of getting at the usually intractable marginal distribution. If you don’t have random (G-side) effects, you are getting a GEE analysis in this situation, which is used very successfully for GLMs without other random effects.
We “see” marginal distributions; that is, observations are from marginal distributions. But it can be argued that observations are generated from conditional distributions, that is, a conditional model comes closer to capturing the data-generating mechanism. This is certainly a take-home message in Stroup’s book, and the theme runs throughout it (although I am sure that I am greatly oversimplifying a much bigger topic—sorry). In the marginal-vs.-conditional debate, it is often overlooked that the two kinds of models target different parameters; Stroup makes a compelling argument that the typical investigator is more interested in the parameter targeted by the conditional model (such as the conditional binomial probability). I basically agree with this, but I am sure this can be debated.
The more I learn about GLMMs, the more I lean toward the conditional-model approach to analysis. There can be important uses for marginal models, however, so I am not going to get into any major on-line debates about this. But in terms of repeated measures, I have a difficult time conceptualizing what an autoregressive (or other) structure means for the multiplicative scale parameter (say, with overdispersion for a “binomial” distribution). I can conceptualize this with a random effect in a conditional model.
For exponential-family distributions with a free scale parameter (e.g., gamma, negative binomial, and other two-parameter conditional distributions), R-side analysis (with RANDOM _RESIDUAL_ / …) makes sense as a true likelihood analysis (not quasi-likelihood). But one must be careful in fitting the model; this is a technical issue with the analysis. For instance, a plain RANDOM _RESIDUAL_; statement here would create another multiplicative scale parameter, so that the overall scaling would be the product of two constants; there would be no unique estimates for the two scale terms (a form of overparameterization). However, a statement like RANDOM _RESIDUAL_ / group=TRT; would be useful to indicate that there is a separate scale parameter for each treatment (and so on). When you get into repeated measures analysis for the gamma and negative binomial, things can get very messy. If you specify, for instance, an AR(1) structure for R-side analysis, you are defining a working correlation matrix. As stated by Stroup (page 435), “it is not clear how the working correlation parameters co-exist with the scale parameters intrinsic to the [conditional] distribution… The area in need of further development is clearly the two-parameter [non-normal] exponential family.” My view is that a lot is unknown about R-side analysis for two-parameter non-normal distributions—good research opportunities for statisticians.
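To be concrete about the two R-side uses just described (data set and variable names are hypothetical): the first fit below gives a separate gamma scale parameter per treatment and is still a true likelihood; the second imposes an AR(1) working structure for the repeated factor, which is where the caution about working correlation parameters applies.

proc glimmix data=mydata;
   class trt;
   model y = trt / dist=gamma link=log;
   random _residual_ / group=trt;                    /* separate scale parameter per treatment */
run;

proc glimmix data=mydata;
   class trt subj time;
   model y = trt time trt*time / dist=gamma link=log;
   random time / residual subject=subj type=ar(1);   /* AR(1) working correlation on the R side */
run;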
While I was writing my (overly long) reply, several more postings were made. Some of my comments duplicate comments by Steve.
While that is a good research opportunity, I am faced with the dilemma of fitting gamma- and negative-binomial-distributed data and trying to answer colleagues' questions about R side vs. G side. I WANT MY CAKE AND I WANT TO EAT IT TOO. For now, I get about a 2-to-1 vibe in favor of a conditional analysis (G side), mostly because I get fewer convergence problems and more reasonable location and scale estimates. It is really hard to "unlearn" the repeated approach MIXED has allowed us to use.
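For what it's worth, the conditional (G-side) specification I keep coming back to for repeated counts looks roughly like this (a sketch; names are invented). The serial structure sits inside the link function as a random effect rather than in a working R matrix:

proc glimmix data=counts method=laplace;
   class trt subj time;
   model count = trt time trt*time / dist=negbin link=log;
   random time / subject=subj type=ar(1);   /* G-side AR(1): conditional model */
run;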
Steve Denham