I have a conundrum that I just can't figure out (I'm an epidemiologist, not a biostatistician).
I have a continuous outcome with a range of 0-5.0. This data was generated from a questionnaire from a Likert-like item as a score (i.e. 0=no event, 1=event+severity level 1, 2=event+severity level 2, etc). The data is right-skewed.
I want to model the data using <proc glimmix> with gamma log-link (where my distribution is exponential, r=1 -- such that my gamma density specializes to the exponential density). This assumes a constant coefficient of variation over the range of X, a random variable.
From my understanding, GLMM with gamma should be able to model a gamma function with limits (0,infinity). So why does proc glimmix restrict to interval data? I can't model this data as a count process (i.e. Poisson, etc), so that is not an alternative option. I could use proc genmod, but intuitively, a GLMM is more appropriate than GEE models for my given research question. Or am I just modeling the data as an exponential with log-link since r=1? In which case, can I then model my 0's? Alternatively, I could add an interval to each observation (e.g. +1) to avoid the issue of zeros in the data, but I don't know how this would impact my beta estimates (if at all).
Thanks!
Ouch. It looks like a mixture analysis problem, and there aren't any really good mixed model approaches to this yet, unless you are an excellent coder and are willing to write a fair amount of PROC NLMIXED code. Basically, there is a probability of an event (binomial distribution), and a severity score in the case of an event, which is assumed to follow a gamma distribution. I am guessing that you want to go the GLMM route because of clustering or hierarchical structure of some sort. I think the closest you could come is to use GLIMMIX and fit a multinomial distribution with something like a cumulative log-log link (might be the cumulative complementary log-log). That is probably not going to give you the parameter estimates needed for epidemiological work, though.
Steve Denham
Both the gamma and exponential are defined only for positive real numbers; 0 is not allowed for either distribution. If you don't have random effects, you could use PROC FMM for a mixture model, i.e., a mixture of Prob(Y=0) and Prob(Y>0), the latter being an exponential. With random effects, you could fit a mixture with NLMIXED; see the examples in the Stroup textbook on generalized linear mixed models. However, I am not convinced you want a gamma distribution. It appears that your response variable has an upper limit of 5, whereas the gamma/exponential is unbounded on the right. A multinomial may be appropriate, as suggested by Steve, if you have counts of several individuals for all the treatments or covariates. The response could also be rescaled to 0-1 by dividing by the max (5), which would suggest a beta distribution. But the beta is defined only on the open interval (0,1), so you would still have the undefined Prob(Y=0), and a score of 5 would map to 1, which is also outside the support.
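To make the two-part idea concrete, here is a minimal sketch in Python (not SAS) of the simplest possible version: no covariates, no random effects, just Prob(Y=0) = p0 and an exponential with mean b for Y>0. The function names and the scores are made up for illustration, and this is not what PROC FMM or NLMIXED does internally. It only shows how the zeros and the positive values contribute separate pieces to the likelihood. It still ignores the upper limit of 5.

```python
import math

def two_part_mle(scores):
    """Closed-form MLEs for an intercept-only two-part model.

    Assumed model (illustrative only):
      P(Y = 0) = p0
      Y | Y > 0 ~ exponential with mean b
    Requires at least one zero and at least one positive score.
    """
    positives = [y for y in scores if y > 0]
    n_zero = len(scores) - len(positives)
    p0 = n_zero / len(scores)              # proportion of zeros
    b = sum(positives) / len(positives)    # mean of the positive scores
    return p0, b

def two_part_loglik(scores, p0, b):
    """Log-likelihood: zeros contribute log(p0); positives contribute
    log(1 - p0) plus the exponential log-density with mean b."""
    ll = 0.0
    for y in scores:
        if y == 0:
            ll += math.log(p0)
        else:
            ll += math.log(1.0 - p0) - math.log(b) - y / b
    return ll

# Made-up scores on the 0-5 scale, purely for illustration.
scores = [0, 0, 0, 1.0, 0, 2.0, 0.5, 0, 3.0, 1.5]
p0_hat, b_hat = two_part_mle(scores)
ll_hat = two_part_loglik(scores, p0_hat, b_hat)
print(p0_hat, b_hat, ll_hat)
```

Because the two pieces of the likelihood share no parameters, they can be maximized separately. That is why the zero part reduces to a proportion and the positive part to a mean in this stripped-down case. Covariates and random effects break the closed form, which is where NLMIXED comes in.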
Adding a constant (c) to Y technically allows you to use an exponential distribution. But Y+c will not have the same distributional properties as Y. Consider the exponential. It is defined by a parameter b, the mean; the variance is, by definition, b^2=mean^2. For Y+c, the mean is now b+c, but the variance is unchanged at b^2. So, for any c>0, it is impossible for mean^2 to equal the variance, and the exponential no longer fits. The gamma can handle this, because its variance is mean^2 divided by the shape parameter, and the shape is estimated rather than fixed at 1.
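A quick simulation shows the same thing numerically. This is a sketch in Python rather than SAS, and the values b=2, c=1, the seed, and the sample size are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(12345)   # fixed seed so the run is reproducible

b = 2.0              # exponential mean (arbitrary illustrative value)
c = 1.0              # constant added to every observation (arbitrary)
n = 200_000

# random.expovariate takes the rate, which is 1 / mean.
y = [random.expovariate(1.0 / b) for _ in range(n)]
shifted = [v + c for v in y]

mean_y = statistics.fmean(y)
var_y = statistics.variance(y)
mean_s = statistics.fmean(shifted)
var_s = statistics.variance(shifted)

print("Y:   mean^2 =", mean_y ** 2, " variance =", var_y)
print("Y+c: mean^2 =", mean_s ** 2, " variance =", var_s)
```

For Y, mean^2 and the variance should agree closely (both near b^2). For Y+c, the variance stays where it was while mean^2 moves to about (b+c)^2, so the exponential's mean-variance relationship no longer holds.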
Thank you both. You have given me much to ponder over the next few days. I will look into both suggestions: the multinomial distribution, as well as adding a constant (with the considerations about the distribution as described). Your input has been much appreciated. When I figure out what approach to take and the outcome(s) I estimate, I'll re-post for those who may have a similar problem.