01-03-2017 03:17 PM
I am testing if tree density differs in response to a treatment using Proc Glimmix with a negative binomial distribution. I would like to use a covariate of tree density prior to treatment in the model. Are there any diagnostics I should be using to test if my covariate should be transformed?
01-03-2017 07:25 PM
The generalised linear mixed model framework of PROC GLIMMIX accommodates covariates that would not meet the Gaussian distributional assumptions required in an ordinary least squares framework.
This is not, however, the same as asserting that a transformation of a covariate cannot improve overall model fit.
If that interests you, you could still usefully explore covariate transformations in the latter spirit: examine how close the scale factor (generalised chi-square/df) is to unity, and/or, with method=laplace, compare the AICCs or -2 log likelihoods of models with competing covariate transformations and/or higher-order polynomial terms to see whether fit is minimised under one or more particular transformations.
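A minimal sketch of that comparison, assuming a dataset named have with response y, covariate x, and treatment trt (all names are placeholders):

```sas
/* Fit competing covariate scales and compare AICC / -2 log L */
data have2;
   set have;
   x_log = log(x);   /* candidate transformation; assumes x > 0 */
run;

title1 "Covariate untransformed";
proc glimmix data=have2 method=laplace;
   class trt;
   model y = x trt / dist=negbin;
run;

title1 "Covariate log-transformed";
proc glimmix data=have2 method=laplace;
   class trt;
   model y = x_log trt / dist=negbin;
run;
/* Compare the AICC (or -2 log likelihood) values in the two
   Fit Statistics tables; smaller indicates better fit. */
```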
01-03-2017 09:42 PM
Stepping back a bit: Is "tree density" a count (i.e., does "tree density" take integer values)? Typically, density is a rate: count/area. Is area the same for all experimental/observational units? I'm pondering here whether the negative binomial distribution is your best choice.
If tree density is a count (and now for clarity, I'll refer to "tree density" as "tree count"), then a negative binomial distribution might be appropriate. The canonical link for the negative binomial distribution is the log. Unless you invoke a curvilinear (e.g., quadratic) model, the relationship between the response and the covariate is assumed to be linear in the GLIMMIX procedure (generalized linear mixed model), which roughly translates into a linear relationship between log(post-tree-count) and the covariate (noting, of course, that a log transformation of post-tree-count is not the same as a log link). Logically, I then would consider a log transformation of pre-tree-count. Plot log(post-tree-count) against log(pre-tree-count) and visually assess the relationship. Does it look linear? You can color- or symbol-code points by other predictor variable values as well for (potentially) additional insight.
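A quick way to make that visual check in SAS, assuming counts npost and npre and a treatment variable trt in a dataset named have (all names are assumptions):

```sas
/* Is log(post count) roughly linear in log(pre count)? */
data plotdat;
   set have;
   npost_log = log(npost + 1);   /* +1 guards against zero counts */
   npre_log  = log(npre + 1);
run;

proc sgplot data=plotdat;
   scatter x=npre_log y=npost_log / group=trt;  /* color-code by treatment */
   loess   x=npre_log y=npost_log;              /* smooth to assess linearity */
run;
```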
01-04-2017 09:01 AM
Thanks for your responses. I will try various transformations of the covariate to see if it improves overall model fit and to assess the visual relationship between the covariate and the y variable in the model.
I used the NB distribution because the Poisson distribution had major overdispersion problems. I had also tried log transforming the y variable and using a Gaussian model, but that did not fix the normality problems.
01-04-2017 09:23 AM
The y variable was density (trees per hectare), taken on fixed radius plots of the same size. Data were collected as counts, then transformed to density using the plot size.
01-04-2017 01:45 PM
Theoretically, density data cannot follow a negative binomial distribution because density data do not take integer values.
Unlike (at least some) packages in R, the GLIMMIX procedure accepts non-integer data in conjunction with the negative binomial distribution and appears to deliver valid results.
That said, my preference for modeling density based on counts is to use an offset (which is the third model in the code below). This approach is particularly advantageous if, say, plot areas are variable.
dpost: post-treatment density
dpre: pre-treatment density
npost: post-treatment count
npre: pre-treatment count
_log: log transformation
title1 "Using density with negative binomial distribution";
proc glimmix data=have method=laplace;
   nloptions tech=nrridg;
   class trt;
   model dpost = dpre_log trt dpre_log*trt / dist=negbin;
   lsmeans trt / ilink;
   output out=glmm_out pred=pred student=student;
run;
proc sgplot data=glmm_out;
   loess x=dpre_log y=student;
   refline 0 / axis=y;
run;

title1 "Using count with negative binomial distribution";
proc glimmix data=have method=laplace;
   nloptions tech=nrridg;
   class trt;
   model npost = npre_log trt npre_log*trt / dist=negbin;
   lsmeans trt / ilink;
   output out=glmm_out pred=pred student=student;
run;
proc sgplot data=glmm_out;
   loess x=npre_log y=student;
   refline 0 / axis=y;
run;

title1 "Using count with negative binomial distribution with offset to obtain density";
proc glimmix data=have method=laplace;
   nloptions tech=nrridg;
   class trt;
   model npost = npre_log trt npre_log*trt / dist=negbin offset=area_log;
   lsmeans trt / ilink;
   output out=glmm_out pred=pred student=student;
run;
proc sgplot data=glmm_out;
   loess x=npre_log y=student;
   refline 0 / axis=y;
run;
01-13-2017 03:15 PM
Stacy, you might find this exchange interesting. @PGStats built a nice example.
01-04-2017 01:27 PM
I'll ask, because it's a not-uncommon misconception: Are you checking normality using the observed data combined over treatment groups or using the residuals from a model? The former approach is bad, the latter approach is good. Distributional assumptions are conditional on the predictor variables.
Something of an aside: If the number of trees is "big enough", the Poisson distribution can be approximated by a normal distribution. Even if the data within each treatment group are normally distributed, the treatment groups could have different variances. Unequal variances might be accommodated with a log transformation or a heterogeneous variances model, as well as other approaches. So a model invoking a normal distribution might work, or it might not, depending on the data. If the counts are substantially overdispersed, then a model based on normal distributions might be less likely to be satisfactory.
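To make the residuals-not-raw-data point concrete, here is a sketch of a residual-based normality check, assuming a dataset have with response y and treatment trt (names are placeholders):

```sas
/* Check normality with residuals conditional on the predictors,
   not with the pooled observed data */
proc glimmix data=have;
   class trt;
   model y = trt;   /* Gaussian model; dist=normal is the default */
   output out=resids student=student;
run;

proc univariate data=resids normal;
   var student;
   qqplot student / normal(mu=est sigma=est);  /* visual normality check */
run;
```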
01-09-2017 02:21 PM
My understanding is that normality should be checked with residuals from the model.
I had initially tried a Gaussian distribution, but normality was a major problem that was not overcome with transformation. A Poisson distribution had major overdispersion problems.