Using a covariate in Proc Glimmix


01-03-2017 03:17 PM

I am testing if tree density differs in response to a treatment using Proc Glimmix with a negative binomial distribution. I would like to use a covariate of tree density prior to treatment in the model. Are there any diagnostics I should be using to test if my covariate should be transformed?


01-03-2017 07:25 PM

The generalized linear mixed model framework of proc glimmix accommodates covariates that would not meet the Gaussian distributional assumptions otherwise necessary in an ordinary least squares model framework.

That is not the same, however, as asserting that a transformation of a covariate will not improve overall model fit.

If this interests you, you could still usefully explore covariate transformations in that spirit: check how close the scale factor (Pearson chi-square/DF) is to unity, and/or, with method=laplace, compare the AICCs or -2 log likelihoods of models with competing covariate transformations and/or higher-order polynomial terms, to see whether fit is best under one or more particular transformations.
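A minimal sketch of that comparison, assuming a data set and variables (have, y, x, trt) that are hypothetical stand-ins for your own: fit the same model under competing covariate transformations with method=laplace and compare the reported AICC values.

```sas
/* Hypothetical example: compare covariate transformations by AICC.   */
/* METHOD=LAPLACE yields a true log likelihood, so AICC and -2LL are  */
/* comparable across models with the same response variable.          */
data have2;
   set have;
   x_log  = log(x + 1);   /* log transform (+1 guards against zeros) */
   x_sqrt = sqrt(x);      /* square-root transform                   */
run;

title "Untransformed covariate";
proc glimmix data=have2 method=laplace;
   class trt;
   model y = x trt / dist=negbin;
run;

title "Log-transformed covariate";
proc glimmix data=have2 method=laplace;
   class trt;
   model y = x_log trt / dist=negbin;
run;
```

The model with the smallest AICC (or -2LL, penalized for added terms) would be the candidate to carry forward.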


01-03-2017 09:42 PM

Stepping back a bit: Is "tree density" a count (i.e., does "tree density" take integer values)? Typically, density is a rate: count/area. Is area the same for all experimental/observational units? I'm pondering here whether the negative binomial distribution is your best choice.

If tree density is a count (and now for clarity, I'll refer to "tree density" as "tree count"), then a negative binomial distribution might be appropriate. The canonical link for the negative binomial distribution is the log. Unless you invoke a curvilinear (e.g., quadratic) model, the relationship between the response and the covariate is assumed to be linear in the GLIMMIX procedure (generalized *linear* mixed model), which roughly translates into a linear relationship between log(post-tree-count) and the covariate (noting of course, that a log transformation of post-tree-count is not the same as a log link). Logically, I then would consider a log transformation of pre-tree-count. Plot log(post-tree-count) against log(pre-tree-count) and visually assess the relationship. Does it look linear? You can color- or symbol-code points by other predictor variable values as well for (potentially) additional insight.
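That visual check could be sketched as follows, with hypothetical data set and variable names (have, npre, npost, trt):

```sas
/* Plot log(post-tree-count) against log(pre-tree-count), coded by  */
/* treatment, to assess linearity on the scale implied by the log   */
/* link. The +1 shift guards against zero counts.                   */
data plotdat;
   set have;
   npost_log = log(npost + 1);
   npre_log  = log(npre + 1);
run;

proc sgplot data=plotdat;
   scatter x=npre_log y=npost_log / group=trt; /* symbol-code by trt */
   loess   x=npre_log y=npost_log;             /* smooth to judge linearity */
run;
```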


01-04-2017 09:01 AM

Thanks for your responses. I will try various transformations of the covariate to see if it improves overall model fit and to assess the visual relationship between the covariate and the y variable in the model.

I used the NB distribution because the Poisson distribution had major overdispersion problems. I had also tried log-transforming the y variable and using a Gaussian model, but that did not fix the normality problems.


01-04-2017 09:23 AM

The y variable was density (trees per hectare), taken on fixed radius plots of the same size. Data were collected as counts, then transformed to density using the plot size.


01-04-2017 01:45 PM

Theoretically, density data cannot follow a negative binomial distribution because density data do not take integer values.

Unlike (at least some) packages in R, the GLIMMIX procedure accepts non-integer data in conjunction with the negative binomial distribution and appears to deliver valid results.

That said, my preference for modeling density based on counts is to use an offset (which is the third model in the code below). This approach is particularly advantageous if, say, plot areas are variable.

- dpost: post-treatment density
- dpre: pre-treatment density
- npost: post-treatment count
- npre: pre-treatment count
- _log: log transformation

```sas
title1 "Using density with negative binomial distribution";
proc glimmix data=have method=laplace;
   nloptions tech=nrridg;
   class trt;
   model dpost = dpre_log trt dpre_log*trt / dist=negbin;
   lsmeans trt / ilink;
   output out=glmm_out pred=pred student=student;
run;
proc sgplot data=glmm_out;
   loess x=dpre_log y=student;
   refline 0 / axis=y;
run;

title1 "Using count with negative binomial distribution";
proc glimmix data=have method=laplace;
   nloptions tech=nrridg;
   class trt;
   model npost = npre_log trt npre_log*trt / dist=negbin;
   lsmeans trt / ilink;
   output out=glmm_out pred=pred student=student;
run;
proc sgplot data=glmm_out;
   loess x=npre_log y=student;
   refline 0 / axis=y;
run;

title1 "Using count with negative binomial distribution with offset to obtain density";
proc glimmix data=have method=laplace;
   nloptions tech=nrridg;
   class trt;
   model npost = npre_log trt npre_log*trt / dist=negbin offset=area_log;
   lsmeans trt / ilink;
   output out=glmm_out pred=pred student=student;
run;
proc sgplot data=glmm_out;
   loess x=npre_log y=student;
   refline 0 / axis=y;
run;
```


01-09-2017 02:25 PM

Thanks for your help with this. The plots were all the same size, so density and counts should yield similar results.



01-04-2017 01:27 PM

I'll ask, because it's a not-uncommon misconception: Are you checking normality using the observed data combined over treatment groups or using the residuals from a model? The former approach is bad, the latter approach is good. Distributional assumptions are conditional on the predictor variables.

Something of an aside: If the number of trees is "big enough", the Poisson distribution can be approximated by a normal distribution. Even if the data within each treatment group are normally distributed, the treatment groups could have different variances. Unequal variances might be accommodated with a log transformation or a heterogeneous variances model, as well as other approaches. So a model invoking a normal distribution might work, or it might not, depending on the data. If the counts are substantially overdispersed, then a model based on normal distributions might be less likely to be satisfactory.
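The residual-based check could be sketched as below, again with hypothetical names (have, y, x, trt): output studentized residuals from the fitted model and examine those, rather than the raw data pooled across treatment groups.

```sas
/* Check normality on residuals from the model, not on raw y pooled */
/* over treatment groups.                                           */
proc glimmix data=have;
   class trt;
   model y = x trt / dist=gaussian solution;
   output out=resids student=student;   /* studentized residuals */
run;

/* Normality tests and a Q-Q plot of the residuals */
proc univariate data=resids normal;
   var student;
   qqplot student / normal(mu=est sigma=est);
run;
```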


01-09-2017 02:21 PM

My understanding is that normality should be checked with residuals from the model.

I had initially tried a Gaussian distribution, but normality was a major problem that was not overcome with transformation. A Poisson distribution had major overdispersion problems.

Thank you.