turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Stat Procs
- /
- PROC GLIMMIX POISSON RCBD question regarding param...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-15-2012 04:39 PM

Hello all,

I am an environmental science graduate student attempting to use PROC GLIMMIX per the suggestion of my statistics professor. I have a few general questions about using this model due to my pretty underdeveloped stats understanding.

Study Background: I vacuum sampled insects off of 10 walls covered in vines and 10 adjacent blank walls during three separate months last summer. Study mimics an RBCD. Single treatment factor has 2 levels-green and not green. Each site is treated as a block containing both a blank and a green wall. Insect abundance data from the walls follow a non normal distribution and lack equality of variance. I could solve this with a simple log transformation but was hoping to avoid losing the ability to discuss my original values. Thus the PROC GLIMMIX. An example of my code is shown below involving data from one month of sampling (repeated measures were not used in this study).

**data**abundancevisit3;

inputblk trt$ y;

lines;

**1**g **5.867**

**1**b **0**

**2**g **6.933**

**2**b **0.444**

**3**g **0.8**

**3**b **0**

**4**g **5.2**

**4**b **0.667**

**5**g **56.267**

**5**b **1.333**

**6**g **14.933**

**6**b **0**

**7**g **54.133**

**7**b **0.444**

**8**g **5.026**

**8**b **0**

**9**g **4.8**

**9**b **1.333**

**10**g **11.733**

**10**b **0**

;

optionsnocenter;

**procprint** data=abundancevisit3;

**run**;

**procglimmix** data=abundancevisit3;

classtrt blk;

modely=trt/dist=poisson link=log;

randomblk;

lsmeanstrt/cl ilink;

**run**;

Questions

From output it looks as though my data, or at least my parameters, are being transformed. Does this limit my ability to talk about my original dataset? Or is my data left untransformed and only the parameters transformed? Can I use backtransformed means when discussing results though it is not the same as my original data means? I guess I just want a simplistic idea of what SAS is doing to my data vs parameters. What is being transformed exactly?

My other question involves assessment of the POISSON model fit. I have seen some codes using the pearson chi square/DF to determine whether their poisson distribution is over dispersed but my output only contains gen chai square/DF. I gather I could change this by inserting a MODEL= statement but have no real idea of what I would put there…any tips on what I need to examine to make sure this model is an OK fit for the analysis? Thank you so much in advance for any tips!!!

Serena

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-15-2012 04:50 PM

Did you sample a constant wall surface or did you have to standardize by the sampled area (hence the decimals)?

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-15-2012 09:41 PM

As PGStats stated, the Poisson distribution is for counts. But your "counts" are decimal values. You must have converted the raw values to some unit scale (volume, etc.). One can do the analysis on these decimal values, but first you want to make sure you really are analyzing counts.

In the procedure code you supplied, you took the fairly common approach (and generally recommended approach) of modeling the log(ExpectedCount) as a linear function of the treatment factor. That is, you told the program to use the log link (which is actually the default for the Poisson). Note that in generalized linear (mixed) models, one models the log of the expected value, not the expected value of a log of the variable (as one would do with transformed Y and normal distributions). The statistical results are all on the log-link scale. That is appropriate. But you also gave the inverse link option; this means that you are also getting the mean counts and confidence intervals on the original scale (just to the right of the log scale results in the LSMEANS output table).

The default model fitting approach in GLIMMIX with random effects is known as pseudo-likelihood or pseudo-restricted-likelihood. This prevents you from easily evaluating goodness of fit. (I can't get into the technical details here, but in short, the model is being approximated, which prevents direct interpretation of the likelihood obtained). There are some work-arounds by saving certain results and doing some post-model-fitting analyses. For your model, however, there is a better and more direct approach: use a model fitting method that works without approximating the model. I suggest you use the Quadrature method (method=quad). You must write the random statement in a different way (but equivalent in terms of meaning and interpretation. That is, use

random int / sub=blk;

instead of

random blk;

.

Full code:

proc glimmix data=abundancevisit3 method=quad;

class trt blk;

model y = trt / dist=poisson link=log;

random int / sub=blk;

lsmeans trt / cl ilink;

run;

Now you get a proper Pearson chi-square statistic and df (and the ratio) to help in evaluating the model fit. For your example, the ratio is under 1; thus, no evidence (from this statistic) of a poor fit. The model is still log(Exp.Count) as a function of treatment. The ilink option gives the means and other statistics on the original scale.

Good luck.

LVM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-16-2012 12:18 PM

Every so often I read a response and think, "Wow! What a great answer!" This is one of those times. Thanks, lvm!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-16-2012 01:21 PM

My feeling, exactly, Rick. LVM, you're a true expert. Thanks.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-19-2012 01:10 PM

Hello,

I would like to get back at the question of the nature of the dependent variable, the *counts*. Scaled counts are not counts : they do not follow a Poisson distribution. I think it is preferable to use real counts and to model the sampling unit size (time, area, volume, whatever) explicitly with an offset. For example, in SMATT1's data, if *y = count / area* then LVM's analysis would become :

proc glimmix data=abundancevisit3 method=quad;

class trt blk;

**LogArea = log(area);**

model **count **= trt / dist=poisson link=log **offset=LogArea**;

random int / sub=blk;

lsmeans trt / cl ilink;

run;

I simulated different counts corresponding to a few areas that would result in SMATT1's data. The assessment of a reasonable Poisson fit (ChiSquare/DF) holds only to about an area of 5 (i.e. to *counts = 5y*). It thus seems important to use the true scale of the counts to assess the fit. The same holds for estimating the significance of effects.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-19-2012 01:13 PM

Good points made by PGStats. Use of offset is a good practice.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-19-2012 10:34 AM

Thanks for the compliments. However, I just consider myself a serious user of linear, generalized, and nonlinear mixed models.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-19-2012 07:02 PM

Thank you for these helpful comments!!

IVM: In summary Iwant to make sure I understand. “Modeling the log(ExpectedCount) as a linearfunction of the treatment factor” means that based on my data, glimmix generatesexpected values on a log scale. It thenuses these to determine if the treatment significantly explains differences in expectedOR actual counts OR both? I guess I did not realize that GLIMMIXgenerates expected data.

PG STATS: data containsdecimals because data was normalized by sample size. Ten 0.75m^2 quadrats were sampled at eachwall (10 subsamples). I divided theaverage abundance of insects per quadrat by 0.75m^2 to obtain average per m^2. From my understanding it seems like the 0.75m^2 value falls within that up to count=5y limit you discussed? I am a little hesitant to include the offsetmainly b/c I am not totally clear on what that is doing. If I use the non-normalized data why includearea at all? I am guessing the offsetsomehow normalizes for sample area within the model. Area is log’d b/c expected counts arepresented in log scale. Or something!

Thank you all, so informativeand helpful!

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-19-2012 09:25 PM

You are correct Serena, if your sample size was always the same, there is no need to include an offset in your analysis. Simply use raw counts instead of densities. Assess the fit of your three months observations that way. Maybe LVM can advise you on the robustness of the analysis to departures from the Poisson distribution. GLIMMIX offers you the negative binomial as an alternative distribution for counts.

PG

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-20-2012 08:37 AM

Regarding the general question on the model....Let mu equal the (conditional) expected count for a given value of treatment (trt; fixed effect) and block (blk; random effect). The combination of your model and random statements (with the options) is really specifying the following model.

ln(mu) = constant + trt + blk,

with mu = exp(ln(mu)),

y ~ Poisson(mu).

So yes, GLIMMIX is modeling the link of the expected value for the specified distribution. The LSMEANS statement is giving you ln(mu) for each trt level; because of the ilink option, you are also getting mu = exp(ln(mu)) for each treatment level. Based on the Pearson chi-square for your data, Poisson seems reasonable.

Based on your most recent post, you may not be getting the maximum use of your data. You indicated 10 samples for each wall (block) and treatment. You could actually be specifying all 10 observations for each trt and blk combination, using a label of 1-10. That is, each record would have a trt, blk, and **sample **code, together with y (data set with 10 times the number of records). A model could then be fit of the form:

proc glimmix data=abundancevisit3 method=quad;

class trt blk **sample**;

model y = trt / dist=poisson link=log;

random int **sample **/ sub=blk;

lsmeans trt / cl ilink;

run;

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-20-2012 09:46 AM

But if my experimental unit is a wall, then the 10 quadrat samples taken on the wall are just subsamples of the EU and must be averaged...right? Otherwise aren't I introducing some sort of false replication?

Thanks again,

S

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-20-2012 09:58 AM

The new code I gave you **is **intended to properly treat the samples as true samples within the experimental units (not as indepdenent replicates). However, I wrote my code too quickly. I recommend:

proc glimmix data=abundancevisit3 method=quad;

class trt blk **sample**;

model y = trt / dist=poisson link=log;

random int **trt **/ sub=blk;

lsmeans trt / cl ilink;

run;

This will handle the samples as samples, if you take this approach. Or, just deal with the simpler situation dealt with originally.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-20-2012 10:11 AM

Thanks again

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-20-2012 11:38 AM

Last questions on this topic-I promise!

If I include subsamples does it matter that I have unequal replication of subsamples for the two different treatments? Blank walls did not require as many quadrats as vegetated thus there were 3 on the blank and 10 on the vegetated...(although I still had equal replication of experiemntal units). I am not sure if this would require modifications to the code at all.

And, IVM mentioned that Pearson Chi/DF less than 1 shows adequate fit of poisson- is there a lower limit of some sort? I wasn't sure if lack of fit was shown by departures from 1 or actually being greater than 1. Data from my second sampling visit yeilds a value of 0.17-not sure if this is too low.

Serena