PROC GLIMMIX NEGBIN RCBD- question regarding prese...


05-28-2012 04:43 PM

Hello all,

With the help of *SAS for Mixed Models*, I am still slowly learning to interpret results generated by PROC GLIMMIX for a simple RCBD, and I am trying to figure out how to present my results correctly. I am not sure whether I should be presenting transformed or original parameters. Because my data consist of counts, my rationale for using PROC GLIMMIX was that it would let me specify a non-normal distribution and then interpret, discuss, and present parameters for untransformed data, but now I am not so sure.

**Background:** RCBD, one treatment factor with two levels (vegetated walls and un-vegetated walls). 10 green walls and 10 blank walls (equivalent to 20 observational units). Vacuum sampled insects from within 10 quadrats (subsamples) on each green wall and 3 on each blank wall. Testing whether green walls have a significantly higher number of insects than blank walls during three separate months of sampling.

I am using the p-value presented in the Type III Tests of Fixed Effects table to determine whether the difference between the mean numbers of insects in the two treatments is significant. It is, with p < 0.0001.

Traditionally, to present my results I would create a bar graph showing the treatment means, their standard errors, and letters indicating significant differences between the two means. I am not sure that is appropriate anymore. What I think I should be presenting is the significant difference between the means of the log-transformed predicted values generated by a model based on my original data. Should I then be showing these means instead? If so, I assume I would use the values in the Estimate column generated by the LSMEANS statement. But do these values have "real world" meaning? I realize that I could use the ILINK option to back-transform, but those values are still quite different from the means calculated from my original data.

In a nutshell, I am wondering whether it is appropriate to present the means and standard errors of my original data even though the significant differences detected are based on these transformed expected parameters…

I hope this question makes sense. Thank you for any thoughts or suggestions.

Sincerely,

Serena

Code below is for sampling in August:

```sas
data abundancevisit3withsub;
   input blk trt$ subsample y;   /* "." in the data below is a missing value */
   lines;
1 g 1 1
1 g 2 4
1 g 3 5
1 g 4 4
1 g 5 0
1 g 6 14
1 g 7 3
1 g 8 7
1 g 9 2
1 g 10 4
1 b 11 0
1 b 12 0
1 b 13 0
2 g 1 9
2 g 2 8
2 g 3 4
2 g 4 5
2 g 5 3
2 g 6 9
2 g 7 5
2 g 8 6
2 g 9 2
2 g 10 1
2 b 11 1
2 b 12 0
2 b 13 0
3 g 1 2
3 g 2 0
3 g 3 1
3 g 4 0
3 g 5 0
3 g 6 1
3 g 7 0
3 g 8 0
3 g 9 0
3 g 10 2
3 b 11 0
3 b 12 0
3 b 13 0
4 g 1 5
4 g 2 0
4 g 3 2
4 g 4 2
4 g 5 0
4 g 6 4
4 g 7 4
4 g 8 3
4 g 9 5
4 g 10 14
4 b 11 1
4 b 12 .
4 b 13 0
5 g 1 62
5 g 2 16
5 g 3 28
5 g 4 30
5 g 5 63
5 g 6 57
5 g 7 61
5 g 8 45
5 g 9 36
5 g 10 24
5 b 11 0
5 b 12 1
5 b 13 2
6 g 1 4
6 g 2 18
6 g 3 7
6 g 4 8
6 g 5 10
6 g 6 18
6 g 7 15
6 g 8 18
6 g 9 4
6 g 10 10
6 b 11 0
6 b 12 0
6 b 13 0
7 g 1 19
7 g 2 43
7 g 3 24
7 g 4 34
7 g 5 11
7 g 6 20
7 g 7 38
7 g 8 85
7 g 9 47
7 g 10 85
7 b 11 1
7 b 12 0
7 b 13 0
8 g 1 3
8 g 2 3
8 g 3 3
8 g 4 4
8 g 5 5
8 g 6 6
8 g 7 11
8 g 8 7
8 g 9 5
8 g 10 2
8 b 11 1
8 b 12 0
8 b 13 0
9 g 1 1
9 g 2 0
9 g 3 1
9 g 4 2
9 g 5 15
9 g 6 3
9 g 7 3
9 g 8 3
9 g 9 4
9 g 10 4
9 b 11 1
9 b 12 1
9 b 13 1
10 g 1 9
10 g 2 2
10 g 3 6
10 g 4 6
10 g 5 5
10 g 6 13
10 g 7 18
10 g 8 8
10 g 9 5
10 g 10 16
10 b 11 0
10 b 12 0
10 b 13 0
;
run;

proc print data=abundancevisit3withsub;
run;

proc glimmix data=abundancevisit3withsub method=quad;
   class trt blk subsample;
   model y = trt / solution dist=negbin link=log;
   random int trt / sub=blk;
   lsmeans trt / ilink;
   ods select LSMeans Estimates;
run;
```


05-29-2012 10:12 AM

Hi Serena,

When you calculate means and standard errors of the original data, do you transform before calculating and then back-transform? If not, the values are somewhat misleading to begin with. I truly believe that the best estimators are those obtained from the analysis, rather than from the raw data.

Given all of that, I would suggest the following version for your lsmeans statement:

```sas
lsmeans trt / ilink diff lines;
```

This will give the mean and standard error on the original scale (ILINK) and letters for the bar graph (LINES).
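Dropped into the original call, the full step would look something like this (a sketch based on the code posted earlier in the thread; LINES builds its letter groupings from the DIFF comparisons):

```sas
proc glimmix data=abundancevisit3withsub method=quad;
   class trt blk subsample;
   model y = trt / solution dist=negbin link=log;
   random int trt / sub=blk;
   /* ILINK: means and SEs on the count scale; DIFF: pairwise
      comparisons; LINES: letter groupings for the bar graph */
   lsmeans trt / ilink diff lines;
run;
```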

Good luck,

Steve Denham


05-29-2012 11:13 AM

Thank you for these helpful comments Steve,

When calculating means and SEs of the original data, I am not transforming and then back-transforming. Though these values are provided by the ILINK option, I hesitated to use them because they were quite different from the means and SEs calculated from my raw data.

To make sure I understand: you think it is better to present the means and SEs generated by the analysis rather than parameters calculated from the raw data, or even from transformed, then back-transformed data. Intuitively this is hard to grasp, because rather than saying I sampled an overall average of 2 insects per m^2 on each blank wall and 25 on each green wall, I am saying I sampled an average of -1.45 and 1.92. I think I understand, though: because of the non-normal distribution of my data, ordinary calculations of the mean and SE are not truly showing the mean and SE of my data.
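[Editor's note: the two sets of numbers are connected by the log link — the ILINK means are simply the link-scale LS-means exponentiated. A quick sketch using the link-scale values quoted above:]

```sas
/* exp() maps link-scale LS-means back to the count scale */
data _null_;
   blank = exp(-1.45);   /* about 0.23 insects per subsample */
   green = exp(1.92);    /* about 6.82 insects per subsample */
   put blank= green=;
run;
```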

But I still don't see why the back-transformed parameters make sense to present either. They seem doubly distant: they are neither the parameters generated by the analysis nor parameters of the original data...

Thank you again for this advice!

Serena


05-29-2012 12:50 PM

Once upon a time, when mastodons roamed the earth, and I was a first year graduate student in a field that wasn't statistics, but used statistics a lot, I took a course that used Sokal and Rohlf's *Biometry* as a text. In it was an example of how to analyze count data, using the square root transformation. At the bottom of page 384, it said, "For reporting of means the transformed means are squared again and confidence limits are reported in lieu of standard errors."

So, I supposed that you had transformed the data using a log transform, calculated a mean, and then exponentiated that value to get the expected value of your counts, for the "raw" data.

It is important to take one thing from that old book: confidence limits are reported in lieu of standard errors. Back-transforming the estimate of the standard error gets you into trouble. That's why lvm suggests reporting both. One of the great things about PROC GLIMMIX is its use of the delta method to get back-transformed standard errors. Still, I always expect asymmetric confidence bounds on non-normal data.
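One way to get those asymmetric bounds directly (a sketch; CL computes limits on the link scale, and combined with ILINK the LSMeans table also reports them inverse-linked to the count scale):

```sas
/* CL: confidence limits on the link scale; with ILINK, the
   inverse-linked limits are also shown, and because they are
   exp() of symmetric link-scale limits they come out asymmetric
   around the back-transformed mean */
lsmeans trt / ilink cl;
```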

Good luck.

Steve Denham


05-29-2012 01:12 PM

This is all very interesting. Although I have taken two semesters of graduate level statistics the emphasis seems to have been placed entirely on assumptions of normality and wrestling your data via transformations to fit that distribution. My professor never enthusiastically supported transformations for achieving normality but couldn't necessarily equip us with the tools to launch us into this non-parametric world. This is all very new and I really appreciate the feedback...

Thanks,

Serena


05-29-2012 12:13 PM

As Steve has indicated, you definitely want to use and report the results from LSMEANS. You could actually give the results on the log-link scale (with SEs) and on the inverse-link scale (with SEs). The analysis is modeling the log of the expected (mean) value as a function of treatment (and random effects); the inverse link is thus giving you estimates of the expected values, which is what you want. If you are reading *SAS for Mixed Models*, 2nd edition, you will see that this is clearly the most appropriate approach for presenting results (assuming the model is appropriate). If the model is not appropriate, then the basis for your entire analysis is faulty. If you can justify the use of the negative binomial model (with the random effects) to determine whether treatment has a significant effect, then it is only logical to use the full set of results from that model. The arithmetic averages are *not* better estimates of the expected values; they can be far worse.
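[Editor's note: requesting both scales in one statement, per the reasoning above, might look like this (a sketch; the Estimate column is the estimated log of the expected count, and the ILINK Mean column is the expected count itself):]

```sas
proc glimmix data=abundancevisit3withsub method=quad;
   class trt blk subsample;
   model y = trt / dist=negbin link=log;  /* models log(expected count) */
   random int trt / sub=blk;
   lsmeans trt / ilink cl;  /* link-scale estimates plus inverse-linked means */
run;
```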


05-29-2012 12:49 PM

Yes! I understand...

Thank you,

Serena