SMATT1
Calcite | Level 5

Hello all,

With the help of SAS for Mixed Models, I am slowly learning to interpret results generated by PROC GLIMMIX for a simple RCBD.  I am trying to figure out how to present my results correctly, and I am not sure whether I should be presenting transformed or original parameters.  Because my data consist of counts, my rationale for using PROC GLIMMIX was that it would let me specify a non-normal distribution and then interpret, discuss, and present parameters for untransformed data, but now I am not so sure.


Background: RCBD, 1 treatment, 2 levels (vegetated walls and un-vegetated walls).  10 green walls, 10 blank walls (equivalent to 20 observational units). Vacuum sampled insects from within 10 quadrats (subsamples) on each green wall and 3 on each blank wall.  Testing whether green walls have a significantly higher number of insects than blank walls during each of three separate months of sampling.

I am using the p-value presented in the Type III Tests of Fixed Effects table to determine whether the difference between the mean insect counts of the two treatments was significant.  It was, with p<0.0001.

Traditionally, to present my results I would create a bar graph showing the means of the treatments, their standard errors, and letters indicating significant differences between the two means.  I am not sure if this is appropriate anymore.  What I think I should be presenting is the significant difference between the means of log-transformed predicted values generated by a model based on my original data.  Should I then be showing these means instead?  If so, I assume I would use the values in the Estimate column generated by the LSMEANS statement. But do these values have “real world” meaning?  I realize that I could use the ILINK option to back-transform, but these values are still quite different from means calculated from my original data.

In a nutshell I am wondering if it is appropriate to present the means and standard errors of my original data although the significant differences detected are based on these transformed expected parameters…

I hope this question makes sense.  Thank you for any thoughts or suggestions.

Sincerely,

Serena  

Code below is for sampling in August:

data abundancevisit3withsub;

input blk trt$ subsample y;

lines;

1 g 1 1

1 g 2 4

1 g 3 5

1 g 4 4

1 g 5 0

1 g 6 14

1 g 7 3

1 g 8 7

1 g 9 2

1 g 10 4

1 b 11 0

1 b 12 0

1 b 13 0

2 g 1 9

2 g 2 8

2 g 3 4

2 g 4 5

2 g 5 3

2 g 6 9

2 g 7 5

2 g 8 6

2 g 9 2

2 g 10 1

2 b 11 1

2 b 12 0

2 b 13 0

3 g 1 2

3 g 2 0

3 g 3 1

3 g 4 0

3 g 5 0

3 g 6 1

3 g 7 0

3 g 8 0

3 g 9 0

3 g 10 2

3 b 11 0

3 b 12 0

3 b 13 0

4 g 1 5

4 g 2 0

4 g 3 2

4 g 4 2

4 g 5 0

4 g 6 4

4 g 7 4

4 g 8 3

4 g 9 5

4 g 10 14

4 b 11 1

4 b 12 .

4 b 13 0

5 g 1 62

5 g 2 16

5 g 3 28

5 g 4 30

5 g 5 63

5 g 6 57

5 g 7 61

5 g 8 45

5 g 9 36

5 g 10 24

5 b 11 0

5 b 12 1

5 b 13 2

6 g 1 4

6 g 2 18

6 g 3 7

6 g 4 8

6 g 5 10

6 g 6 18

6 g 7 15

6 g 8 18

6 g 9 4

6 g 10 10

6 b 11 0

6 b 12 0

6 b 13 0

7 g 1 19

7 g 2 43

7 g 3 24

7 g 4 34

7 g 5 11

7 g 6 20

7 g 7 38

7 g 8 85

7 g 9 47

7 g 10 85

7 b 11 1

7 b 12 0

7 b 13 0

8 g 1 3

8 g 2 3

8 g 3 3

8 g 4 4

8 g 5 5

8 g 6 6

8 g 7 11

8 g 8 7

8 g 9 5

8 g 10 2

8 b 11 1

8 b 12 0

8 b 13 0

9 g 1 1

9 g 2 0

9 g 3 1

9 g 4 2

9 g 5 15

9 g 6 3

9 g 7 3

9 g 8 3

9 g 9 4

9 g 10 4

9 b 11 1

9 b 12 1

9 b 13 1

10 g 1 9

10 g 2 2

10 g 3 6

10 g 4 6

10 g 5 5

10 g 6 13

10 g 7 18

10 g 8 8

10 g 9 5

10 g 10 16

10 b 11 0

10 b 12 0

10 b 13 0

;

proc print data=abundancevisit3withsub;

run;

proc glimmix data=abundancevisit3withsub method=quad;

class trt blk subsample;

model y = trt / solution dist=negbin link=log;

random int trt / sub=blk;

lsmeans trt/ilink;

ods select LSMeans Estimates;

run;

SteveDenham
Jade | Level 19

Hi Serena,

When you calculate means and standard errors of the original data, do you transform before calculating and then back transform?  If not, the values are somewhat misleading to begin with.  I truly believe that best estimators are those obtained from the analysis, rather than the raw data.

Given all of that, I would suggest the following version for your lsmeans statement:

lsmeans trt/ilink diff lines;

This will give the mean and standard error on the original scale (ILINK) and letters for the bar graph (LINES).
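As a rough sketch of how that LSMEANS statement could feed a graph, the step below refits the model from the original post, adds the CL option so that confidence limits are reported on both scales, and saves the LSMeans table with ODS OUTPUT. The data set name lsm, the axis labels, and the choice of a scatter plot with error bars are illustrative assumptions, not part of the original code.

```sas
/* Sketch only: same model as the original post, with CL added and the
   LSMeans table saved to a data set (lsm is a hypothetical name). */
proc glimmix data=abundancevisit3withsub method=quad;
  class trt blk;
  model y = trt / solution dist=negbin link=log;
  random int trt / sub=blk;
  lsmeans trt / ilink cl diff lines;
  ods output LSMeans=lsm;
run;

/* Plot the inverse-linked means (Mu) with their back-transformed
   confidence limits (LowerMu, UpperMu) instead of raw means and SEs. */
proc sgplot data=lsm;
  scatter x=trt y=Mu / yerrorlower=LowerMu yerrorupper=UpperMu;
  xaxis label="Wall type";
  yaxis label="Estimated mean count (ILINK scale)" min=0;
run;
```

With ILINK and CL together, the saved table should carry the log-scale columns (Estimate, StdErr, Lower, Upper) alongside the count-scale columns (Mu, StdErrMu, LowerMu, UpperMu), so either scale can be reported from the same data set.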

Good luck,

Steve Denham

SMATT1
Calcite | Level 5

Thank you for these helpful comments Steve,

When calculating means and SE's of the original data, I am not transforming and then back-transforming.  Though these values are provided by the ILINK option, I hesitated to use them because they were quite different from the means and SE's calculated from my raw data.

To make sure I understand: You think it is better to present the means and SE's generated by the analysis rather than parameters calculated from raw data, or even from transformed and then back-transformed data.  Intuitively this is hard to grasp because rather than saying I sampled an overall average of 2 insects per m^2 on each blank wall and 25 on each green wall, I am saying I sampled an average of -1.45 and 1.92.  I think I understand though, because the non-normal distribution of my data means that calculations of the mean and SE are not truly showing the mean and SE of my data.

But I still don't see why the back transformed parameters would make any sense to present either.  These seem doubly distant, they are neither parameters generated by the analysis nor parameters of the original data...

Thank you again for this advice!

Serena

SteveDenham
Jade | Level 19

Once upon a time, when mastodons roamed the earth, and I was a first year graduate student in a field that wasn't statistics, but used statistics a lot, I took a course that used Sokal and Rohlf's Biometry as a text.  In it was an example of how to analyze count data, using the square root transformation.  At the bottom of page 384, it said, "For reporting of means the transformed means are squared again and confidence limits are reported in lieu of standard errors."

So, I supposed that you had transformed the data using a log transform, calculated a mean, and then exponentiated that value to get the expected value of your counts, for the "raw" data.

It is important to get one thing out of that old book: confidence limits are reported in lieu of standard errors.  Back-transforming the estimate of the standard error gets you into trouble.  That's why lvm suggests reporting both.  One of the great things about PROC GLIMMIX is its use of the delta method to get back-transformed standard errors.  Still, I always expect asymmetric confidence bounds on non-normal data.
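For the log link used in this thread, the two ideas can be written out as follows. This is a sketch in notation of my own choosing: \hat{\eta} is the LSMEANS estimate on the log scale, \widehat{SE}(\hat{\eta}) its standard error, and t the critical value used for the confidence limits.

```latex
% Inverse link (ILINK) for a log link
\hat{\mu} = \exp(\hat{\eta})

% Delta method: first-order approximation to the SE on the count scale
\widehat{SE}(\hat{\mu}) \approx
  \left|\frac{d\mu}{d\eta}\right|_{\eta=\hat{\eta}} \widehat{SE}(\hat{\eta})
  = \exp(\hat{\eta})\,\widehat{SE}(\hat{\eta})

% Confidence limits: back-transform the endpoints, not the SE
\left[\exp\!\big(\hat{\eta} - t\,\widehat{SE}(\hat{\eta})\big),\;
      \exp\!\big(\hat{\eta} + t\,\widehat{SE}(\hat{\eta})\big)\right]
```

Because the exponential stretches the upper endpoint more than the lower one, the interval is not centered on \hat{\mu}, which is the asymmetry mentioned above.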

Good luck.

Steve Denham

SMATT1
Calcite | Level 5

This is all very interesting.  Although I have taken two semesters of graduate level statistics the emphasis seems to have been placed entirely on assumptions of normality and wrestling your data via transformations to fit that distribution. My professor never enthusiastically supported transformations for achieving normality but couldn't necessarily equip us with the tools to launch us into this non-parametric world. This is all very new and I really appreciate the feedback...

Thanks,

Serena

lvm
Rhodochrosite | Level 12

As Steve has indicated, you definitely want to use and give the results from LSMEANS. You could actually give the results on the log-link scale (with SE), and the inverse link scale (and SE). The analysis is modeling the log of the expected (mean) value as a function of treatment (and random effects). The inverse link is thus giving you estimates of the expected values (what you want). If you are reading SAS for Mixed Models, 2nd edition, you will see that this is clearly the most appropriate approach for showing results (assuming the model is appropriate). If the model is not appropriate, then the basis for your entire analysis is faulty. If you can justify the use of the negative binomial model (with the random effects) to determine if treatment has a significant effect, then it is only logical to use the full set of results from using the model. The arithmetic averages are not better estimates of the expected values (they can be far worse estimates).
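To make the two scales concrete, here is a small sketch that applies the inverse of the log link to the two log-scale values quoted earlier in the thread (-1.45 and 1.92). Assigning -1.45 to the blank walls and 1.92 to the green walls is an assumption based on the order in which they were mentioned, and the data set and variable names are hypothetical.

```sas
/* Sketch: back-transform the two log-scale estimates quoted in the thread.
   Which value belongs to which treatment is assumed, not confirmed. */
data backtransform;
  input trt $ eta;
  mu = exp(eta);   /* inverse of the log link */
  datalines;
b -1.45
g 1.92
;
run;

proc print data=backtransform noobs;
run;
```

This gives roughly 0.23 and 6.82 on the count scale. One likely reason such values sit below the arithmetic means is that, with a log link and random block effects, the inverse-linked estimate describes the expected count for a block whose random effects are zero. Averaging the raw counts instead averages over all blocks on the count scale, so a few high-count blocks (blocks 5 and 7 in the posted data) pull the arithmetic mean upward.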

SMATT1
Calcite | Level 5

Yes! I understand...

Thank you,

Serena
