LSMean Standard Errors


10-23-2014 06:51 PM

I am aware that one cannot use overlapping confidence intervals to test whether two groups are different, because the standard error of the difference is not simply the sum of the standard errors of each group. However, in a balanced design with equal variances, I would expect the standard error of the difference to be roughly the sum divided by root 2. That scaling factor is not very large, so if there is an extreme amount of overlap in the CIs, I would expect there to be no difference.

I'm running into a situation, granted with a fairly complicated model, where SAS is calculating the standard error of the difference to be an order of magnitude smaller than the simple scaled estimate. This results in highly overlapping confidence intervals (see graph and LSM table below), but a statistically significant difference. It's been tricky to explain to the scientists. I realize that once all the other data, random effects, and correlations are taken into account, perhaps this is just a weird example. It seems so extreme, though, that I thought I should lay it out here to see if it alarms anyone else. Thanks for any thoughts.

**speed Least Squares Means**

| speed | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper |
|---|---|---|---|---|---|---|---|---|
| 50 | 1.0319 | 0.03768 | 739 | 27.39 | <.0001 | 0.05 | 0.9579 | 1.1059 |
| 100 | 1.0494 | 0.03768 | 739 | 27.85 | <.0001 | 0.05 | 0.9754 | 1.1233 |

simple scaled estimate = (0.03768 + 0.03768)/sqrt(2) = **0.05329**
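As a sanity check (a sketch, not part of the original post; the SEs are taken from the LS-means table above), the "simple scaled estimate" is what the SE of the difference would be if the two means were independent with equal variances:

```python
import math

# Naive SE of the difference assuming independent, equal-variance group means:
# SE(diff) = sqrt(SE1^2 + SE2^2), which equals (SE1 + SE2)/sqrt(2) when SE1 = SE2.
se_50 = 0.03768   # SE of the speed=50 LS-mean
se_100 = 0.03768  # SE of the speed=100 LS-mean

naive_se_diff = math.sqrt(se_50**2 + se_100**2)
print(round(naive_se_diff, 5))  # 0.05329
```

This is the benchmark against which the reported 0.00548 looks an order of magnitude too small.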

**Differences of speed Least Squares Means**

| speed | _speed | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 100 | -0.0175 | 0.00548 | 739 | -3.19 | 0.0015 | 0.05 | -0.0282 | -0.0067 |

Code (speed and maturity have two levels only; time_ has 7: 1, 2, 3, 4, 9, 17, 30):

```sas
proc glimmix data=a (where=(maturity ^= 364 and analyte = 'EE'))
             plots=all initglm inititer=1000000 itdetails chol;
   by analyte;
   class batch ring maturity speed time_;
   model measurement = speed maturity|time_ / dist=lognormal;
   random batch / type=vc solution cl;
   random time_ / residual subject=ring(batch) type=sp(pow)(time) solution cl;
   nloptions maxiter=1000000 gconv=0 fconv=0;
   lsmeans speed / cl diff plots=meanplot(join);
run;
```

Accepted Solutions

Solution


10-27-2014 02:30 PM

This fully explains your results. You have a high covariance between the two means. The SE of the difference is:

sqrt(.001502 + .001502 - 2*.001487) = .00548.

The large block variance is at least partly responsible.

Because of the large variability between batches, the precision of an individual mean averaged across blocks is low (high SE), but the difference between means has high precision. This is exactly why one includes a batch variance in a model.
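That arithmetic can be checked directly (a sketch, not part of the original post; the variances and covariance are the COVB values posted elsewhere in the thread):

```python
import math

# Variance of a difference of two correlated means:
# var(m1 - m2) = var(m1) + var(m2) - 2*cov(m1, m2)
var_50 = 0.001502   # var of the speed=50 LS-mean (COVB diagonal)
var_100 = 0.001502  # var of the speed=100 LS-mean (COVB diagonal)
cov = 0.001487      # covariance between the two LS-means (COVB off-diagonal)

se_diff = math.sqrt(var_50 + var_100 - 2 * cov)
se_naive = math.sqrt(var_50 + var_100)  # what you'd get ignoring the covariance

print(round(se_diff, 5))   # 0.00548
print(round(se_naive, 4))  # 0.0548 -- ten times larger
```

The near-unity correlation between the two means is what shrinks the SE of the difference by an order of magnitude relative to the naive estimate.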

All Replies


10-23-2014 07:11 PM

I think I may have figured out the issue, but I'd appreciate a review. If I remove the random batch statement, the standard error estimates for each speed drop from 0.03768 to 0.007389, while the standard error of the difference only changes from 0.005476 to 0.009552 (the rough scaled estimate is 0.01045). There is still overlap in the CIs, but it seems much more reasonable.

I'm not sure why *removing* batch is decreasing the LSM variance. I would think that accounting for a source of variation, the batch, would leave less variation to attribute to speed, not more.


10-24-2014 01:30 PM

I have seen this several times, and almost always with repeated measures. Try adding a Kenward-Roger adjustment to the denominator degrees of freedom (KR2 is preferable, but any would be better than none). It would be DDFM=KR2 on the MODEL statement.

I think the cause is the inability to fit a repeated-by-random effect without a lot of data. There is no time_ by batch variance component separate from the R-side sp(pow) estimate, which also includes ring. I think we all could learn something if others could chime in on this; I could be way off base.

Steve Denham


10-24-2014 01:39 PM

The SE of a difference is more complex than you think when the variables are correlated. By definition, with a random block effect and a nonzero block variance, your means are correlated. The variance of a difference of two means is, in general:

var(mu1 - mu2) = var(mu1) + var(mu2) - 2*cov(mu1, mu2)

SE(mu1 - mu2) is just the square root of this, and var(mu1), etc., are the squares of the individual SEs. You probably have a very large block variance. Ignoring the repeated measures, any two randomly selected observations in the same block have a covariance equal to the block variance. Taking out the block variance moves some of the total variability into the individual mean SEs. You don't want this because it gives an incorrect measure of the uncertainty of the mean estimates (it does not take the design into account). Put COVB and CORRB as options on the MODEL statement to see the variance-covariance matrix of the parameter estimates.


10-24-2014 04:24 PM

Thanks guys. I added the kr2, covb, and corrb options. Attached is the output for Covb and Corrb and below are the covariance estimates.

**Covariance Parameter Estimates**

| Cov Parm | Subject | Estimate | Standard Error |
|---|---|---|---|
| batch | | 0.004205 | 0.004228 |
| SP(POW) | ring(batch) | 0.7292 | 0.02646 |
| Residual | | 0.003196 | 0.000207 |


10-26-2014 05:10 PM

Put in the NOINT option on the model statement and rerun. It is easier to see the variances/covariances of the speed means directly in the covb matrix without the intercept (speed needs to stay as the first term in the model).


10-27-2014 02:22 PM

Sure thing. Rather than dump the whole table again, here's just the speed part. Let me know if you need the whole thing.

| CovB | speed 50 | speed 100 |
|---|---|---|
| speed 50 | 0.001502 | 0.001487 |
| speed 100 | 0.001487 | 0.001502 |

| CorrB | speed 50 | speed 100 |
|---|---|---|
| speed 50 | 1.0000 | 0.9901 |
| speed 100 | 0.9901 | 1.0000 |



10-27-2014 02:35 PM

Thank you so much for your help! It turned out to be a pretty simple explanation. I guess since I had a complicated model, I was looking for too complicated an answer.