I am aware that one cannot use overlapping confidence intervals to test whether two groups differ, because the standard error of the difference is not simply the sum of the standard errors of the two groups. However, in a balanced design with equal variances, I would expect the standard error of the difference to be roughly the sum of the two standard errors divided by sqrt(2). That scaling factor is not very large, so if the CIs overlap to an extreme degree, I would expect no significant difference.
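To spell out where that rough scaling comes from: for two independent estimates with equal standard errors,
SE(diff) = sqrt(SE1^2 + SE2^2) = sqrt(2)*SE = (SE1 + SE2)/sqrt(2).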
I'm running into a situation, granted with a fairly complicated model, where SAS calculates the standard error of the difference to be an order of magnitude smaller than the simple scaled estimate. This results in highly overlapping confidence intervals (see graph and LSM table below) but a statistically significant difference. It's been tricky to explain to the scientists. I realize that once all the other data, random effects, and correlations are taken into account, perhaps this is just a weird example. It seems so extreme, though, that I thought I should lay it out here to see if it alarms anyone else. Thanks for any thoughts.
speed Least Squares Means
speed | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper
---|---|---|---|---|---|---|---|---
50 | 1.0319 | 0.03768 | 739 | 27.39 | <.0001 | 0.05 | 0.9579 | 1.1059
100 | 1.0494 | 0.03768 | 739 | 27.85 | <.0001 | 0.05 | 0.9754 | 1.1233
simple scaled estimate = (0.03768 + 0.03768)/sqrt(2) = 0.05329
Differences of speed Least Squares Means
speed | _speed | Estimate | Standard Error | DF | t Value | Pr > \|t\| | Alpha | Lower | Upper
---|---|---|---|---|---|---|---|---|---
50 | 100 | -0.0175 | 0.00548 | 739 | -3.19 | 0.0015 | 0.05 | -0.0282 | -0.0067
Code below: speed and maturity have two levels only; time_ has 7 (1, 2, 3, 4, 9, 17, 30).
proc gLiMMix data = a (where = (maturity ^= 364 and analyte = 'EE')) plots = all initGLM initIter = 1000000 itDetails chol;
by analyte;
class batch ring maturity speed time_;
model measurement = speed maturity|time_ / dist = lognormal;
/* G-side variance component for batch */
random batch / type = vc solution cl;
/* R-side repeated measures: spatial power correlation over continuous time, within ring(batch) */
random time_ / residual subject = ring(batch) type = sp(pow)(time) solution cl;
nLOptions maxIter = 1000000 gConv = 0 fConv = 0;
lSMeans speed / cl diff plots = meanplot(join);
run;
I think I may have figured out the issue, but if anyone could review: if I remove the random batch statement, the standard error estimates for each speed drop from 0.03768 to 0.007389, while the standard error of the difference only changes from 0.005476 to 0.009552, and the rough scaled estimate is (0.007389 + 0.007389)/sqrt(2) = 0.01045. There is still overlap in the CIs, but it seems much more reasonable.
I'm not sure why removing batch is decreasing the LSM variance. I would think that accounting for a source of variation, the batch, would result in less variation to attribute to speed, not more.
I have seen this several times, and almost always with repeated measures. Try adding a Kenward-Roger adjustment to the degrees of freedom (KR2 is preferable, but any would be better than none). It would be DDFM=KR2 in the model statement.
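For reference, against the posted code that would look something like this (only the DDFM= option added; everything else unchanged):
model measurement = speed maturity|time_ / dist = lognormal ddfm = kr2;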
I think the cause is the inability to fit a repeated-by-random effect without a lot of data. There is no time_ by batch variance component separate from the R-side sp(pow) estimate, which also includes ring. I think we all could learn something if others could chime in on this; I could be way off base.
Steve Denham
The SE of a difference is more complex than you might think when the estimates are correlated. By definition, with a random block effect and a nonzero block variance, your means are correlated. The variance of a difference of two means is, in general:
var(mu1 - mu2) = var(mu1) + var(mu2) - 2*cov(mu1, mu2)
SE(mu1 - mu2) is just the square root of this, and var(mu1), etc., are the squares of the individual SEs. You probably have a very large block variance. Ignoring the repeated measures, any two randomly selected observations in the same block have a covariance equal to the block variance. Taking out the block variance moves some of the total variability into the individual mean SEs. You don't want this, because it gives an incorrect measure of the uncertainty of the mean estimates (it does not take the design into account). Put in COVB and CORRB as options in the MODEL statement to see the variance-covariance matrix of the parameter estimates.
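Assuming the model statement from your original post, that would be something like:
model measurement = speed maturity|time_ / dist = lognormal covb corrb;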
Thanks guys. I added the KR2, COVB, and CORRB options. Attached is the CovB and CorrB output, and below are the covariance parameter estimates.
Covariance Parameter Estimates

Cov Parm | Subject | Estimate | Standard Error
---|---|---|---
batch | | 0.004205 | 0.004228
SP(POW) | ring(batch) | 0.7292 | 0.02646
Residual | | 0.003196 | 0.000207
Put in the NOINT option on the model statement and rerun. It is easier to see the variances/covariances of the speed means directly in the covb matrix without the intercept (speed needs to stay as the first term in the model).
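Assuming the same model statement as before, something like this (with speed kept as the first term):
model measurement = speed maturity|time_ / noint dist = lognormal covb corrb;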
Sure thing. Rather than drop the whole table again, here's just the speed part. Let me know if you need the whole thing.
CovB | speed 50 | speed 100 |
---|---|---|
speed 50 | 0.001502 | 0.001487 |
speed 100 | 0.001487 | 0.001502 |
CorrB | speed 50 | speed 100 |
---|---|---|
speed 50 | 1.0000 | 0.9901 |
speed 100 | 0.9901 | 1.0000 |
This fully explains your results. You have a high covariance between the two means. The SE of the difference is:
sqrt(0.001502 + 0.001502 - 2*0.001487) = sqrt(0.00003) = 0.00548.
The large block variance is at least partly responsible.
Because of the large variability between batches, the precision of an individual mean averaged across blocks is low (high SE), but the difference between means has high precision. This is exactly why one includes a batch variance in a model.
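As a quick sanity check, here is a small data step sketch (values hard-coded from the CovB/CorrB output above) that reproduces both numbers:
data _null_;
    v50  = 0.001502;  /* var of the speed 50 LSMean, from CovB  */
    v100 = 0.001502;  /* var of the speed 100 LSMean, from CovB */
    c    = 0.001487;  /* covariance of the two means, from CovB */
    se_diff = sqrt(v50 + v100 - 2*c);  /* 0.005477, matching the reported 0.00548 */
    corr    = c / sqrt(v50 * v100);    /* 0.9900, ~ the reported 0.9901 (rounding) */
    put se_diff= corr=;
run;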
Thank you so much for your help! It turned out to be a pretty simple explanation. I guess since I had a complicated model, I was looking for too complicated an answer.