Kastchei
Pyrite | Level 9

I am aware that one cannot use overlapping confidence intervals to test whether two groups are different, because the standard error of the difference is not simply the sum of the standard errors of the two groups.  However, in a balanced design with equal variances, I would expect the standard error of the difference to be roughly the sum divided by root 2.  That scaling factor is not very large, so if there is an extreme amount of overlap in the CIs, I would expect no statistically significant difference.
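(To spell out that back-of-the-envelope factor: for two independent means with equal standard errors s, SE(difference) = sqrt(s^2 + s^2) = s*sqrt(2) = (s + s)/sqrt(2), i.e., about 0.71 times the sum of the two standard errors.)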

I'm running into a situation, granted with a fairly complicated model, where SAS is calculating the standard error of the difference to be an order of magnitude smaller than the simple scaled estimate.  This results in highly overlapping confidence intervals (see the graph and LSM table below), but a statistically significant difference.  It's been tricky to explain to the scientists.  I realize that once all the other data, random effects, and correlations are taken into account, perhaps this is just a weird example.  It seems so extreme, though, that I thought I should lay it out here to see if it alarms anyone else.  Thanks for any thoughts 🙂

speed Least Squares Means

speed   Estimate   Standard Error   DF    t Value   Pr > |t|   Alpha   Lower    Upper
50      1.0319     0.03768          739   27.39     <.0001     0.05    0.9579   1.1059
100     1.0494     0.03768          739   27.85     <.0001     0.05    0.9754   1.1233

simple scaled estimate = (0.03768 + 0.03768) / sqrt(2) = 0.05329

Differences of speed Least Squares Means

speed   _speed   Estimate   Standard Error   DF    t Value   Pr > |t|   Alpha   Lower     Upper
50      100      -0.0175    0.00548          739   -3.19     0.0015     0.05    -0.0282   -0.0067
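(A quick check on the two tables, my arithmetic rather than SAS output: the difference estimate matches the individual means, 1.0319 - 1.0494 = -0.0175, while its standard error, 0.00548, is roughly a tenth of the 0.05329 scaled estimate above.)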

Code (speed and maturity have two levels only; time_ has 7 levels: 1, 2, 3, 4, 9, 17, 30):

proc gLiMMix data = a (where = (maturity ^= 364 and analyte = 'EE')) plots = all initGLM initIter = 1000000 itDetails chol;
    by analyte;
    class batch ring maturity speed time_;

    /* fixed effects: speed, maturity, time_, and the maturity*time_ interaction */
    model measurement = speed maturity|time_ / dist = lognormal;

    /* G-side random intercept for batch */
    random batch / type = vc solution cl;

    /* R-side repeated measures: spatial power correlation over continuous time, within ring nested in batch */
    random time_ / residual subject = ring(batch) type = sp(pow)(time) solution cl;

    nLOptions maxIter = 1000000 gConv = 0 fConv = 0;

    /* LS-means for speed with CIs, differences, and the joined mean plot */
    lSMeans speed / cl diff plots = meanplot(join);
run;


[Attached image: LS Means.png — LS-means plot for speed]

8 REPLIES
Kastchei
Pyrite | Level 9

I think I may have figured out the issue, but I'd appreciate it if anyone could review.  If I remove the random batch statement, the standard error estimates for each speed drop from 0.03768 to 0.007389, while the standard error of the difference only changes from 0.005476 to 0.009552; the rough scaled estimate is then (0.007389 + 0.007389) / sqrt(2) = 0.01045.  There is still overlap in the CIs, but it seems much more reasonable.

I'm not sure why removing batch is decreasing the LSM variance.  I would think that accounting for a source of variation, the batch, would result in less variation to attribute to speed, not more.

SteveDenham
Jade | Level 19

I have seen this several times, and almost always with repeated measures.  Try adding a Kenward-Roger adjustment to the denominator degrees of freedom (KR2 is preferable, but any would be better than none).  It would be DDFM=KR2 in the MODEL statement.
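As a sketch, that is just the original MODEL statement with the option added (everything else in the PROC GLIMMIX step unchanged):

model measurement = speed maturity|time_ / dist = lognormal ddfm = kr2;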

I think the cause is the inability to fit a repeated-by-random effect without a lot of data.  There is no time_ by batch variance component separate from the R-side sp(pow) estimate that also includes ring.  I think we all could learn something if others could chime in on this--I could be way off base.

Steve Denham

lvm
Rhodochrosite | Level 12

The SE of a difference is more complex than you think when variables are correlated. By definition, with a random block effect and a nonzero block variance, your means are correlated. The variance of a difference of two means is, in general:

var(mu1 - mu2) = var(mu1) + var(mu2) - 2*cov(mu1, mu2)

SE(mu1 - mu2) is just the square root of this, and var(mu1), etc., are the squares of the individual SEs. You probably have a very large block variance. Ignoring the repeated measures, any two randomly selected observations in the same block have a covariance equal to the block variance. Taking out the block variance moves some of the total variability into the individual mean SEs. You don't want this, because it gives an incorrect measure of the uncertainty of the mean estimates (it does not take the design into account). Put in COVB and CORRB as options in the MODEL statement to see the variance-covariance matrix of the parameter estimates.
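As a sketch, that just means adding those options to the existing MODEL statement, with everything else unchanged:

model measurement = speed maturity|time_ / dist = lognormal covb corrb;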

Kastchei
Pyrite | Level 9

Thanks guys.  I added the kr2, covb, and corrb options.  Attached is the output for Covb and Corrb and below are the covariance estimates.

Covariance Parameter Estimates

Cov Parm   Subject       Estimate   Standard Error
batch                    0.004205   0.004228
SP(POW)    ring(batch)   0.7292     0.02646
Residual                 0.003196   0.000207
lvm
Rhodochrosite | Level 12

Put in the NOINT option on the model statement and rerun. It is easier to see the variances/covariances of the speed means directly in the covb matrix without the intercept (speed needs to stay as the first term in the model).
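As a sketch (keeping the COVB/CORRB options from before, everything else unchanged):

model measurement = speed maturity|time_ / noint dist = lognormal covb corrb;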

Kastchei
Pyrite | Level 9

Sure thing.  Rather than drop the whole table again, here's just the speed part.  Let me know if you need the whole thing.

CovB        speed 50   speed 100
speed 50    0.001502   0.001487
speed 100   0.001487   0.001502

CorrB       speed 50   speed 100
speed 50    1.0000     0.9901
speed 100   0.9901     1.0000
lvm
Rhodochrosite | Level 12
(Accepted Solution)

This fully explains your results. You have a high covariance between the two means. The SE of the difference is:

sqrt(0.001502 + 0.001502 - 2*0.001487) = sqrt(0.00003) = 0.00548.

The large block variance is at least partly responsible.

Because of the large variability between batches, the precision of an individual mean averaged across blocks is low (high SE), but the difference between means has high precision. This is exactly why one includes a batch variance in a model.

Kastchei
Pyrite | Level 9

Thank you so much for your help!  It turned out to be a pretty simple explanation.  I guess since I had a complicated model, I was looking for too complicated an answer.
