
Difference between LSMEANS gives contradictory result?


10-14-2011 03:18 PM

I am running a 2-level hierarchical linear model in PROC MIXED, using the LSMEANS statement with the PDIFF option and Tukey-Kramer adjustment for multiple comparisons to look at post hoc differences between 2 of 6 possible "Activity Category" groups ("Category 2" vs. "Category 4"). In estimating the LSMEANS for these categories, I noted that their 95% confidence limits overlap a great deal, which I took as an indication that there is no statistically significant difference between them.

Code:

proc mixed data=data noclprint method=reml covtest;
  class id ActivityCategory [other main effects];
  model outcome = ActivityCategory [other main effects & 2-way interactions] / solution ddfm=bw outp=scores;
  random intercept / subject=id type=un;
  lsmeans ActivityCategory [other main effects & 2-way interactions] / pdiff adjust=tukey cl;
run;

LSMEANS Table (relevant parts, cleaned up via DATA step):

Obs  factor  Effect            ActivityCategory  Estimate  StdErr   DF    tValue  Probt   Alpha  Lower     Upper
1    2       ActivityCategory  2                 0.1734    0.03846  3213  4.51    <.0001  0.05   0.09799   0.2488
2    2       ActivityCategory  4                 0.08053   0.03685  3213  2.19    0.0289  0.05   0.008287  0.1528

However, the SAS-calculated test of their difference generated via /PDIFF gives an adjusted *p*-value that is wildly significant (< .0001)! I am utterly confused by this result, which I think must have something to do with how SAS calculates the *SE* for the difference test (I'm not sure why that *SE* is smaller than the *SE* for either individual LSMEAN estimate). I get why the *df* is the same for the two sets of results (SAS default), though I'm not sure exactly why this is appropriate. I also understand that the confidence limits in the LSMEANS table for levels of the main effect are *not* adjusted for multiple comparisons. But wouldn't they only get *wider* if adjusted? Note: my matrix algebra is far from up to par.

Effect            ActivityCategory  _ActivityCategory  Estimate  StdErr   DF    tValue  Probt   Adjustment    Adjp    Alpha  Lower    Upper   AdjLower  AdjUpper
ActivityCategory  2                 4                  0.09287   0.01823  3213  5.09    <.0001  Tukey-Kramer  <.0001  0.05   0.05711  0.1286  0.04087   0.1449
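As a quick arithmetic sanity check (not part of the SAS output), the /PDIFF row is internally consistent with the individual LSMEANS rows: the difference estimate is the difference of the two LSMEANS, and the t value is that difference divided by its standard error.

```python
# Check the /PDIFF row against the individual LSMEANS rows
# using the numbers reported in the tables above.
est2, est4 = 0.1734, 0.08053   # LSMEANS for Category 2 and Category 4
se_diff = 0.01823              # SE of the difference from the PDIFF row

diff_est = est2 - est4
t_value = diff_est / se_diff

print(round(diff_est, 5))  # 0.09287 (matches the Estimate column)
print(round(t_value, 2))   # 5.09   (matches the tValue column)
```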

The above seems to raise the obvious question(s): Which is it? Are the two groups substantially disparate in their LSMEANS, or not? I'm specifically looking for guidance on which result is appropriate to report since these analyses are part of a larger study we plan to publish. My inclination is to be conservative and report the LSMEANS with their confidence limits (as suggested by both colleagues and here by Paige).

FYI, I have tested this model and checked the analogous results under a variety of schemes for calculating denominator degrees of freedom (e.g., CONTAIN, SATTERTH, KR) and various MC adjustments, all of which exhibit the same apparent contradiction as above.

Thanks in advance for any insights!

Accepted Solutions

Solution


10-14-2011 08:18 PM

Not necessarily a contradiction. It is well known that two means can differ significantly even when their individual confidence intervals overlap (somewhat), although many people are misled by comparing individual confidence intervals. This is true even with uncorrelated means (i.e., uncorrelated estimated expected values), and individual confidence intervals can overlap even more when the means are correlated. By the nature of your model (with a random effect), there is the potential that the means are highly correlated. You didn't show all the output (such as the random-effect variance), but I am guessing it is large (relatively speaking), which means there is high correlation in the means. For instance, if you square the SEs for each individual mean and for the difference, you have variances: V(1) is the square of the SE for mean 1, and so on. The standard formula for the variance of a difference, V(D), is:

V(D) = V(1) + V(2) - 2*COV(1,2)

where COV(1,2) is the covariance of means 1 and 2. Using your results, you have:

.01823^2 = .03846^2 + .03685^2 - 2*COV(1,2)

.0003323 = .0014792 + .0013579 - 2*COV(1,2)

The covariance is not shown here. However, a little algebra shows that it would have to be 0.001252.

You can use this to estimate the correlation of the two means. This would be:

0.001252/(0.03846*0.03685) = 0.88.

Thus, there is a high correlation in your case.

Bottom line, assuming you did other things correctly (can't tell from what you have given us), a consequence of your model fitted to the data is that the least squares means are highly correlated. Because of this, the standard error of a difference (SED) will be considerably smaller than the SED you would have with uncorrelated means. If the means were uncorrelated, COV(1,2)=0, and the SED would be:

sqrt(.0014792 + .0013579) = 0.053.

Your SED of 0.01823 is a lot smaller.
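The arithmetic above can be reproduced in a few lines, using only the SEs reported in the thread (0.03846, 0.03685 for the individual LSMEANS, 0.01823 for the difference):

```python
# Solve the variance-of-a-difference formula for the implied covariance
# and correlation of the two LSMEANS, then compare against the SED that
# uncorrelated means would give.
import math

se1, se2, se_diff = 0.03846, 0.03685, 0.01823
v1, v2, v_diff = se1**2, se2**2, se_diff**2

# V(D) = V(1) + V(2) - 2*COV(1,2)  =>  COV(1,2) = (V(1) + V(2) - V(D)) / 2
cov12 = (v1 + v2 - v_diff) / 2
corr = cov12 / (se1 * se2)

# SED if the means were uncorrelated (COV(1,2) = 0)
sed_uncorrelated = math.sqrt(v1 + v2)

print(round(cov12, 6))            # 0.001252
print(round(corr, 2))             # 0.88
print(round(sed_uncorrelated, 3)) # 0.053
```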

I would definitely use the results of the differences of the LSMEANS. This is the test result that properly accounts for the correlation of the means (and the model fitted to the data). In fact, this is one example (assuming you did other things correctly) of the great advantage of using mixed models: you gain a great deal in precision in the comparison of LSMEANS.

All Replies


10-14-2011 08:57 PM

Wow - I could not have asked for a clearer explanation! Thanks for including all the detail and helping to clear this up for me. I will definitely use the formal difference test in reporting our results.

Also, sorry for not including the random effect & residual estimates. They're pasted below FYI.

Thanks again!

Covariance Parameter Estimates

Cov Parm  Subject  Estimate  Standard Error  Z Value  Pr > Z
UN(1,1)   id       0.4786    0.02503         19.12    <.0001
Residual           0.4628    0.004994        92.68    <.0001
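These estimates bear out the answer's guess that the random-intercept variance is large relative to the residual. A rough check (my calculation, not from the thread) is the intraclass correlation for a random-intercept model:

```python
# ICC = var(intercept) / (var(intercept) + var(residual)) for a
# two-level random-intercept model, using the covariance parameter
# estimates posted above.
var_id, var_resid = 0.4786, 0.4628

icc = var_id / (var_id + var_resid)
print(round(icc, 3))  # 0.508: about half the total variance is between subjects
```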