This is my first post here. Thanks in advance.
I am trying to get the differences of least squares means using PROC MIXED. The code is:
proc mixed data=dummy;
class A B C;
model Y = A B C D A*C A*D;
lsmeans A /pdiff cl;
estimate 'A2 vs A1' A -1 1 0 0;
estimate 'A2 vs A3' A 0 1 -1 0;
estimate 'A2 vs A4' A 0 1 0 -1;
estimate 'A3 vs A1' A -1 0 1 0;
estimate 'A4 vs A1' A -1 0 0 1;
estimate 'A2 vs A3' A 0 1 -1 0;
run;
Usually, the results of the LSMEANS statement and the ESTIMATE statement should be consistent, but this time they are not:
Estimates
Label Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
A2 vs A1 -55.1659 46.7075 375 -1.18 0.2383 0.05 -147.01 36.6755
A2 vs A3 19.0351 32.1103 375 0.59 0.5537 0.05 -44.1038 82.1739
A2 vs A4 -7.3886 35.7314 375 -0.21 0.8363 0.05 -77.6476 62.8705
A3 vs A1 -74.2009 45.4017 375 -1.63 0.1030 0.05 -163.47 15.0730
A4 vs A1 -47.7773 47.9488 375 -1.00 0.3197 0.05 -142.06 46.5050
A2 vs A3 19.0351 32.1103 375 0.59 0.5537 0.05 -44.1038 82.1739
Least Squares Means
Effect A Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
A 1 30.4191 3.5455 375 8.58 <.0001 0.05 23.4475 37.3907
A 2 23.5044 2.0461 375 11.49 <.0001 0.05 19.4812 27.5275
A 3 26.0330 2.0424 375 12.75 <.0001 0.05 22.0169 30.0491
A 4 25.6443 2.0548 375 12.48 <.0001 0.05 21.6039 29.6847
Differences of Least Squares Means
Effect A _A Estimate Standard Error DF t Value Pr > |t| Alpha Lower Upper
A 1 2 6.9148 4.0733 375 1.70 0.0904 0.05 -1.0946 14.9241
A 1 3 4.3861 4.0647 375 1.08 0.2812 0.05 -3.6062 12.3785
A 1 4 4.7749 4.0879 375 1.17 0.2435 0.05 -3.2632 12.8129
A 2 3 -2.5286 2.8617 375 -0.88 0.3775 0.05 -8.1557 3.0985
A 2 4 -2.1399 2.8889 375 -0.74 0.4593 0.05 -7.8204 3.5406
A 3 4 0.3887 2.8827 375 0.13 0.8928 0.05 -5.2797 6.0571
The ESTIMATE results look weird, but the differences of least squares means look reasonable.
I think the problem is the variable D. Because D is not of primary interest, I did not include it in the CLASS statement, but it is in the MODEL statement: I have both D and A*D in the model. In the dataset, not every level of A has every value of D. For example, D takes the values 5, 6, 7, 8, 9 and 10, but when A=1, D only takes 6, 7, 8, 9 and 10. So I think this is an unbalanced dataset. As a test, I added dummy records so that every level of A has all values of D, and ran the code above on the new dataset. This time the results matched.
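A quick way to confirm the imbalance described above, and to see the value of D at which LSMEANS holds the covariate by default (its overall mean), is a pair of descriptive procedures. This is a sketch assuming the dataset is named `dummy` and contains A and D, as in the PROC MIXED call; it does not change the model.

```sas
/* Assumes the dataset DUMMY with variables A and D, as in the question. */

/* N, mean, min and max of D within each level of A.
   PRINTALLTYPES also prints the overall row, whose mean is the
   value of D that LSMEANS uses by default. */
proc means data=dummy n mean min max printalltypes;
   class A;
   var D;
run;

/* Counts of each value of D within each level of A;
   a zero count (e.g. A=1, D=5) is the imbalance in question. */
proc freq data=dummy;
   tables A*D / norow nocol nopercent;
run;
```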
My question is: what is the logic behind the ESTIMATE statement that makes its results differ from those of the LSMEANS statement? Could somebody explain it to me? I just want to know what affects the results of the ESTIMATE statement. My guess for why the LSMEANS results are reasonable is that LSMEANS only estimates the fixed effects listed in the CLASS statement. Am I right? I attached the dataset FYI.
Thanks!
The main thing here is that your ESTIMATE statements are not the differences between the LS-means. Try adding the E option to the LSMEANS statement. You will see that the estimable function includes much more than the parts you are including in your ESTIMATE statements, and that is the reason for the difference. D is handled as a continuous covariate, so the LS-means are the marginal values at the mean of D. Because the A*D term gives each level of A its own slope in D, the difference between two levels of A depends on the value of D at which it is evaluated. Given that, even your LS-means may be somewhat misleading, due to the imbalance. Get a copy of Littell et al.'s SAS for Mixed Models, 2nd ed., and read the chapter on analysis of covariance. The LS-means should probably be calculated using the AT option, in order to accommodate the D and A*D terms in the model.
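A sketch of the suggested LSMEANS changes, assuming the same `dummy` dataset and model as in the question. The E option prints the coefficients behind each LS-mean, and the AT option fixes D at a chosen value instead of its mean. The values D=6 and D=9 are hypothetical, picked inside the 6 to 10 range the question reports for A=1; check the other levels of A as well.

```sas
/* Model and dataset DUMMY as in the question. */
proc mixed data=dummy;
   class A B C;
   model Y = A B C D A*C A*D;

   /* E prints the L coefficients behind each LS-mean: look for the
      nonzero coefficients on D and A*D, set from the mean of D */
   lsmeans A / pdiff cl e;

   /* Unequal slopes: the A differences change with D, so compare the
      levels at chosen covariate values (hypothetical choices) */
   lsmeans A / pdiff cl at D=6;
   lsmeans A / pdiff cl at D=9;
run;
```

Comparing the two AT tables shows how much the differences between levels of A move with D under the unequal slopes model.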
Finally, if you are working in later versions of SAS/STAT, consider using the LSMESTIMATE statement rather than the ESTIMATE statement to calculate differences between least squares means. The syntax is much closer to what you are using. The only addition would be the AT option, to accommodate the unequal slopes model that you are fitting.
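The ESTIMATE statements from the question rewritten as LSMESTIMATE, as a sketch assuming the same `dummy` dataset and model and a SAS/STAT release whose PROC MIXED supports LSMESTIMATE. Because the coefficients apply to the LS-means of A, the labels and coefficient rows carry over unchanged (the repeated 'A2 vs A3' row is dropped). D=6 in the second statement is a hypothetical choice.

```sas
/* Model and dataset DUMMY as in the question. */
proc mixed data=dummy;
   class A B C;
   model Y = A B C D A*C A*D;

   /* Coefficients are on the LS-means of A; D is held at its mean */
   lsmestimate A 'A2 vs A1' -1  1  0  0,
                 'A2 vs A3'  0  1 -1  0,
                 'A2 vs A4'  0  1  0 -1,
                 'A3 vs A1' -1  0  1  0,
                 'A4 vs A1' -1  0  0  1 / cl;

   /* Same comparisons with D fixed at a chosen value */
   lsmestimate A 'A2 vs A1' -1  1  0  0,
                 'A2 vs A3'  0  1 -1  0,
                 'A2 vs A4'  0  1  0 -1,
                 'A3 vs A1' -1  0  1  0,
                 'A4 vs A1' -1  0  0  1 / cl at D=6;
run;
```

Without AT, D is held at its mean, as in LSMEANS, so the first set should reproduce the differences in the PDIFF table. That table reports, for example, A1 minus A2 in the row "A 1 2", so the sign is reversed for "A2 vs A1".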
Steve Denham
Steve is correct. But note that the LSMESTIMATE statement does not work with continuous covariates, only with factors.
But you can accommodate a continuous covariate by using the AT option in the LSMESTIMATE statement. As long as there is one factor involved, you can add continuous covariates to your heart's content.
Steve Denham