heansy
Fluorite | Level 6

This is my first post here. Thanks in advance.

 

I am trying to get the differences of least squares means using PROC MIXED. The code is:

 

proc mixed data=dummy;
  class A B C;
  model Y = A B C D A*C A*D;
  lsmeans A /pdiff cl;
  estimate 'A2 vs A1' A -1 1 0 0;
  estimate 'A2 vs A3' A 0 1 -1 0;
  estimate 'A2 vs A4' A 0 1 0 -1;
  estimate 'A3 vs A1' A -1 0 1 0;
  estimate 'A4 vs A1' A -1 0 0 1;
  estimate 'A2 vs A3' A 0 1 -1 0;
run;

 

Usually, the results of the LSMEANS statement and the ESTIMATE statements should be consistent. But this time they are not:

 

Estimates

Label      Estimate  Standard Error   DF  t Value  Pr > |t|  Alpha     Lower    Upper
A2 vs A1   -55.1659         46.7075  375    -1.18    0.2383   0.05   -147.01  36.6755
A2 vs A3    19.0351         32.1103  375     0.59    0.5537   0.05  -44.1038  82.1739
A2 vs A4    -7.3886         35.7314  375    -0.21    0.8363   0.05  -77.6476  62.8705
A3 vs A1   -74.2009         45.4017  375    -1.63    0.1030   0.05   -163.47  15.0730
A4 vs A1   -47.7773         47.9488  375    -1.00    0.3197   0.05   -142.06  46.5050
A2 vs A3    19.0351         32.1103  375     0.59    0.5537   0.05  -44.1038  82.1739

                                                         
Least Squares Means

Effect  A  Estimate      SE   DF  t Value  Pr > |t|  Alpha    Lower    Upper
A       1   30.4191  3.5455  375     8.58    <.0001   0.05  23.4475  37.3907
A       2   23.5044  2.0461  375    11.49    <.0001   0.05  19.4812  27.5275
A       3   26.0330  2.0424  375    12.75    <.0001   0.05  22.0169  30.0491
A       4   25.6443  2.0548  375    12.48    <.0001   0.05  21.6039  29.6847

Differences of Least Squares Means

Effect  A  _A  Estimate      SE   DF  t Value  Pr > |t|  Alpha    Lower    Upper
A       1   2    6.9148  4.0733  375     1.70    0.0904   0.05  -1.0946  14.9241
A       1   3    4.3861  4.0647  375     1.08    0.2812   0.05  -3.6062  12.3785
A       1   4    4.7749  4.0879  375     1.17    0.2435   0.05  -3.2632  12.8129
A       2   3   -2.5286  2.8617  375    -0.88    0.3775   0.05  -8.1557   3.0985
A       2   4   -2.1399  2.8889  375    -0.74    0.4593   0.05  -7.8204   3.5406
A       3   4    0.3887  2.8827  375     0.13    0.8928   0.05  -5.2797   6.0571


The ESTIMATE results look weird, but the differences of least squares means look reasonable.

 

I think the problem is the variable D. D is not of primary interest, so I did not put it in the CLASS statement, but it is in the MODEL statement, both as D and as A*D. In the dataset, not every level of A occurs with every value of D. For example, D takes the values 5, 6, 7, 8, 9, and 10, but when A=1, D only takes the values 6, 7, 8, 9, and 10. So I think this is an unbalanced dataset. As a test, I added dummy values of D so that every level of A has every value of D, and reran the code above on the new dataset. This time the results matched.

 

My question is: what is the logic behind the ESTIMATE statement that makes its results differ from those of the LSMEANS statement? Could somebody explain it to me? I want to know what affects the result of the ESTIMATE statement. I think the LSMEANS result is reasonable because LSMEANS only estimates the fixed effects listed in the CLASS statement. Am I right? I have attached the dataset FYI.

 

Thanks!

Accepted Solutions
SteveDenham
Jade | Level 19

The main thing here is that your ESTIMATE statements are not the differences between the lsmeans. Try adding the /e option to the LSMEANS statement. You will see that the estimable function includes much more than the parts you are including in your ESTIMATE statements, and that is the reason for the difference. D is handled as a continuous covariate, so the LSMEANS are the marginal values at the mean of D. Given that, it may be that even your LSMEANS are somewhat misleading, due to the imbalance. Get a copy of Littell et al.'s SAS for Mixed Models, 2nd ed., and read the chapter on analysis of covariance. The LSMEANS should probably be calculated using the AT= option, in order to accommodate the D and A*D terms in the model.
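
A minimal sketch of those two suggestions, reusing the model from the question. The E option on both statements prints the coefficient vectors so they can be compared side by side. The value D=7 in the second LSMEANS statement is purely illustrative and is not taken from the data.

```sas
proc mixed data=dummy;
  class A B C;
  model Y = A B C D A*C A*D;

  /* E prints the L-matrix coefficients behind each LS-mean,
     including the columns for D and A*D */
  lsmeans A / pdiff cl e;

  /* E on ESTIMATE prints the coefficients actually used for this contrast */
  estimate 'A2 vs A1' A -1 1 0 0 / e;

  /* LS-means and their differences evaluated at a chosen covariate value.
     D=7 is an assumed, illustrative value; choose one that is observed
     for every level of A */
  lsmeans A / at D=7 pdiff cl;
run;
```

Comparing the two coefficient listings should show which model terms the ESTIMATE statement leaves at zero.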

 

Finally, if you are working in later versions of SAS/STAT, consider using the LSMESTIMATE statement to calculate differences between least squares means rather than the ESTIMATE statement.  The syntax is much closer to what you are using.  The only addition would be inclusion of the AT= option to accommodate the unequal slopes model that you are fitting.
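
As a rough sketch of what that could look like for the contrasts in the question: the coefficients are applied to the LS-means of A in level order, and the covariate value D=7 is an assumption chosen only for illustration.

```sas
proc mixed data=dummy;
  class A B C;
  model Y = A B C D A*C A*D;

  /* Each row is a linear combination of the four LS-means of A (A1..A4).
     AT D=7 is an assumed, illustrative covariate value. */
  lsmestimate A 'A2 vs A1' -1 1  0  0,
                'A2 vs A3'  0 1 -1  0,
                'A2 vs A4'  0 1  0 -1 / at D=7 cl;
run;
```

Because the rows are written in terms of the LS-means themselves, there is no need to spell out coefficients for the interaction terms by hand.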

 

Steve Denham


lvm
Rhodochrosite | Level 12

Steve is correct. But note, the LSMESTIMATE statement does not work with continuous covariates, just factors.

SteveDenham
Jade | Level 19

But you can accommodate a continuous covariate by use of the AT option in the LSMESTIMATE statement. As long as there is one factor involved, you can add continuous covariates to your heart's content.
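
For comparison, here is a hedged sketch of what evaluating a contrast at a covariate value means in ESTIMATE terms. With D continuous, the A2 minus A1 difference at D=d includes (slope for A2 minus slope for A1) times d, so the A*D coefficients are the A coefficients scaled by d. Leaving them out amounts to comparing the groups at D=0, which lies outside the observed range of 5 to 10. The value d=7 is assumed for illustration.

```sas
proc mixed data=dummy;
  class A B C;
  model Y = A B C D A*C A*D;

  /* A*D coefficients = A coefficients times the assumed covariate value 7 */
  estimate 'A2 vs A1 at D=7' A -1 1 0 0  A*D -7 7 0 0 / e cl;
run;
```

The E listing is worth checking against the LSMEANS coefficients, since the A*C columns also have to agree for the two results to match.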

 

Steve Denham
