I was hoping someone could help me with this issue:

I conducted a repeated measures analysis using PROC MIXED in SAS. A colleague analyzed the same data with repeated measures ANOVA in STATA and generated different results for some of the analyses. I wanted to make sure that my coding in PROC MIXED was not the issue underlying these differences.

We are analyzing the magnitude and rate of change in calcium levels between two groups of patients after an identical intervention (parallel design). Calcium levels are measured at baseline in each patient, and then measurements are repeated in all patients every 30 minutes for 4 hours after the intervention. Individual patients are to be treated as random effects, with group as the fixed effect (two levels). We assume a compound symmetric covariance structure. Here is the code I have been using:

PROC MIXED;
CLASS patient group time;
model calcium=group time group*time;
repeated/subject=patient type=cs;
run;

Does this appear appropriate?
Accepted Solutions
Dale
Pyrite | Level 9
Without knowing exactly what model was fit employing STATA, it is difficult to determine the reason(s) for differences in the results from PROC MIXED and the results from STATA. You also don't state anything about the magnitude of differences between results from STATA and PROC MIXED. Thus, one might also question whether the differences between the two models are meaningful.

As has been mentioned already, the default method for PROC MIXED does not employ moment estimates. Note that it was incorrectly stated that PROC MIXED uses maximum likelihood whereas the default method is actually restricted maximum likelihood (REML). In many balanced designs, REML estimates are identical to moment estimates. From the description of your data collection methods, my guess is that you have a balanced design. So, differences between results from PROC MIXED and those from STATA are probably not due to the method of estimation employed by the MIXED procedure.

More likely, the residual covariance structure which you employ (compound symmetry) is not the same as the residual covariance structure which is assumed for the model fitted in STATA. If the STATA module which was employed here operates anything like the GLM procedure in SAS when a repeated measures model is specified, then the STATA module may be employing an unstructured covariance structure. Alternatively, it is possible that the model fit in STATA assumes independence.

At the end of your post, you ask whether the code that you have employed appears appropriate. Without having access to the data, it is impossible to answer that question. However, it is quite possible that the assumption of compound symmetry is not warranted. I would suggest that you fit your model first assuming an unstructured covariance structure. Look at the pattern of the covariance matrix. Are all diagonal terms approximately equal and all off-diagonal terms approximately equal? Then compound symmetry is probably a good choice. Or does the covariance structure decay away from the diagonal? In such a situation, an AR(1) covariance structure may be more appropriate. The MIXED procedure allows those (and many other) covariance structures to be specified. Make sure that the covariance structure you assume is appropriate.
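As a rough sketch of that first step, the model from the original post could be refit with an unstructured covariance and the estimated matrix printed for inspection. The data set name calcium_long is hypothetical (one row per patient per time point is assumed); the variable names are those used in the question.

```sas
/* Sketch only: CALCIUM_LONG is a hypothetical data set name.
   Assumes one row per patient per time point and patient IDs
   that are unique across the two groups. */
proc mixed data=calcium_long method=reml;
  class patient group time;
  model calcium = group time group*time;
  /* Naming TIME as the repeated effect keeps measurements aligned
     within a patient even if a time point is missing.
     R and RCORR print the estimated covariance and correlation
     matrices for the first subject. */
  repeated time / subject=patient type=un r rcorr;
run;
```

With many time points and few patients, the unstructured fit may not converge, since every variance and covariance is estimated separately. To try the other structures, change type=un to type=cs or type=ar(1); AR(1) presumes equally spaced measurements.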

You can use a likelihood ratio test to assess empirically whether the assumption of compound symmetry (or AR(1)) is warranted. Both compound symmetric and AR(1) covariance structures are obtained by imposing constraints on the unstructured covariance structure. Provided the two models have the same fixed effects, the difference in -2LL values between the models estimated employing CS and unstructured (UN) covariance structures has an asymptotic chi-square distribution with df equal to the difference between the number of parameters in the unstructured covariance (t(t+1)/2 for t time points) and the number of parameters (2) in the CS covariance.
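One way the test might be carried out is sketched below. The data set names (calcium_long, fit_un, fit_cs, lrt) are hypothetical, and the count of 9 time points is an assumption (baseline plus 8 half-hourly measurements over 4 hours), which gives 9*10/2 = 45 UN parameters and df = 45 - 2 = 43. The sketch also assumes that the first row of the FitStatistics table is the -2 Res Log Likelihood, as in the default REML output; check the printed table before relying on it.

```sas
/* Fit the unstructured model and save its fit statistics. */
ods output FitStatistics=fit_un;
proc mixed data=calcium_long method=reml;
  class patient group time;
  model calcium = group time group*time;
  repeated time / subject=patient type=un;
run;

/* Fit the compound symmetric model with the same fixed effects. */
ods output FitStatistics=fit_cs;
proc mixed data=calcium_long method=reml;
  class patient group time;
  model calcium = group time group*time;
  repeated time / subject=patient type=cs;
run;

/* Likelihood ratio test of CS against UN. */
data lrt;
  /* First row of each table is assumed to hold -2 Res Log Likelihood. */
  set fit_cs(obs=1 rename=(Value=m2ll_cs));
  set fit_un(obs=1 rename=(Value=m2ll_un));
  n_times = 9;                          /* assumed number of time points */
  df      = n_times*(n_times + 1)/2 - 2; /* UN parameters minus CS parameters */
  chisq   = m2ll_cs - m2ll_un;          /* constrained minus unconstrained */
  p_value = 1 - probchi(chisq, df);
run;

proc print data=lrt;
  var m2ll_cs m2ll_un chisq df p_value;
run;
```

A small p-value suggests that compound symmetry is too restrictive for these data. The same pattern applies to AR(1), which also has 2 parameters.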

4 Replies
Paige
Quartz | Level 8
It is not surprising that PROC MIXED and PROC ANOVA give different answers. Your code appears to be fine.

PROC MIXED uses Maximum Likelihood Estimation, while PROC ANOVA uses Least Squares. Further, PROC ANOVA should not be used (and will give wrong results) if your experiment is unbalanced in terms of the number of observations in each category.
deleted_user
Not applicable
A mixed model incorporates a random term, whereas PROC ANOVA uses only fixed effects. Also, as Paige said, parameter estimation is different for MIXED vs. ANOVA. PROC GLM or PROC MIXED would be good for unbalanced designs. I prefer PROC GLM over PROC MIXED, especially for multiple comparisons.
Karl
Calcite | Level 5
I would like to add one more thing. PROC MIXED allows missing values. In contrast, PROC GLM with REPEATED statement does not allow missing values, that is, if there is a missing value in one subject, all observations in this subject will be ignored.