I have a dataset like the following:
Person  GroupA  GroupB  GroupC  DV
1       A       F       1       30
1       A       F       2       40
1       A       N       1       20
1       A       N       2       15
1       B       F       1       25
1       B       F       2       24
1       B       N       1       22
1       B       N       2       17
2       A       F       1       12
2       ...
In this case, I want to compare the mean differences among GroupA, GroupB, and GroupC. How can I conduct a repeated-measures ANOVA?
PROC MIXED;
CLASS person groupA groupB groupC;
MODEL DV = groupA|groupB|groupC;
REPEATED / SUBJECT=person TYPE=CS;
run;
Does the above code look correct? I have seen some examples that make me think I should write "repeated groupA groupB groupC / SUBJECT=person", because those groups are repeated within persons, but I can't specify more than one repeated effect. Could anyone help me with this? Thank you!
From your design, it appears that A, B, and C are fixed effects. You will want to keep these on the MODEL statement and only have your person effect as random. So something like
class person A B C;
model y=A B C;
random int / subject=person;
should work. You can expand the MODEL statement to include interactions if you wish.
If you are concerned about heterogeneity of the residual variances, you can model this heterogeneity in MIXED through the REPEATED statement and the GROUP= option. If you add
repeated / group=a*b*c;
then that statement will fit a different residual variance for each combination of A*B*C. One caution, however, is that you need sufficient data to fit these 8 different residual variances (if there are 2 levels to each of A, B, and C). With a single replication of each A*B*C level in each person, you may not have enough data to estimate all of these variances. You can try different GROUP= effects if that is the case (GROUP=A, GROUP=B*C for instance).
You can fit the model with and without the REPEATED statement. The difference in the -2LL statistics for the two models will be approximately chi-square distributed, with df equal to the difference in the number of covariance parameters fit.
The nice thing about MIXED is its incredible flexibility. Concerned about homogeneity of residual variance? Then fit those heterogeneous variances!
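As a rough sketch of the with/without-REPEATED comparison described above, the two fits and the chi-square test could look like the following. The dataset name mydata is hypothetical, the variable names y, A, B, and C follow the notation of this reply, and the df value assumes GROUP=A with two levels of A (one extra covariance parameter); adjust df to match your own models. The fixed effects must be identical in both fits for the REML likelihoods to be comparable.

```sas
/* Sketch only: MYDATA is a hypothetical dataset name. */

/* Model 1: one common residual variance */
proc mixed data=mydata;
  class person A B C;
  model y = A B C;
  random int / subject=person;
  ods output FitStatistics=fit_hom;
run;

/* Model 2: a separate residual variance for each level of A */
proc mixed data=mydata;
  class person A B C;
  model y = A B C;
  random int / subject=person;
  repeated / group=A;
  ods output FitStatistics=fit_het;
run;

/* Likelihood ratio test from the two -2 Res Log Likelihood values */
data lrt;
  set fit_hom(where=(Descr = '-2 Res Log Likelihood') rename=(Value=m2ll_hom));
  set fit_het(where=(Descr = '-2 Res Log Likelihood') rename=(Value=m2ll_het));
  chisq   = m2ll_hom - m2ll_het;
  df      = 1;   /* assumption: A has 2 levels, so 2 variances vs. 1 */
  p_value = 1 - probchi(chisq, df);
run;

proc print data=lrt;
  var m2ll_hom m2ll_het chisq df p_value;
run;
```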
When you say the visit number variable, do you mean the time variable? Actually, I don't have a time variable in my dataset, but groups A, B, and C are all nested within each person.
If there are only 2 levels for each of the factors A, B and C, you may want to consider calling at least one effect RANDOM. The other two could then be fit as doubly repeated measures using the Kronecker product UN@UN as the type. The subject= will be slightly different for each of these.
Here is an example of that (no guarantees as to whether this will work or not, without having the data to fit):
PROC MIXED;
CLASS person groupA groupB groupC;
MODEL DV = groupA|groupB|groupC;
RANDOM intercept/subject=GroupC;
REPEATED groupA GroupB/ SUBJECT=person*groupC TYPE=UN@UN;
run;
Another method would be to concatenate the factors, essentially making this a one-way ANOVA, and use LSMESTIMATE statements with the JOINT option to get F tests of interest. You would have to either switch to PROC GLIMMIX or use a STORE statement followed by PROC PLM.
SteveDenham
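A minimal sketch of the concatenation approach mentioned above, using PROC GLIMMIX, might look like this. It is only an illustration: mydata is a hypothetical dataset name, the cell variable is introduced here for the example, and the contrast coefficients assume the 2x2x2 layout (A/B, F/N, 1/2) shown in the sample rows of the question. With more levels per factor, the coefficient rows would need to be rebuilt to match the sorted order of the concatenated levels.

```sas
/* Sketch only: MYDATA and CELL are hypothetical names; a 2x2x2 layout is assumed. */
data cells;
  set mydata;
  length cell $ 20;
  cell = catx('_', groupA, groupB, groupC);   /* e.g. A_F_1 */
run;

proc glimmix data=cells;
  class person cell;
  model DV = cell;                     /* one-way model on the combined factor */
  random intercept / subject=person;
  /* Sorted cell order: A_F_1 A_F_2 A_N_1 A_N_2 B_F_1 B_F_2 B_N_1 B_N_2 */
  lsmestimate cell
    'GroupA: A vs B'  1  1  1  1 -1 -1 -1 -1,
    'GroupB: F vs N'  1  1 -1 -1  1  1 -1 -1,
    'GroupC: 1 vs 2'  1 -1  1 -1  1 -1  1 -1 / divisor=4 joint;
run;
```

The JOINT option adds a single F test that all three listed contrasts are simultaneously zero; separate LSMESTIMATE statements would give one joint test per family of contrasts.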
Hi, thank you for your reply and suggestions, but groups A, B, and C each have three or more levels. Can I put all three of them in the REPEATED statement? I also saw that there is a procedure called PROC GENMOD; can I use it instead, without the REPEATED statement?
I guess PROC GENMOD doesn't work because it didn't produce the F test results, and I can't check the mean differences among all those factors. Do you mean that if groups A, B, and C have more than two levels, I should use PROC GLIMMIX to conduct the repeated-measures ANOVA?
There are several ways to model the correlations in your data. I am not sure what the variables GroupA, GroupB, and GroupC are, but one of the straightforward models would be modeling the correlations through random effects. For example,
PROC MIXED;
CLASS person groupA groupB groupC;
MODEL DV = groupA|groupB|groupC;
RANDOM intercept GroupA GroupB GroupC GroupA*GroupB GroupA*GroupC GroupB*GroupC/subject=person;
run;
Some of the random interaction effects might or might not be present in your data. If you get zero estimated variance for some of the random effects, you might consider taking them out of the RANDOM statement.
I tried your method, and I didn't see any of the random effects in the output; I only saw the Type 3 Tests of Fixed Effects. Does that look right?
Type 3 Tests of Fixed Effects
                 Num   Den
Effect            DF    DF   F Value   Pr > F
group A            3    12      0.81   0.5112
group B            2     8      4.00   0.0626
groupA*groupB      6    24      0.58   0.7460
group C            3    12      0.99   0.4298
groupA*groupC      9    36      0.82   0.5985
groupB*groupC      6    24      1.33   0.2815
A*B*C             18    72      1.76   0.0489
Did you get the Covariance Parameter Estimates table in the output?
Sorry, yes, I do get the Covariance Parameter Estimates table. But will the "Estimated G matrix is not positive definite" note affect the results?
What is in this Covariance Parameter Estimates table? Can you send it here?
Below is the covariance parameter table. Should I remove the terms with zero estimates and run the analysis again to avoid the "not positive definite" note?
Covariance Parameter Estimates
Cov Parm         Subject    Estimate
Intercept        specimen     2.1541
Group A          specimen     0.7469
Group B          specimen          0
Group C          specimen          0
GroupA*GroupB    specimen    0.04165
GroupA*GroupC    specimen     5.0689
GroupB*GroupC    specimen          0
Residual                      0.1730
You should re-run, leaving Group B, Group C, and the Group B by Group C interaction out of the RANDOM statement. A possible cause of the zero estimates is that the unconstrained estimates would be negative, so the default REML fit, which bounds variance components at zero, reports a zero value. This would occur if all of the variability in these three factors is already explained by Group A and its interactions. It usually turns out that the standard errors of the estimates are only minimally affected by leaving these terms in, and leaving them in results in denominator degrees of freedom that reflect your design. The issue, however, is that you are then "splitting" the variability due to these factors between fixed and random effects. That is why one possibility is to exclude these factors from the RANDOM statement.
SteveDenham
I agree with you Steve. For the ddf concerns, I would add the ddfm=kr option in the MODEL statement.
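Putting the two suggestions above together (dropping the zero-variance terms and adding DDFM=KR), a refit might look like the sketch below. The dataset name mydata is hypothetical, and person is used as the subject variable as in the earlier code in this thread; substitute your actual subject variable (the output above shows it as specimen).

```sas
/* Sketch only: MYDATA is a hypothetical dataset name. */
proc mixed data=mydata;
  class person groupA groupB groupC;
  /* Kenward-Roger adjustment for denominator df and standard errors */
  model DV = groupA|groupB|groupC / ddfm=kr;
  /* GroupB, GroupC, and GroupB*GroupC removed: their variance estimates were 0 */
  random intercept groupA groupA*groupB groupA*groupC / subject=person;
run;
```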
Thank you so much! I have one more question: if I find that the dependent variable is not normally distributed, can I still use PROC MIXED?