Remember that the assumption for ANOVA is that the errors are normally distributed, not the data. Someplace else in this thread you asked how to check that. An OUTPUT statement that puts all of the residuals in a data set, followed by the test for normality in PROC UNIVARIATE, will work, but a QQ plot would be even better, because all known tests for normality depend on sample size: if the dataset is large, the test is overpowered and will declare even small deviations from normality significant, and if the dataset is small, it is underpowered and could easily miss a true departure from normality.
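As a rough sketch of that residual check, assuming a long-format data set called LONG with response Y, factors A, B, C and subject PERSON (all hypothetical names, not taken from your data), something like this should get you started:

```sas
/* Hypothetical names: LONG, Y, A, B, C, PERSON, RESIDS */
proc glimmix data=long;
   class person A B C;
   model y = A|B|C / ddfm=kr;
   random int / subject=person;
   /* Write the residuals to a data set for checking */
   output out=resids residual=resid;
run;

proc univariate data=resids normal;   /* NORMAL requests the formal tests */
   var resid;
   /* QQ plot against a normal with estimated mean and SD */
   qqplot resid / normal(mu=est sigma=est);
run;
```

Judge the QQ plot first, and treat the test p-values as secondary for the sample-size reasons given above.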
Somewhere else in this thread you ask about non-normal dependent variables. It would help to know what your dependent variable(s) is/are. Counts may have a Poisson, generalized Poisson or negative binomial distribution. Strict proportions of the events/trials type are, by default, binomial. Ratios of continuous variables that are continuous on the open interval (0,1) tend to have a beta distribution. PROC GLIMMIX (for conditional inference) and GENMOD (for marginal inference) are the right tools for this, so read through the examples for those two procedures.
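To illustrate, here is a minimal sketch for a count response, assuming hypothetical names (data set MYDATA, response COUNT, factors A, B, C, subject PERSON). The distribution shown is only one of the candidates listed above, not a recommendation for your data:

```sas
/* Hypothetical names: MYDATA, COUNT, A, B, C, PERSON */
proc glimmix data=mydata method=laplace;
   class person A B C;
   /* Negative binomial counts; use dist=poisson for a Poisson model */
   model count = A B C / dist=negbin link=log solution;
   random int / subject=person;
run;

/* Other response types follow the same pattern, for example:      */
/*   model events/trials = A B C;     binomial by default          */
/*   model prop = A B C / dist=beta link=logit;   values in (0,1)  */
```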
So now it appears that the variables A, B and C are measures on each subject, rather than classification variables to which subjects are assigned at random. Is that the case? Without violating any privacy restrictions, could you tell us what these variables are? I am leaning toward the approach given by @StatsMan of treating the subject as the random effect, clustering responses by subject. This approach could probably be done more directly with PROC GEE or GENMOD.
However, if there are assigned "treatments" included in these 3, that approach may need some additional thought. A thorough description of design and variables would go a long way toward addressing all of these issues.
SteveDenham
Hi, Steve, thank you so much for your long response. Yes, the three variables (group A, B and C) are measured on each individual person. You can think of them this way: if we want to measure a person's weight, we can measure it under different positions and poses: position (lie, squat, stand); head (head up, turn head, head down); and hand (single hand up, double hands up, hands straight). So for each combination of position, head, and hand, we have a number for weight. Sorry, that is the only similar example I can think of. Basically, position is group A, head is group B, hand is group C, and the DV is weight. They are all nested within each person.
Since you said PROC GLIMMIX is appropriate for both non-normal dependent variables and the homogeneity test, do you suggest that I use PROC GLIMMIX to conduct the same repeated ANOVA, treating the subject as the random effect and clustering responses by subject? Like the following example code?
PROC GLIMMIX ;
CLASS person groupA groupB groupC;
MODEL DV =groupA|groupB|groupC /ddfm=kr;
RANDOM int / subject=person;
COVTEST homogeneity;
output out=out1 r=resid;
run;
quit;
The proc glimmix code is almost correct. To get the homogeneity test, you must have a GROUP= option for the random effect. In this case, you might want to go full throttle, and specify the 3 way interaction as the group variable. Add this to the RANDOM statement:
group=groupA*groupB*groupc;
Depending on how many unique combinations of the three factors you have, this could get very difficult to fit (at least in terms of time or iterations). Also, there is now a huge chance that some of the variance components are zero, and we will be back where we were with the nonpositive definite G matrix. In that case, because the group= option only takes a single effect as an argument (interactions are single effects in this case), you will have to remove one of the three factors from the interaction. It might be possible to fit multiple RANDOM statements, but, as Han Solo said, I have a bad feeling about that.
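Putting the pieces together, the revised call would look roughly like the sketch below. The DATA= name is a placeholder I made up; everything else comes from your code plus the GROUP= option described above:

```sas
/* LONG is a hypothetical data set name */
proc glimmix data=long;
   class person groupA groupB groupC;
   model DV = groupA|groupB|groupC / ddfm=kr;
   /* GROUP= gives each factor combination its own variance component */
   random int / subject=person group=groupA*groupB*groupC;
   /* Tests whether the grouped covariance parameters are equal */
   covtest homogeneity;
   output out=out1 residual=resid;
run;
```

If this will not fit, try a smaller grouping effect such as group=groupA*groupB before giving up on the test.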
SteveDenham
I think the code runs well! There were only the warnings and notes below, and I think they might not be a big deal:
WARNING: MIVQUE0 estimate of profiled variance is linearly related to other covariance parameters.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: At least one element of the gradient is greater than 1e-3.
NOTE: A linear combination of covariance parameters is confounded with the residual variance.
However, the homogeneity test is significant. Since the dependent variable is not normal and the homogeneity test is significant, do you think that I should still stick with PROC GLIMMIX, or should I use the Friedman test, the non-parametric test, to conduct the repeated ANOVA? Thank you!
I also tried transforming the dependent variable using the LOG function and fitted the model again. This time the homogeneity test is non-significant. Do you think that I should still stick with the original dependent variable, or can I use the transformed one? Sorry about all those questions.
By the way, how could I conduct the homogeneity of variance test in proc mixed for repeated ANOVA? Thank you!
This is relatively easy to do in GLIMMIX, but not so easy in MIXED. In GLIMMIX there is a specific COVTEST option for homogeneity. For MIXED, you'll need the residuals, some DATA step manipulation, and then analysis on the transformed residuals using PROC GLM in some form of one way analysis to get a Levene's test or a Brown-Forsythe test.
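A minimal sketch of the MIXED route, assuming hypothetical names (data set LONG, response Y, factors A, B, C, subject PERSON). Here the HOVTEST= option in PROC GLM does the absolute-deviation transformation internally, which is one way of doing the one-way analysis, not the only one:

```sas
/* Hypothetical names: LONG, Y, A, B, C, PERSON, RESIDS, RESIDS2, CELL */
proc mixed data=long;
   class person A B C;
   /* OUTP= saves the conditional residuals in variable RESID */
   model y = A|B|C / outp=resids;
   random int / subject=person;
run;

data resids2;
   set resids;
   length cell $ 60;
   /* One label per A*B*C combination, for a one-way layout */
   cell = catx('_', A, B, C);
run;

proc glm data=resids2;
   class cell;
   model resid = cell;
   /* Levene's test on absolute deviations; HOVTEST=BF gives Brown-Forsythe */
   means cell / hovtest=levene(type=abs);
run;
quit;
```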
SteveDenham
Hi, Steve,
I am sorry to bother you. The more I think, the more questions I have. I will summarize all the information that I have; could I ask you a few questions based on it?
The different groups are measured on each subject. For example, I want to measure each person's weight based on position (lie, squat, stand), hand gesture (up, down, straight), and location (1, 2, 3, 4). The original dataset looks like this:
Person  position  hand  location1  location2  location3  location4
1       lie       down  63         61         64         62
1       lie       up    57         59         58         55
1       squat     down  52         57         55         54
1       squat     up    23         56         55         53
2       lie       down  63         61         64         62
2       lie       up    57         59         58         55
2       squat     down  52         57         55         54
2       squat     up    23         56         55         53
If I want to test the mean difference among all those groups, I should use repeated ANOVA, since all those groups are nested within each person. However, I also want to test the assumptions: normality, homogeneity, and sphericity. I think only PROC GLM allows me to conduct the sphericity test. So I could put the four weights as dependent variables, and treat person, position, and hand as between-subject effects and location as the within-subject effect, as in the following code:
PROC GLM;
CLASS person position hand ;
MODEL location1 location2 location3 location4 = person|position|hand / nouni;
REPEATED weight 4 / PRINTE ;
RUN ;
However, the independence assumption is clearly violated because each value was measured within each person.
In this case, do you think I can still use PROC GLM, or should I use PROC GLIMMIX, treating person as a random effect as the post said (I will convert the data to long format)? If I use PROC GLIMMIX, I can't test sphericity, and homogeneity is violated. And if I use PROC GLM, the independence assumption is violated. What should I do about it? Sorry for the long question, but I am really struggling with this dataset. Thank you so much!
I saw the message that the Estimated G matrix is not positive definite. Might that be the reason it didn't show any random effects? Should I change something in the code?
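For reference, the wide-to-long conversion mentioned above could be sketched as follows, assuming the wide data set is named WIDE (a made-up name) and has the columns shown in the table:

```sas
/* WIDE and LONG are hypothetical data set names */
data long;
   set wide;
   array loc{4} location1-location4;
   /* One output row per person, position, hand and location */
   do location = 1 to 4;
      weight = loc{location};
      output;
   end;
   drop location1-location4;
run;
```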
Another possibility is to treat person as a block and make PERSON the single random effect in your model. Were these measures taken sequentially over time, or are they observational statistics on the combination of the fixed effects A,B, and C? If you go with this approach, then put A,B, and C on the model statement and use a single RANDOM statement with
random int / subject=person;
Thank you for your reply. Those measures were not taken sequentially over time; they are just the observational statistics for each person. So do you think that random int / subject=person works the same as putting all groups in the RANDOM statement, like random intercept A B C A*B A*C B*C / subject=person? And is the result then a repeated ANOVA instead of multilevel modeling?
From your design, it appears that A, B, and C are fixed effects. You will want to keep these on the MODEL statement and only have your person effect as random. So something like
class person A B C;
model y=A B C;
random int / subject=person;
should work. You can expand the MODEL statement to include interactions if you wish.
If you are concerned about heterogeneity of the residual variances, you can model this heterogeneity in MIXED through the REPEATED statement and the GROUP= option. If you add
repeated / group=a*b*c;
then that statement will fit a different residual variance for each combination of A*B*C. One caution, however, is that you need sufficient data to fit these 8 different residual variances (if there are 2 levels to each of A, B, and C). With a single replication of each A*B*C level in each person, you may not have enough data to estimate all of these variances. You can try different GROUP= effects if that is the case (GROUP=A, GROUP=B*C for instance).
You can fit the model with and without the REPEATED statement. The difference in the -2LL statistics for the two models will be chi-square distributed, with df equal to the difference in the number of covariance parameters fit.
The nice thing about MIXED is its incredible flexibility. Concerned about homogeneity of residual variance? Then fit those heterogeneous variances!
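A hedged sketch of that comparison, assuming hypothetical names (data set LONG, response Y) and 2 levels each for A, B, and C, so that the heterogeneous model adds 7 residual variances:

```sas
/* Hypothetical names: LONG, Y, FIT_HOMOG, FIT_HETER, LRT */
proc mixed data=long;
   class person A B C;
   model y = A B C;
   random int / subject=person;
   ods output FitStatistics=fit_homog;
run;

proc mixed data=long;
   class person A B C;
   model y = A B C;
   random int / subject=person;
   repeated / group=A*B*C;      /* one residual variance per cell */
   ods output FitStatistics=fit_heter;
run;

data lrt;
   /* Keep only the -2 (Res) Log Likelihood row from each fit */
   merge fit_homog(where=(Descr contains '-2') rename=(Value=m2ll_homog))
         fit_heter(where=(Descr contains '-2') rename=(Value=m2ll_heter));
   chisq   = m2ll_homog - m2ll_heter;
   df      = 7;   /* ASSUMPTION: 8 residual variances vs 1; adjust to your design */
   p_value = 1 - probchi(chisq, df);
run;

proc print data=lrt noobs;
   var m2ll_homog m2ll_heter chisq df p_value;
run;
```

Both fits use the same fixed effects and the default REML estimation, which is what makes the comparison of -2 Res Log Likelihood values meaningful.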
This approach is great and @SAS-questioner should consider it. There is a lot of concern about assumptions that have been brought up that really will not affect what conclusions can be drawn. One change to consider is to only include main effects and two-way interactions in the MODEL statement. Including the three-way is the cause of many of the WARNINGs seen so far.
One thing I want to emphasize to @SAS-questioner : DO NOT USE PROC GLM TO TEST HYPOTHESES IN THE SPLIT-PLOT (REPEATED MEASURES) DESIGN, WITHOUT BEING PREPARED TO DO POST PROCESSING. The wrong denominator with wrong degrees of freedom is used in the tests for the whole plot effects. See any of the editions of SAS for Mixed Models for coverage of this.
This design is very much in the wheelhouse of generalized estimating equations (GENMOD and GEE), and I believe your best marginal analysis will be found using that approach, while your best conditional analysis is through one of the mixed model procedures.
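As a starting point for the marginal analysis, a minimal GEE sketch might look like this. The names (LONG, Y) are placeholders, and the normal distribution and exchangeable working correlation are assumptions to revisit once the response is better understood:

```sas
/* Hypothetical names: LONG, Y; A, B, C and PERSON as in the thread */
proc genmod data=long;
   class person A B C;
   model y = A B C / dist=normal link=identity;
   /* REPEATED requests GEE, clustering on person */
   repeated subject=person / type=exch;
run;
```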
SteveDenham