ANOVA assumes that the residuals (errors) are normally distributed and have equal variance across treatment groups (homoscedasticity; its opposite is heteroscedasticity). Professional statisticians frequently check these assumptions visually.
We use a dataset that formed the basis of a paper describing how Calluna (heath) plants respond to nitrogen and how well they tolerate drought. Nitrogen, plant source (heathland), and drought were crossed in a 2×2×2 factorial design. The plants were randomized in a greenhouse, with 10 pots per treatment combination (n=10), and the experiment was run over two years.
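Before fitting anything, it can help to confirm that the data set matches this design. The sketch below assumes the data set Heath.data and the CLASS variable names used in the PROC MIXED step in this post (Year, Heathland, Nitrogen, Drought). If the data are complete, each of the 2×2×2 = 8 treatment combinations should appear with a count of 10 in each year.

```sas
/* Sketch: count pots per treatment combination within each year.   */
/* Assumes Heath.data and the CLASS variable names used in this post. */
/* LIST prints one row per combination; NOCUM drops cumulative stats. */
proc freq data=Heath.data;
   tables Year*Heathland*Nitrogen*Drought / list nocum;
run;
```

A combination with fewer than 10 pots signals missing observations that are worth noting before interpreting the ANOVA.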
This dataset holds some interesting clues about nitrogen and drought effects on heath plants. But before relying too much on the output, we should test the assumptions. How is that done visually?
```sas
/* Name literals such as 'dry weight above (g)'n need VALIDVARNAME=ANY */
options validvarname=any;

/* Turn on ODS Graphics so that procedures can display plots */
ods graphics on;

/* PLOTS(ONLY)= requests just two kinds of plots:                       */
/*   STUDENTPANEL: a panel of studentized residual diagnostics          */
/*   BOXPLOT:      boxplots of residuals for the classification effects */
/* The CONDITIONAL suboption computes residuals from the conditional    */
/* mean, i.e. the fixed-effect predictions plus the predicted random    */
/* Year effects.                                                        */
proc mixed data=Heath.data
           plots(only)=(studentpanel(conditional) boxplot(conditional));
   class Year Heathland Nitrogen Drought Replicate;
   model 'dry weight above (g)'n = Drought Nitrogen Drought*Nitrogen
         Heathland Heathland*Drought Heathland*Nitrogen
         Heathland*Drought*Nitrogen;
   /* Year enters the model as a random effect */
   random Year;
run;
```
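ODS Graphics produces these panels directly. If you would rather keep the residuals for further plotting, a sketch like the one below should work. The OUTP= and RESIDUAL options on the MODEL statement write the conditional predictions and residuals (including a StudentResid variable) to a data set, here given the arbitrary name work.resid. PROC UNIVARIATE then draws a histogram and a normal Q-Q plot of the studentized residuals. The model is the same one as above, written with the bar-operator shorthand.

```sas
options validvarname=any;
ods graphics on;

/* Refit the same model and save conditional residuals.          */
/* OUTP= predictions include the predicted random Year effects;  */
/* RESIDUAL adds StudentResid and PearsonResid to that data set. */
/* work.resid is an arbitrary name chosen for this sketch.       */
proc mixed data=Heath.data;
   class Year Heathland Nitrogen Drought;
   /* A|B|C expands to all main effects and interactions of A, B, C */
   model 'dry weight above (g)'n = Drought|Nitrogen|Heathland
         / outp=work.resid residual;
   random Year;
run;

/* Histogram with normal and kernel density overlays, plus a normal */
/* Q-Q plot of the studentized residuals.                           */
proc univariate data=work.resid;
   var StudentResid;
   histogram StudentResid / normal kernel;
   qqplot StudentResid / normal(mu=est sigma=est);
run;
```

Two separate humps in the histogram, with the kernel curve departing from the normal curve, are the numeric counterpart of the pattern in the student panel.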
The studentized residuals clearly show a bimodal distribution, with two peaks instead of one.
Bimodal distribution of studentized residuals
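For reference, the conditional residual and its internally studentized version can be written as follows. Here $\mathbf{x}_i$ and $\mathbf{z}_i$ are observation $i$'s rows of the fixed- and random-effects design matrices, $\hat{\boldsymbol\beta}$ holds the estimated fixed effects, and $\hat{\boldsymbol\gamma}$ holds the predicted random Year effects. This is the standard mixed-model definition, not anything specific to this dataset.

```latex
\hat{e}_i = y_i - \mathbf{x}_i^\top \hat{\boldsymbol\beta} - \mathbf{z}_i^\top \hat{\boldsymbol\gamma},
\qquad
r_i = \frac{\hat{e}_i}{\sqrt{\widehat{\operatorname{Var}}(\hat{e}_i)}}
```

Because each residual is divided by its own estimated standard error, the studentized residuals $r_i$ are on a common scale and can be compared across observations.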
Let’s look at the boxplots to see where the unexplained variance comes from.
Unequal residual variance between watering treatments
By far the widest range of residuals belongs to the well-watered treatment, which appears to be the source of the unequal variance. Residuals for the well-watered plants spread far in both directions, high and low. Perhaps, given plenty of water, individual plants responded either very well or poorly. In a future experiment, it might be useful to record each plant's response to watering so that it can enter the model as an explanatory variable.
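To put numbers on what the boxplots show, one option is to summarize the residual spread within each watering level. The sketch below refits the same model to save conditional studentized residuals (work.resid is an arbitrary name). It then uses PROC MEANS to report the standard deviation and range for each Drought level. The variable names are assumed to match those in the PROC MIXED step in this post.

```sas
options validvarname=any;

/* Refit the model and keep conditional studentized residuals */
proc mixed data=Heath.data;
   class Year Heathland Nitrogen Drought;
   model 'dry weight above (g)'n = Drought|Nitrogen|Heathland
         / outp=work.resid residual;
   random Year;
run;

/* Residual spread within each watering (Drought) level.        */
/* Under homoscedasticity the standard deviations should be     */
/* roughly similar; a much larger value points to the treatment */
/* that drives the unequal variance.                            */
proc means data=work.resid n std min max range;
   class Drought;
   var StudentResid;
run;
```

Swapping Drought for Year or Nitrogen in the CLASS statement gives a similar numeric view of the other boxplots.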
While the watering treatment departs from equal variance, it does not explain the non-normal distribution. We can see this from the median residuals, which are similar for the two watering treatments. The non-normality stems from another factor: notice how the medians are offset in the Year and Nitrogen boxplots. Digging into the data, the results suggest that the two years produced different drought and nitrogen effects on above-ground dry weight. For this reason, it may be advisable to analyze each year's experiment separately.
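One way to follow that advice is to refit the factorial model separately within each year using a BY statement. This is a sketch, not the authors' analysis. Once Year becomes the BY variable, it can no longer be a random effect, so the RANDOM statement is dropped. The sorted copy work.heath_sorted is an arbitrary name.

```sas
options validvarname=any;
ods graphics on;

/* BY-group processing requires the data sorted by the BY variable */
proc sort data=Heath.data out=work.heath_sorted;
   by Year;
run;

/* Fit the 2x2x2 factorial separately for each year and request    */
/* the same residual plots. With no RANDOM statement the model has */
/* fixed effects only, so conditional and marginal residuals match. */
proc mixed data=work.heath_sorted
           plots(only)=(studentpanel boxplot);
   by Year;
   class Heathland Nitrogen Drought;
   model 'dry weight above (g)'n = Drought|Nitrogen|Heathland;
run;
```

If the per-year residual panels look closer to normal than the pooled panel, that would support reporting the two years separately.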
Testing ANOVA assumptions need not be a checkbox exercise. A visual review of the residuals helps researchers make the most of their experiments and data models.