Solved
Contributor
Posts: 23

# Evaluating ANOVA assumptions using SAS

Hi all,

I'm constructing a baseline table in which I want to test, using ANOVA, whether certain baseline characteristics (e.g. age) differ across the four categories of an exposure. The categories are quartiles of the exposure (the independent variable), so my design is balanced.

Now, I first want to check my assumptions. Please see the syntax I used:

```sas
PROC UNIVARIATE DATA=my.data NORMAL PLOT;
  VAR X Y Z;
  QQPLOT X Y Z;
  BY quartiles_exposure;
RUN;
```

Since I have a large dataset (n>4000), I look at both the QQ plots (and histograms) and the Kolmogorov-Smirnov test.

However, even for variables that look normally distributed, the Kolmogorov-Smirnov p-value is consistently reported as <0.0100, indicating a departure from normality. How is this possible?

Furthermore, how should I test for equal variances? Am I maybe using the wrong procedure?

Anyone willing to help me out: thanks a lot!

Best regards,

Marjolein

Accepted Solutions
Solution
11-04-2016 06:20 AM
Posts: 2,655

## Re: Evaluating ANOVA assumptions using SAS

Nearly every test for normality will reject once the sample is large enough: with thousands of observations, the test has enough power to flag departures that are far too small to matter in practice. As a result, the QQ plot is a far better guide to whether the assumption is met. Also, remember that the normality assumption in ANOVA applies to the residuals and not to the variables themselves, so make sure that what you feed into PROC UNIVARIATE is the residuals from your ANOVA. Finally, recall that ANOVA is robust to moderate violations of its distributional assumptions, especially with large, balanced samples, so minor deviations from normality or homoskedasticity will not greatly influence the outcome.

Steve Denham
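For readers who want to try the residual check described above, here is a minimal sketch, not the original poster's or the answerer's code. It reuses `my.data`, `X`, and `quartiles_exposure` from the question; the output data set `work.anova_resid` and the variable names `resid_x` and `pred_x` are made up for illustration.

```sas
/* Fit the one-way ANOVA for one baseline variable and save the residuals. */
/* my.data, X and quartiles_exposure come from the question;               */
/* work.anova_resid, resid_x and pred_x are hypothetical names.            */
proc glm data=my.data;
  class quartiles_exposure;
  model X = quartiles_exposure;
  output out=work.anova_resid r=resid_x p=pred_x;
run;
quit;

/* Inspect the residuals, not the raw variable. With n > 4000 the formal   */
/* tests will often reject, so read the QQ plot and histogram first.       */
proc univariate data=work.anova_resid normal;
  var resid_x;
  qqplot resid_x / normal(mu=est sigma=est);
  histogram resid_x / normal;
run;
```

Because the model contains only the group effect, the residuals are simply each observation minus its quartile mean, so a single QQ plot of `resid_x` covers all four groups at once. The same steps would be repeated for `Y` and `Z`.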

All Replies
Super User
Posts: 10,213

## Re: Evaluating ANOVA assumptions using SAS

Yes. Use the HOVTEST option on the MEANS statement of PROC GLM to test for equal variances:

```sas
proc glm data=sashelp.class;
class sex;
model weight=sex;
means sex/hovtest;
run;
```
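Applied to the data in the question, the same idea might look like the sketch below. The data set and variable names are taken from the original post; the choice of Levene's test on absolute residuals and the added WELCH option are assumptions for illustration, not something the replies prescribe.

```sas
/* Homogeneity-of-variance test across the four exposure quartiles.        */
/* HOVTEST is only available for one-way models like this one.             */
proc glm data=my.data;
  class quartiles_exposure;
  model X = quartiles_exposure;
  /* Levene's test on absolute residuals; WELCH additionally requests      */
  /* Welch's variance-weighted ANOVA as a fallback if variances differ.    */
  means quartiles_exposure / hovtest=levene(type=abs) welch;
run;
quit;
```

As with the normality tests, a sample of more than 4000 observations can make the variance test significant for differences that are small in practice, so it helps to compare the group standard deviations printed by the MEANS statement alongside the p-value.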

☑ This topic is solved.