Hi all,
I'm constructing a baseline table in which I want to see whether certain baseline characteristics (e.g. age) differ across the four categories of a certain exposure, using ANOVA. The categories are quartiles of the exposure (the independent variable), so my design is balanced.
Now, I first want to check my assumptions. Please see the syntax I used:
/* BY-group processing expects my.data to be sorted by quartiles_exposure */
PROC UNIVARIATE DATA=my.data NORMAL PLOT;
   VAR X Y Z;
   QQPLOT X Y Z;
   BY quartiles_exposure;
RUN;
Since I have a large dataset (n>4000), I look at both the QQ plots (and histograms) and the Kolmogorov-Smirnov (KS) test.
However, even for the variables that look normally distributed visually, the KS p-value is consistently reported as <0.0100, indicating a departure from normality. How is this possible?
Furthermore: how am I supposed to test equal variances? Am I maybe using the wrong procedure?
Anyone willing to help me out: thanks a lot!
Best regards,
Marjolein
Accepted Solutions
Nearly every test for normality will find that the distribution is "not normal" once the sample size is large enough. Random variation will guarantee that. As a result, the QQ plot is far better for determining whether the assumptions are met. Also, remember that the assumption of normality in ANOVA applies to the residuals and not to the variables themselves, so be sure that what you use as input to PROC UNIVARIATE are the residuals from your ANOVA. Finally, recall that ANOVA is robust to violations of most assumptions, especially with large samples, so minor deviations from normality or homoskedasticity will not greatly influence the outcome.
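As a rough sketch of that workflow (not tested on the poster's data), the residuals can be saved from PROC GLM and then passed to PROC UNIVARIATE. The data set `my.data`, the response `X` and the grouping variable `quartiles_exposure` are taken from the question; the output data set name `resid_out` and the variable name `resid` are placeholders chosen here for illustration.

```sas
/* Fit the one-way ANOVA and save the residuals.
   resid_out and resid are illustrative names, not from the original post. */
proc glm data=my.data;
   class quartiles_exposure;
   model X = quartiles_exposure;
   output out=resid_out r=resid;
run;
quit;

/* Inspect the residuals rather than the raw variable. */
proc univariate data=resid_out normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);
   histogram resid / normal;
run;
```

With n>4000 the formal tests printed by the NORMAL option will probably still reject, so the QQ plot and histogram are the outputs to judge by. The same steps would be repeated for Y and Z.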
Steve Denham
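Yes. Use the HOVTEST option on the MEANS statement in PROC GLM.
proc glm data=sashelp.class;
   class sex;
   model weight=sex;
   /* HOVTEST requests a homogeneity-of-variance test for the one-way model */
   means sex/hovtest;
run;
quit;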
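Applied to the data in the question, this might look like the sketch below. It assumes the response `X` and grouping variable `quartiles_exposure` from the original post. `HOVTEST=LEVENE` spells out the default test, and the `WELCH` option additionally requests Welch's variance-weighted one-way ANOVA, which may be worth a look if the variances do appear unequal.

```sas
/* One-way ANOVA with a test for equal variances across exposure quartiles.
   X and quartiles_exposure come from the question's own code. */
proc glm data=my.data;
   class quartiles_exposure;
   model X = quartiles_exposure;
   means quartiles_exposure / hovtest=levene welch;
run;
quit;
```

As with the normality tests, a sample this large can make Levene's test significant for differences in variance that are too small to matter, so it helps to compare the group standard deviations directly as well.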