Statistical Procedures

Evaluating ANOVA assumptions using SAS

Marjolein · Posted 10-21-2016 06:58 AM

Hi all,

I'm constructing a baseline table in which I want to see whether certain baseline characteristics (e.g. age) differ across categories (n=4) of a certain exposure, using ANOVA. The categories are quartiles of the exposure (the independent variable), so my design is balanced.

Now, I first want to check my assumptions. Please see the syntax I used:

PROC UNIVARIATE DATA=my.data NORMAL PLOT;
  VAR X Y Z;
  QQPLOT X Y Z;
  BY quartiles_exposure;
RUN;

Since I have a large dataset (n>4000), I look at both the Q-Q plots (and histograms) and the Kolmogorov-Smirnov test.

However, even for the variables that look normally distributed visually, the KS p-value is consistently <0.0100, indicating a departure from normality. How is this possible?

Furthermore: how am I supposed to test for equal variances? Am I maybe using the wrong procedure?

Anyone willing to help me out: thanks a lot!

Best regards,

Marjolein

SteveDenham · Posted 11-02-2016 01:43 PM

Nearly every test for normality will find that the distribution is "not normal" once the sample size is large enough: real data are never exactly normal, and with thousands of observations the test has enough power to flag even trivial departures. As a result, the QQ plot is a far better guide to whether the assumptions are met. Also, remember that the normality assumption in ANOVA applies to the residuals, not to the variables themselves, so make sure that what you feed into PROC UNIVARIATE are the residuals from your ANOVA. Finally, recall that ANOVA is robust to moderate violations of its assumptions, especially with large samples, so minor deviations from normality or homoskedasticity will not greatly influence the outcome.

Steve Denham
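One way to follow the advice about residuals could look like the sketch below. It is only an illustration, not code from the thread: my.data, X, and quartiles_exposure come from the original question, while the output data set name resid_x and the variable name resid are hypothetical. The first step fits the one-way ANOVA and saves the residuals; the second inspects those residuals graphically rather than relying on a formal normality test.

```sas
/* Sketch only. my.data, X and quartiles_exposure are taken from the
   original question; resid_x and resid are hypothetical names. */

/* Step 1: fit the one-way ANOVA and save the residuals. */
proc glm data=my.data;
  class quartiles_exposure;
  model X = quartiles_exposure;
  output out=resid_x residual=resid;  /* residual = X minus its quartile mean */
run;
quit;

/* Step 2: check the residuals visually with a histogram and a Q-Q plot. */
proc univariate data=resid_x;
  var resid;
  histogram resid / normal;                 /* overlay a fitted normal curve */
  qqplot resid / normal(mu=est sigma=est);  /* reference line from estimated mean and SD */
run;
```

Because the residuals are pooled across the four quartiles, a single Q-Q plot replaces the four per-group plots produced by the BY statement; the same steps would be repeated for Y and Z.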

Ksharp · Posted 10-21-2016 11:35 PM

Yes. Use the HOVTEST option on the MEANS statement of PROC GLM:

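/* One-way ANOVA of weight by sex. HOVTEST on the MEANS statement requests
   a homogeneity-of-variance test (Levene's test by default). */
proc glm data=sashelp.class;
  class sex;
  model weight=sex;
  means sex / hovtest;
run;
quit;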
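Applied to the data in the question, the same idea might look like the sketch below. The data set and variable names (my.data, X, quartiles_exposure) are taken from the original post. The choice of the Brown-Forsythe variant (HOVTEST=BF) and the addition of the WELCH option are assumptions made for illustration, not something suggested in the thread.

```sas
/* Sketch only: names come from the original question; HOVTEST=BF and
   WELCH are illustrative choices, not recommendations from the thread. */
proc glm data=my.data;
  class quartiles_exposure;
  model X = quartiles_exposure;
  /* HOVTEST=BF: Brown-Forsythe test of equal variances across quartiles.
     WELCH: also report Welch's variance-weighted one-way ANOVA. */
  means quartiles_exposure / hovtest=bf welch;
run;
quit;
```

HOVTEST is only available for one-way models like this one. With n>4000 the variance test has the same large-sample sensitivity as the normality tests, so a small p-value here does not by itself mean the ANOVA is unreliable.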
