Statistical Procedures

Marjolein
Obsidian | Level 7

Hi all,

I'm constructing a baseline table in which I want to test whether certain baseline characteristics (e.g., age) differ across the four categories of an exposure, using one-way ANOVA. The exposure categories (my independent variable) are quartiles, so my design is balanced.
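Quartile groups like these are commonly created with PROC RANK and GROUPS=4. The sketch below assumes a hypothetical continuous variable named `exposure` in `my.data` and a hypothetical output data set `work.ranked`. Tied values are placed in the same group, so the four groups may not be exactly equal in size; the PROC FREQ step lets you check that.

```sas
/* Hypothetical: 'exposure' is the continuous exposure variable in my.data */
proc rank data=my.data out=work.ranked groups=4;
   var exposure;                 /* variable to split into quartiles      */
   ranks quartiles_exposure;     /* new grouping variable coded 0 to 3    */
run;

/* Check the group sizes: ties can make them slightly unequal */
proc freq data=work.ranked;
   tables quartiles_exposure;
run;
```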

Before running the ANOVA, I want to check its assumptions. This is the syntax I used:

PROC UNIVARIATE DATA=my.data NORMAL PLOT;   /* NORMAL requests tests for normality */
   VAR X Y Z;
   QQPLOT X Y Z;
   BY quartiles_exposure;                    /* BY requires data sorted by quartiles_exposure */
RUN;

Because my dataset is large (n > 4000), I look both at the Q-Q plots (and histograms) and at the Kolmogorov-Smirnov (KS) test.

However, even for variables that look normally distributed, the KS p-value is consistently reported as <0.0100, indicating a departure from normality. How is this possible?
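One way to see the sample-size effect is to simulate data that are nearly, but not exactly, normal and run the same checks. This is a hypothetical illustration, not part of the original thread: it draws 4000 normal values and rounds them to whole numbers (as ages often are), a departure a Q-Q plot barely shows but which the EDF tests may detect at this sample size. Also note that PROC UNIVARIATE obtains the KS p-value from tables, so p-values outside the table range are printed as bounds such as <0.0100 rather than exact values. The data set `work.sim` and its variables are hypothetical names.

```sas
/* Hypothetical simulation: 4000 values from a normal distribution with */
/* mean 50 and SD 10, plus the same values rounded to whole numbers.     */
data work.sim;
   call streaminit(12345);                    /* reproducible random numbers */
   do i = 1 to 4000;
      age_exact   = rand('NORMAL', 50, 10);   /* exactly normal              */
      age_rounded = round(age_exact);         /* nearly normal, discrete     */
      output;
   end;
   drop i;
run;

/* Compare the normality tests with the Q-Q plots for both versions */
proc univariate data=work.sim normal;
   var age_exact age_rounded;
   qqplot age_exact age_rounded / normal(mu=est sigma=est);
run;
```

Comparing the two variables shows whether the test results diverge even when the Q-Q plots look almost identical.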

Also, how should I test for equal variances? Am I using the wrong procedure?

Thanks a lot to anyone willing to help me out!

Best regards,

Marjolein

Accepted Solutions
SteveDenham
Jade | Level 19

Nearly every test for normality will flag a distribution as "not normal" once the sample size is large enough: real data are rarely exactly normal, and as n grows the tests gain enough power to detect even trivial departures. As a result, the Q-Q plot is a far better guide to whether the assumptions are met. Also, remember that the normality assumption in ANOVA applies to the residuals, not to the variables themselves, so make sure the input to PROC UNIVARIATE is the residuals from your ANOVA. Finally, ANOVA is robust to moderate departures from its assumptions, especially with large samples, so minor deviations from normality or homoskedasticity will not greatly influence the outcome.

Steve Denham
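Following this advice, one way to check normality is to save the residuals from the one-way ANOVA and pass those to PROC UNIVARIATE. A minimal sketch, using the names from the question (X as the outcome, quartiles_exposure as the grouping variable); the output data set `work.resids` and the variable `resid_x` are hypothetical names.

```sas
/* One-way ANOVA of X across exposure quartiles; save the residuals */
proc glm data=my.data;
   class quartiles_exposure;
   model X = quartiles_exposure;
   output out=work.resids r=resid_x;   /* residual = X minus its group mean */
run;
quit;

/* Normality check on the residuals, not on X itself */
proc univariate data=work.resids normal;
   var resid_x;
   qqplot resid_x / normal(mu=est sigma=est);
   histogram resid_x / normal;
run;
```

The same two steps can be repeated for Y and Z.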


Replies
Ksharp
Super User

Yes. Use the HOVTEST option of the MEANS statement in PROC GLM, for example:

proc glm data=sashelp.class;
class sex;
model weight=sex;
means sex/hovtest;
run;
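Used on its own, HOVTEST requests Levene's test. A sketch applying the same idea to the variables from the question, assuming X is the outcome and quartiles_exposure the grouping variable: HOVTEST=BF requests the Brown-Forsythe variant of Levene's test, which is less sensitive to non-normality, and WELCH adds Welch's ANOVA, a common alternative when the variances look unequal. Both options apply to one-way models only.

```sas
/* Equal-variance check across exposure quartiles, plus Welch's ANOVA */
proc glm data=my.data;
   class quartiles_exposure;
   model X = quartiles_exposure;
   means quartiles_exposure / hovtest=bf welch;   /* Brown-Forsythe test and Welch F */
run;
quit;
```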

Discussion stats
  • 2 replies
  • 7914 views
  • 3 likes
  • 3 in conversation