Marjolein
Obsidian | Level 7

Hi all,

 

 

I'm constructing a baseline table and want to test whether certain baseline characteristics (e.g. age) differ across the four categories of an exposure, using one-way ANOVA. The categories are quartiles of the exposure (the independent variable), so my design is balanced.

 

Now, I first want to check my assumptions. Please see the syntax I used:

PROC UNIVARIATE DATA=my.data NORMAL PLOT;
  VAR X Y Z;
  QQPLOT X Y Z;
  BY quartiles_exposure;
RUN;

 

Since I have a large dataset (n>4000), I look at both the Q-Q plots (and histograms) and the Kolmogorov-Smirnov test.

However, even for variables that look normally distributed visually, the Kolmogorov-Smirnov p-value is consistently reported as <0.0100, indicating a departure from normality. How is this possible?

 

Furthermore, how should I test for equal variances? Am I perhaps using the wrong procedure?

 

Anyone willing to help me out: thanks a lot!

Best regards,

Marjolein

Accepted Solutions
SteveDenham
Jade | Level 19

Nearly every test for normality will flag the distribution as "not normal" once the sample size is large enough. Real data are never exactly normal, and with thousands of observations the test has enough power to detect departures that are far too small to matter in practice. As a result, the Q-Q plot is a far better guide to whether the assumption is met. Also, remember that the normality assumption in ANOVA applies to the residuals and not to the variables themselves, so be sure that what you feed into PROC UNIVARIATE are the residuals from your ANOVA. Finally, recall that ANOVA is robust to modest violations of its assumptions, especially with large samples, so minor deviations from normality or homoscedasticity will not greatly influence the outcome.
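
The residual check described above could be sketched as follows. This is only an illustration, assuming the dataset and variable names from the question (my.data, X, quartiles_exposure); the output dataset anova_out and the variables resid_x and pred_x are hypothetical names introduced here.

```sas
/* Sketch only: fit the one-way ANOVA and save the residuals.
   anova_out, resid_x and pred_x are hypothetical names. */
proc glm data=my.data;
   class quartiles_exposure;
   model X = quartiles_exposure;
   output out=anova_out r=resid_x p=pred_x;
run;
quit;

/* Judge normality of the residuals mainly from the plots;
   with n > 4000 the formal tests reject for trivial departures. */
proc univariate data=anova_out normal;
   var resid_x;
   qqplot resid_x / normal(mu=est sigma=est);
   histogram resid_x / normal;
run;
```

In a one-way model the residuals are simply each observation's deviation from its group mean, so this pools the four quartile groups into a single Q-Q plot. Repeat the steps for Y and Z.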

 

Steve Denham


2 REPLIES
Ksharp
Super User

Yes. Use the HOVTEST option on the MEANS statement in PROC GLM to test for homogeneity of variance.

 

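/* One-way ANOVA of weight by sex, with a homogeneity-of-variance test */
proc glm data=sashelp.class;
class sex;
model weight=sex;
means sex/hovtest; /* HOVTEST defaults to Levene's test */
run;
quit;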
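
Applied to the setup in the question, the same idea might look like the sketch below. It assumes the names from the original post (my.data, X, quartiles_exposure). The choice of the absolute-deviation form of Levene's test and the addition of Welch's ANOVA are illustrative, not something the thread prescribes.

```sas
/* Sketch only: homogeneity-of-variance test for X across exposure quartiles */
proc glm data=my.data;
   class quartiles_exposure;
   model X = quartiles_exposure;
   /* LEVENE(TYPE=ABS) uses absolute deviations from the group means;
      WELCH adds a variance-weighted ANOVA that does not assume equal variances */
   means quartiles_exposure / hovtest=levene(type=abs) welch;
run;
quit;
```

HOVTEST is only available for one-way models, which matches the design here. With n > 4000 the variance test has the same large-sample sensitivity as the normality tests, so it is worth comparing the group standard deviations directly as well as looking at the p-value.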

