ST1
Calcite | Level 5

Let me start off by saying that I am relatively inexperienced with SAS. Previously I worked with small data sets and relied on the Shapiro-Wilk test of normality. Now I am working with a data set of over 2000 data points, and I am not sure how to interpret the other goodness-of-fit statistics generated by PROC UNIVARIATE.

 

I am looking at how spore concentration is related to the time of day; the spore concentration data has been transformed with log10. This is the output I am getting:

[Image: sas.png, screenshot of the PROC UNIVARIATE normality test output]

How do I interpret these tests of normality? What do the D, W-Sq, and A-Sq statistics indicate? Are there certain values of each statistic that indicate normality (as with the Shapiro-Wilk test)? Any suggestions are welcome! It would be great if answers were not too technical.

 

3 REPLIES
Ksharp
Super User

Shapiro-Wilk is generally used for smaller samples (PROC UNIVARIATE only reports it when n ≤ 2000), and your data have more than 2000 observations.

H0: the data come from a normal distribution. All three of your p-values are below 0.05, so you reject H0: your data are NOT normally distributed.
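To make the D, W-Sq and A-Sq columns less of a black box, here is a minimal Python sketch of the three statistics (not SAS code, and not a claim about how PROC UNIVARIATE computes them internally). Each one measures how far the empirical distribution function (EDF, the running fraction of observations at or below each value) is from the normal CDF fitted with the sample mean and standard deviation: D (Kolmogorov-Smirnov) is the single largest gap, W-Sq (Cramér-von Mises) sums the squared gaps over the whole range, and A-Sq (Anderson-Darling) does the same but weights the tails more heavily. Larger values mean a worse fit, and there is no fixed "normal" value to look for, which is why each statistic comes with a p-value. The function name `edf_normality_stats` and the simulated samples are illustrative assumptions; the sketch computes the statistics only, not their p-values.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def edf_normality_stats(data):
    """Return (D, W2, A2) comparing the sample EDF with a fitted normal CDF.

    Hypothetical helper: the normal is fitted with the sample mean and
    standard deviation. Only the statistics are computed, not p-values.
    """
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
    # Fitted CDF at each sorted observation, clamped away from 0 and 1 so log() is safe
    eps = 1e-15
    u = [min(max(p, eps), 1.0 - eps) for p in sorted(normal_cdf(x, mu, sigma) for x in data)]

    # Kolmogorov-Smirnov D: the largest vertical gap between EDF and fitted CDF
    d = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))

    # Cramer-von Mises W-Sq: squared gaps summed over the whole distribution
    w2 = sum((u[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n)) + 1.0 / (12 * n)

    # Anderson-Darling A-Sq: like W-Sq, but gaps in the tails count for more
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i])) for i in range(n))
    a2 = -n - s / n
    return d, w2, a2


if __name__ == "__main__":
    rng = random.Random(1)
    normal_sample = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # simulated, truly normal
    skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]  # simulated, right-skewed
    print("normal  D, W-Sq, A-Sq:", edf_normality_stats(normal_sample))
    print("skewed  D, W-Sq, A-Sq:", edf_normality_stats(skewed_sample))
```

On the simulated normal sample all three statistics come out small; on the right-skewed sample they are much larger, and that is what pushes the corresponding p-values below 0.05.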

PGStats
Opal | Level 21

Your data are clearly not normally distributed. However, if spore concentration is in fact related to time of day, you should first remove that effect and then look at the normality of the residuals, not of the raw values.

That said, it is very unlikely that 2000+ real-world observations (as opposed to simulated data) will pass the normality tests. With so many points, these tests become sensitive to the slightest departure from the normal distribution. The good news is that with so many points, inference from statistical models with normal errors (tests and confidence intervals on means and coefficients) becomes very robust to departures from the normal assumption because of the Central Limit Theorem.

PG
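Both points in the reply above can be illustrated with a minimal Python sketch on simulated data, not the poster's: the hourly pattern (a sine curve in log10 concentration), the noise level, and the helper names `simulate_spores`, `residuals_by_hour` and `ks_d` are all hypothetical. Subtracting each hour's mean is the simplest way to remove a time-of-day effect; in SAS the usual route would be to fit the model (for example in PROC GLM) and save residuals with the OUTPUT statement's R= option for PROC UNIVARIATE. The second part uses a mildly skewed gamma population (shape 20, skewness about 0.45): the gap D settles toward a small fixed value as n grows, but the test judges D against a cutoff that shrinks roughly like 1/√n, so √n·D keeps growing and a large enough sample rejects normality even for a small departure.

```python
import math
import random


def ks_d(data):
    """Kolmogorov-Smirnov D against a normal fitted with the sample mean and SD."""
    n = len(data)
    mu = sum(data) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in data) / (n - 1))
    u = sorted(0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0)))) for x in data)
    return max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))


def simulate_spores(rng, per_hour=100):
    """Hypothetical log10 spore data: a daily cycle plus normal noise."""
    hours, logc = [], []
    for h in range(24):
        hour_mean = 2.0 + 1.0 * math.sin(2 * math.pi * h / 24)  # assumed time-of-day effect
        for _ in range(per_hour):
            hours.append(h)
            logc.append(rng.gauss(hour_mean, 0.15))  # assumed normal noise, SD 0.15
    return hours, logc


def residuals_by_hour(hours, logc):
    """Remove the time-of-day effect by subtracting each hour's sample mean."""
    sums, counts = {}, {}
    for h, y in zip(hours, logc):
        sums[h] = sums.get(h, 0.0) + y
        counts[h] = counts.get(h, 0) + 1
    return [y - sums[h] / counts[h] for h, y in zip(hours, logc)]


if __name__ == "__main__":
    rng = random.Random(2024)
    hours, logc = simulate_spores(rng)
    print("D, pooled log10 values:", round(ks_d(logc), 3))
    print("D, residuals          :", round(ks_d(residuals_by_hour(hours, logc)), 3))

    # Same mild skewness, growing n: D shrinks toward a fixed gap, sqrt(n) * D grows
    for n in (200, 2000, 20000):
        sample = [rng.gammavariate(20.0, 1.0) for _ in range(n)]
        d = ks_d(sample)
        print(n, round(d, 3), round(math.sqrt(n) * d, 2))
```

Under these assumptions the pooled values give a noticeably larger D than the residuals, because they mix 24 normal distributions with different centers; the residuals are what a model with normal errors actually assumes to be normal.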
