03-13-2016 01:15 AM
Let me start off by saying that I am relatively inexperienced with SAS. Previously I have worked with small data sets and have relied on the Shapiro-Wilk test of normality. Now I am working with a data set with over 2000 data points and am not sure how to interpret the other goodness-of-fit statistics generated by PROC UNIVARIATE.
I am looking at how spore concentration is related to the time of day; the spore concentration data has been transformed with log10. This is the output I am getting:
How do I interpret these tests of normality? What do the D, W-sq, and A-Sq statistics indicate? Are there certain values of each statistic that indicate normality (as with the Shapiro-Wilk test)? Any suggestions are welcome! It would be great if answers are not too technical.
03-13-2016 05:44 AM
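Shapiro-Wilk is generally used for small samples; PROC UNIVARIATE only reports it when there are 2000 observations or fewer, and your data set is bigger than that. That is why you get the other three tests instead: D is the Kolmogorov-Smirnov statistic, W-Sq is Cramér-von Mises, and A-Sq is Anderson-Darling. Each one measures how far the empirical distribution of your data is from the fitted normal distribution, so larger values mean a bigger departure. There is no fixed cutoff for the statistics themselves; you judge them by their p-values.
H0: x follows a normal distribution. All three of your p-values are < 0.05, so you reject H0, which means your data are NOT normally distributed.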
03-13-2016 11:13 PM
Your data is clearly not normally distributed. Now, if spore concentration is in fact related to time of day, you should remove that effect and look at the normality of the residuals. That said, it is very unlikely that 2000+ real-world observations (as opposed to simulated data) will pass the normality tests. With so many points, these tests become sensitive to the slightest departure from the normal distribution. The good news is that with so many points, statistical models with normal errors become very robust to departures from the normality assumption because of the central limit theorem.
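Something along these lines would let you check the residuals instead of the raw data. This is only a sketch: the data set name SPORES and the variables LOG_CONC (your log10 spore concentration) and HOUR (time of day, treated here as a categorical effect) are made-up names, since I don't know how your data is laid out, and a one-way model may not be the right model for your design.

```sas
/* Hypothetical names: SPORES, LOG_CONC and HOUR are placeholders
   for your own data set and variables. */

/* Step 1: remove the time-of-day effect and save the residuals. */
proc glm data=spores;
  class hour;
  model log_conc = hour;
  output out=spore_resid r=resid;  /* RESID holds the residuals */
run;
quit;

/* Step 2: test and plot the residuals rather than the raw values. */
proc univariate data=spore_resid normal;
  var resid;
  histogram resid / normal;                 /* overlay a fitted normal curve */
  qqplot resid / normal(mu=est sigma=est);  /* reference line from estimated mean and SD */
run;
```

With this many observations the p-values will probably still come out small, so I would give more weight to the histogram and the Q-Q plot: if the points follow the reference line reasonably well, the departure from normality is unlikely to matter much.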