- Normality Test: different p-values from different ...

04-28-2016 12:06 PM

I want to run a t-test and an ANOVA, but my data are extremely positively skewed. I applied a log transformation to normalize them and then ran normality tests.

The three tests gave different p-values: for Kolmogorov-Smirnov the p-value is > 0.05, but for the other two tests it is < 0.05.

The sample size is 67,994. Is this transformation acceptable for normality, considering the t-test and ANOVA analyses?

```
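/* Test normality of the log-transformed variable within each gender */
proc univariate data=test1 normal;
   class gender;
   var newvar;
   histogram / normal kernel;   /* overlay fitted normal and kernel density */
   qqplot newvar;               /* normal Q-Q plot (normal is the default) */
run;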
```

Goodness-of-Fit Tests for Normal Distribution

| Test | Statistic | | p Value | |
|---|---|---|---|---|
| Kolmogorov-Smirnov | D | 0.09694082 | Pr > D | 0.119 |
| Cramer-von Mises | W-Sq | 0.14977297 | Pr > W-Sq | 0.024 |
| Anderson-Darling | A-Sq | 0.98653289 | Pr > A-Sq | 0.013 |

04-28-2016 02:39 PM

Prefer the Anderson-Darling (AD) test over Kolmogorov-Smirnov (KS): KS is most sensitive near the center of the distribution, whereas AD gives more weight to the tails.

Here is a research article that may help you.

https://www.researchgate.net/publication/267205556_Power_Comparisons_of_Shapiro-Wilk_Kolmogorov-Smir...

04-28-2016 03:22 PM

But also beware that the tests here are immensely over-powered to detect differences. You will learn far more from the QQ plot. That long flat part at the beginning is evidence that there is a mixture here, and a rough look at the data makes me think that you are using a lower limit of quantitation value for a lot of observations. If that is the case, there are a number of ways to address the issues of analysis.

Also, the assumption of ANOVA (and of the t-test) is __not__ that the data are normally distributed, but that the errors/residuals are normally distributed. Try running the analysis on the transformed data, and then testing the residuals for normality.
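A minimal sketch of that residual check, assuming the `test1` data set, the `gender` class variable, and the log-transformed `newvar` from the original post. The output data set name `resid_out` and the residual variable name `resid` are made up for illustration:

```sas
/* Fit the one-way model on the log scale and save the residuals */
proc glm data=test1;
   class gender;
   model newvar = gender;
   output out=resid_out r=resid;   /* resid_out and resid are illustrative names */
run;
quit;

/* Assess the residuals, not the raw response */
proc univariate data=resid_out normal;
   var resid;
   histogram resid / normal kernel;
   qqplot resid / normal(mu=est sigma=est);   /* adds a reference line */
run;
```

With only `gender` in the model, each residual is simply `newvar` minus its group mean, so the Q-Q plot of `resid` pools both groups after centering them.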

With a sample size this large, and the known conservatism of tests for normality, p-values in this range should probably not be regarded as strong evidence for lack of normality of the residuals, to which the ANOVA is relatively robust in any case.

Steve Denham

04-28-2016 04:07 PM

You do not need to transform your variable. The UNIVARIATE procedure can fit a lognormal and other skewed distributions.
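As a rough sketch of what that could look like, assuming the untransformed variable is strictly positive and is called `origvar` (a hypothetical name; the original post only shows the transformed `newvar`):

```sas
/* origvar is a hypothetical name for the untransformed, strictly positive variable */
proc univariate data=test1;
   class gender;
   var origvar;
   /* fit a lognormal curve; by default the threshold is 0 and the
      scale and shape parameters are maximum likelihood estimates */
   histogram origvar / lognormal;
   /* lognormal Q-Q plot with the shape parameter estimated from the data */
   qqplot origvar / lognormal(sigma=est);
run;
```

If the points in the lognormal Q-Q plot fall close to a straight line, the lognormal model is a reasonable description of the data on its original scale.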

As Steve points out, the Q-Q plot contains the graphical information about the fit. To learn more about the Q-Q plot and how to create it in SAS, see "Modeling the distribution of data? Create a Q-Q plot."

04-28-2016 06:09 PM

I would not use a test for normality with 68,000 observations. You will almost always get a 'significant' result. Graphic appraisal is best.

04-28-2016 11:20 PM - edited 04-28-2016 11:21 PM

Compare the p-value you get from ANOVA or a t-test on the log-transformed data with the p-value from the Wilcoxon rank-sum (nonparametric) test from PROC NPAR1WAY on the untransformed data. The latter should confirm the former.
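One way to set up that comparison, as a sketch only. It assumes the data set and variable names from the original post, plus a hypothetical `origvar` for the untransformed variable:

```sas
/* Parametric test on the log-transformed variable */
proc ttest data=test1;
   class gender;
   var newvar;
run;

/* Wilcoxon rank-sum test on the untransformed variable
   (origvar is a hypothetical name) */
proc npar1way data=test1 wilcoxon;
   class gender;
   var origvar;
run;
```

Because the rank-sum test depends only on ranks, and the log is a monotone transformation, running it on `newvar` instead of `origvar` should give the same result.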

PG

04-29-2016 09:38 AM

I like this approach, @PGStats, except that all of the ties at the lower bound mean a loss of power. I would suggest a tobit analysis on the log-transformed data (say, with PROC QLIM), but that might make a newcomer to SAS run screaming, I'm afraid.
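For anyone who wants to try it, a tobit model in PROC QLIM might look roughly like this. It is a sketch under assumptions: that the pile-up really is a lower limit of quantitation, and that its value on the log scale is known. The macro variable `log_lloq` and its value are placeholders, not something taken from the data in this thread:

```sas
/* Hypothetical: log of the lower limit of quantitation; replace with the real value */
%let log_lloq = 0;

proc qlim data=test1;
   class gender;
   model newvar = gender;
   /* treat observations at the lower bound as left-censored (tobit model) */
   endogenous newvar ~ censored(lb=&log_lloq);
run;
```

The `censored(lb=...)` option tells QLIM that values at the bound are only known to be at or below it, which is what distinguishes this from an ordinary regression on `newvar`.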

Steve Denham