Hi everyone,
I want to run goodness-of-fit tests on several variables, but for every variable the three tests (KS, AD, and C-vM) show the same p-values: 0.010, 0.005, and 0.005 respectively. This would mean that none of the variables is normally distributed. However, judging from the Q-Q plots and histograms, some of the variables do look normally distributed.
These look like default p-values, since they are identical for every variable.
This is what I did:
proc univariate data=<library.dataset>;
var <variables>;
histogram /normal;
qqplot;
run;
Can anyone help me?
Thank you!
Accepted Solutions
@Rick_SAS has written many blog posts about these goodness-of-fit tests, especially for cases where your data is not big enough or contains integer values.
These tests (KS, AD, and C-vM) cannot always be trusted.
I would go by the Q-Q plot.
Show us what you are seeing.
Paige Miller
I see these p-values for every single variable, but with different D-, W-Sq- and A-Sq-values.
@lheer wrote:
I see these p-values for every single variable, but with different D-, W-Sq- and A-Sq-values.
Send the output to a data set and you can examine the actual p-values, which are almost certainly minuscule. The tables the procedure displays use a threshold value instead of attempting to fit a value like 0.00000000004583 into a 6-column display.
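One way to send the goodness-of-fit results to a data set is an ODS OUTPUT statement. The following is a minimal sketch, not a definitive recipe: sashelp.heart and the variables Cholesterol and Systolic are stand-ins for your own data, and work.gof is an arbitrary name chosen for illustration. GoodnessOfFit is the ODS table that the HISTOGRAM statement produces when a distribution such as NORMAL is fitted. Whether the stored p-value is any more precise than the displayed one is something to check in the printed result.

```sas
/* Sketch: capture the goodness-of-fit table in a data set.          */
/* sashelp.heart, Cholesterol, Systolic, and work.gof are stand-ins. */
ods output GoodnessOfFit=work.gof;

proc univariate data=sashelp.heart;
   var Cholesterol Systolic;
   histogram / normal;   /* fits a normal curve and runs KS, C-vM, AD */
run;

/* Inspect the stored statistics and p-values for each variable */
proc print data=work.gof noobs;
run;
```

Running PROC CONTENTS on work.gof shows which columns hold the test statistic and the p-value information.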
I'm going to guess that you are testing large data? As KSharb suggests, you might want to read the article "Goodness-of-fit tests: A cautionary tale for large and small samples".
Anyway, a more important question is WHY you want to test many variables for normality. What are you trying to accomplish? Why does a lack of normality bother you?
@Rick_SAS You are right. The OP must have a big table.
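To see how sample size can drive these tests, here is a small simulation sketch. Everything in it is an assumption made for illustration: the seed, the sample size of 100,000, and the choice of a t distribution with 20 degrees of freedom as data that is close to normal but not exactly normal. With a sample this large, the tests would be expected to report the smallest p-values they can display even though the histogram and Q-Q plot look nearly normal.

```sas
/* Sketch: a large sample from a nearly normal distribution.     */
/* Seed, sample size, and distribution are illustrative choices. */
data work.sim;
   call streaminit(12345);
   do i = 1 to 100000;
      x = rand('T', 20);   /* t with 20 df: slightly heavier tails */
      output;
   end;
run;

proc univariate data=work.sim;
   var x;
   histogram x / normal;                  /* goodness-of-fit tests */
   qqplot x / normal(mu=est sigma=est);   /* visual check          */
run;
```

Reducing the loop limit to a few hundred observations and rerunning shows how the same distribution fares with a small sample.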
@lheer wrote:
... but for every variable the three tests (KS, AD and C-vM) show similar p-values 0.010, 0.005, 0.005 respectively
This is NOT what the SAS output is showing. It does not show values of 0.010, 0.005, and 0.005.
It shows <0.010, <0.005, and <0.005. These are not default values; they are calculated p-values, reported relative to a meaningful threshold.
Paige Miller