lheer
Calcite | Level 5

Hi everyone,

 

I want to run some goodness-of-fit tests for several variables, but for every variable the three tests (KS, AD and C-vM) show similar p-values 0.010, 0.005, 0.005 respectively. This would mean that none of the variables is normally distributed. However, judging by the Q-Q plots and histograms, some of the variables do look normally distributed.

It looks like these p-values are default values, since they are the same for every variable.

This is what I did:

 

proc univariate data=<library.dataset>;
var <variables>;
histogram /normal;
qqplot;
run;

 

Can anyone help me?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

@Rick_SAS has written many blog posts about these GOF tests, especially about cases where the data set is not big enough or contains integer values.

These tests (KS, AD and C-vM) cannot always be trusted.

I would rely on the Q-Q plot.

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Show us what you are seeing.

--
Paige Miller
lheer
Calcite | Level 5

I see these p-values for every single variable, but with different D, W-Sq and A-Sq values. (Attached screenshot: Capture.PNG)

ballardw
Super User

@lheer wrote:

I see these p-values for every single variable, but with different D, W-Sq and A-Sq values. (Attached screenshot: Capture.PNG)


Send the output to a data set and you can examine the actual p-values, which are almost certainly minuscule. The tables the procedure prints use a threshold value instead of attempting to fit a value like 0.00000000004583 into a 6-column display.
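
A minimal sketch of that suggestion, under some assumptions: SASHELP.CLASS and its Height and Weight columns stand in for the real data, WORK.GOF is a made-up data set name, and GoodnessOfFit is, as far as I recall, the ODS table that HISTOGRAM / NORMAL produces. Running the step with ODS TRACE ON will confirm the table name in the log.

```sas
/* Assumption: sashelp.class (Height, Weight) stands in for the real data. */
/* Assumption: GoodnessOfFit is the ODS table written by HISTOGRAM / NORMAL; */
/* check the log after ODS TRACE ON if the data set is not created.          */
ods output GoodnessOfFit=work.gof;   /* work.gof is a hypothetical name */

proc univariate data=sashelp.class;
   var Height Weight;
   histogram / normal;               /* fits a normal curve and runs KS, C-vM and AD */
run;

/* Print every column so the stored statistics and p-values can be inspected */
proc print data=work.gof noobs;
run;
```

It is worth checking what the data set actually holds. The stored p-value may itself be a bound, kept next to a separate sign column, rather than a more precise number than the one printed in the table.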

Rick_SAS
SAS Super FREQ

I'm going to guess that you are testing large data? As Ksharp suggests, you might want to read the article "Goodness-of-fit tests: A cautionary tale for large and small samples".

 

Anyway, a more important question is WHY you want to test many variables for normality. What are you trying to accomplish? Why does a lack of normality bother you?
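
To see the large-sample and integer-value effects mentioned above, a small simulation can help. This is only an illustrative sketch and is not taken from the article: the data set name WORK.SIM, the variable names, the seed, the sample size and the rounding unit are all made up.

```sas
/* Hypothetical simulation: every name and parameter here is illustrative. */
data work.sim;
   call streaminit(12345);                 /* fixed seed for reproducibility */
   do i = 1 to 50000;                      /* deliberately large sample */
      x_exact   = rand("Normal", 50, 10);  /* draws from a normal distribution */
      x_rounded = round(x_exact, 5);       /* same draws, rounded to multiples of 5 */
      output;
   end;
   drop i;
run;

/* NORMAL requests the tests for normality; compare them with the Q-Q plots */
proc univariate data=work.sim normal;
   var x_exact x_rounded;
   qqplot / normal(mu=est sigma=est);
run;
```

Comparing the test results for the two variables with their Q-Q plots shows how much weight the tests give to rounding in a sample of this size, relative to what the plot suggests.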

Ksharp
Super User

@Rick_SAS You are right. The OP must have a big table.

PaigeMiller
Diamond | Level 26

@lheer wrote:

... but for every variable the three tests (KS, AD and C-vM) show similar p-values 0.010, 0.005, 0.005 respectively


This is NOT what the SAS output is showing. It does not show a value of 0.010 or 0.005 or 0.005 respectively.

 

It shows values of <0.010, <0.005 and <0.005. These are not default values; they are calculated results, reported against a meaningful threshold.

--
Paige Miller