🔒 This topic is solved and locked.
LzEr23
Obsidian | Level 7

Dear all

I am having trouble obtaining a normality test result using the Shapiro-Wilk (SW) test.

Usually I use PROC UNIVARIATE with the NORMAL (or NORMALTEST) option, and it easily gives me results for all four normality tests (Shapiro-Wilk, Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling).

For example:

proc univariate data=work.have normal;
   var series;
run;

Now I am working with a larger dataset that has more than 2000 observations, and SAS no longer shows the SW test result. The other tests are reported as usual; only SW is omitted.

I read that PROC UNIVARIATE automatically omits the SW test when the sample size is greater than 2000. However, I would still like to see the SW test result.

So my question is: is there any other way to request the SW test of normality for a dataset that contains more than 2000 observations?

My dataset looks like this:

data work.have;
   input date :yymmdd8. series;   /* YYMMDD8. informat reads values such as 20170101 */
   format date yymmddn8.;         /* YYMMDDN8. displays them the same way */
   datalines;
20170101 501
20170102 500.5
20170103 505
...
20171231 512
;
run;

I would really appreciate any help getting this test result. I hope my description is clear enough; if not, please leave a note and ask.

Thanks in advance.

Accepted Solution
PaigeMiller
Diamond | Level 26

The documentation is pretty clear:

Shapiro-Wilk Statistic

If the sample size is less than or equal to 2000 and you specify the NORMAL option, PROC UNIVARIATE computes the Shapiro-Wilk statistic, W (also denoted as W_n to emphasize its dependence on the sample size n).

So you can't get PROC UNIVARIATE to calculate this statistic for sample sizes above 2000. Use one of the other tests of normality for sample sizes above 2000.

--
Paige Miller
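Following that advice, here is a minimal sketch of how to collect the remaining tests and add a visual check. It uses SASHELP.HEART (5209 observations, shipped with SAS) and its Weight variable as hypothetical stand-ins for work.have and series; the output dataset name work.heart_normtests is also an assumption. ODS OUTPUT saves the TestsForNormality table, which for n > 2000 contains only the Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling tests, and the QQPLOT statement draws a normal Q-Q plot whose reference line uses the estimated mean and standard deviation.

```sas
/* Hypothetical stand-in data: SASHELP.HEART has 5209 rows, so n > 2000 */
ods output TestsForNormality=work.heart_normtests;   /* save the normality-test table */
proc univariate data=sashelp.heart normal;
   var Weight;
   /* Normal Q-Q plot; MU=EST and SIGMA=EST estimate the reference line from the data */
   qqplot Weight / normal(mu=est sigma=est);
run;

/* Only the EDF tests remain: Kolmogorov-Smirnov, Cramer-von Mises, Anderson-Darling */
proc print data=work.heart_normtests noobs;
   var Test Stat pValue;
run;
```

With several thousand observations these tests can flag even small departures from normality, so the Q-Q plot is worth reading alongside the p-values.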


3 Replies
ballardw
Super User

If you read https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test, you will find that some folks have extended the test to 5000 observations, but apparently that extension has not been added to SAS yet. I suspect a big issue is determining the critical points of the test statistic. This phrase from that link

The cutoff values for the statistics are calculated through Monte-Carlo simulations.

indicates to me that expanding the allowed sample size is NOT a trivial exercise.

You do not say how big your sample is. If it is greater than 5000, it looks like you would get your name attached if you work this out.

Short answer: Ain't gonna happen in SAS PROC UNIVARIATE.
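ballardw's remark that the cutoff values come from Monte-Carlo simulation can be illustrated with a minimal sketch. Because PROC UNIVARIATE will not compute W above 2000, this sketch uses the Anderson-Darling statistic as a stand-in; the sample size (2500), number of replicates (500), seed and dataset names are all hypothetical choices. It draws many samples under the null hypothesis of normality, computes the statistic for each sample, and takes the 95th percentile as an approximate upper 5% cutoff. PROC UNIVARIATE already reports its own p-value for Anderson-Darling, so this only shows the general technique, not how SAS or the published Shapiro-Wilk extensions build their tables.

```sas
/* Hypothetical settings: sample size and number of Monte Carlo replicates */
%let n    = 2500;
%let nrep = 500;

/* Draw &nrep samples of size &n from a standard normal (data generated under H0) */
data work.mc;
   call streaminit(2024);            /* fixed seed so the run is repeatable */
   do rep = 1 to &nrep;
      do i = 1 to &n;
         x = rand('normal');
         output;
      end;
   end;
   drop i;
run;

/* Compute the normality statistics per replicate; suppress printed output */
ods exclude all;
ods output TestsForNormality=work.mcstats;
proc univariate data=work.mc normal;
   by rep;                           /* data are already in REP order */
   var x;
run;
ods exclude none;

/* Large A-squared values reject normality, so the 95th percentile of the
   simulated statistic approximates the upper 5% critical value */
proc univariate data=work.mcstats(where=(Test='Anderson-Darling')) noprint;
   var Stat;
   output out=work.cutoff pctlpts=95 pctlpre=P;   /* creates variable P95 */
run;

proc print data=work.cutoff;
run;
```

More replicates reduce the Monte Carlo error in the estimated cutoff, at the cost of run time.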

 

 

Ksharp
Super User

Maybe @Rick_SAS might write a blog post (with IML code) about this question?

