GemmaR
SAS Employee

Hi Guys,

I just wanted to ask for a little advice on the goodness-of-fit tests for normality that PROC UNIVARIATE generates.

This may be a really silly question, but why do the p-values for the goodness-of-fit tests change between letting the procedure estimate mu and sigma and typing the values in myself?

Am I misunderstanding the estimate?

Any advice would be greatly appreciated.

For ease, I have attached a small piece of code as an example.

many thanks.

3 REPLIES 3
Rick_SAS
SAS Super FREQ

It's not silly. It's somewhat subtle. The distribution of the GOF statistic depends on which parameters you are estimating.

Let's say you are estimating two parameters: location and scale. In order to estimate the scale parameter, you first have to estimate the location parameter, and then USE that estimate to estimate the scale parameter. Using an estimate is different from using the true value; there is more uncertainty.
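A small simulation can make this concrete. The Python sketch below is not SAS and is not what PROC UNIVARIATE runs internally. It assumes a standard normal population, a sample size of 50, and the Kolmogorov-Smirnov D statistic, and the helper names (ks_statistic, simulate, quantile) are made up for illustration. For each simulated sample it computes D twice: once against the true parameters and once against the mean and standard deviation estimated from that same sample.

```python
import random
from statistics import NormalDist, fmean, stdev


def ks_statistic(sample, mu, sigma):
    """Largest gap between the empirical CDF and the Normal(mu, sigma) CDF."""
    dist = NormalDist(mu, sigma)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = dist.cdf(x)
        # Check the gap just after and just before the jump at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def simulate(n=50, reps=2000, seed=1):
    """Draw normal samples; return D with known and with estimated parameters."""
    rng = random.Random(seed)
    d_known, d_est = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Parameters typed in: the true population values (assumed 0 and 1).
        d_known.append(ks_statistic(sample, 0.0, 1.0))
        # Parameters estimated from the very sample being tested.
        d_est.append(ks_statistic(sample, fmean(sample), stdev(sample)))
    return d_known, d_est


def quantile(values, q):
    """Simple empirical quantile (no interpolation)."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]


d_known, d_est = simulate()
crit_known = quantile(d_known, 0.95)
crit_est = quantile(d_est, 0.95)
print("95th percentile of D, parameters specified:", round(crit_known, 3))
print("95th percentile of D, parameters estimated:", round(crit_est, 3))
```

Under these assumptions the estimated-parameter D values tend to be smaller, because the fitted curve has been tuned to the sample, so the 5% cutoff is lower. The same observed D is therefore compared against a different reference distribution, and gets a different p-value, depending on whether the parameters were specified or estimated.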

This comes up a lot in statistics. A familiar example is the formula for the sample variance, in which you divide the sum of (x - xbar)^2 by (n-1) [rather than n] because you are using an estimate of the location parameter.
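The variance example is easy to check numerically. This is a rough Python sketch, assuming a standard normal population (true variance 1) and a deliberately small sample size of 5; the function name mean_variance_estimates is hypothetical. It averages both versions of the estimator over many simulated samples.

```python
import random
from statistics import fmean


def mean_variance_estimates(n=5, reps=20000, seed=1):
    """Average the divide-by-n and divide-by-(n-1) variance estimates."""
    rng = random.Random(seed)
    div_n, div_n1 = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = fmean(x)  # estimated location, not the true mean of 0
        ss = sum((v - xbar) ** 2 for v in x)
        div_n.append(ss / n)
        div_n1.append(ss / (n - 1))
    return fmean(div_n), fmean(div_n1)


avg_n, avg_n1 = mean_variance_estimates()
print("average when dividing by n:    ", round(avg_n, 3))
print("average when dividing by n - 1:", round(avg_n1, 3))
```

Dividing by n should come out noticeably below the true variance on average (in theory by a factor of (n-1)/n), while dividing by n-1 should land close to 1. The shortfall is the price of measuring deviations from xbar instead of from the true mean.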

I think the book by D'Agostino and Stephens, Goodness-Of-Fit Techniques, covers this.

GemmaR
SAS Employee

Hi,

Thanks, Rick. So is it always better to input the true values if you are going to use PROC UNIVARIATE and the goodness-of-fit tests?

In your opinion, are these statistics useful? I generally look at the normal probability plot and, if in doubt, the skewness and kurtosis to cement my decision, so I wouldn't particularly use the other tests.

Many thanks for your help and advice, it's really appreciated!

Rick_SAS
SAS Super FREQ

Well, yes, but usually you don't know the true values! The true values are the population parameters; typically all we know are the estimates from a sample.

Are these statistics useful? I think so. A probability plot or a Q-Q plot is a useful first step in deciding whether your data might be successfully modeled by a certain distribution (see http://blogs.sas.com/content/iml/2011/10/28/modeling-the-distribution-of-data-create-a-qq-plot/). However, I view the Q-Q plot as complementary to statistical tests, not a replacement for them.

The "usefulness" also depends on what you are trying to do. Ask yourself why you are checking normality. Is it because you want to run some OTHER test (such as a t test) that assumes normality? If so, how robust is the other test to deviations from normality? If it is robust, then from a practical viewpoint it probably doesn't matter whether the p-value in the test of normality is 0.06 or 0.04. Also, modern statistics has a lot of nonparametric methods that do not require assumptions about the distribution of the data.
