laurenhosking
Quartz | Level 8

I've run a one-way ANOVA. How would I check whether it satisfies all the theoretical assumptions listed below?

The key assumptions are:
• The samples must be independent.
• The populations or groups from which the samples were obtained must be normally distributed and all populations must have the same variance.
• In practice this last assumption can also be checked by testing normality of the residuals and homogeneity of the variances in the different groups.

unison
Lapis Lazuli | Level 10

No picture included.

-unison
laurenhosking
Quartz | Level 8

Thank you, I edited it.

Ksharp
Super User

x ~ i.i.d. N(mu, sigma)

Use

proc glm plots=all

to get all the pictures.

 

 

Calling  @Rick_SAS 

PaigeMiller
Diamond | Level 26

@laurenhosking wrote:

I've run a one-way ANOVA. How would I check whether it satisfies all the theoretical assumptions listed below?

The key assumptions are:
• The samples must be independent.


Generally, you have to know (or assume) in advance that the samples are independent. I don't think it's something that people normally test by analyzing the data. If you are conducting a study of randomly selected individuals, that's usually enough to assume they are independent. If the individuals are not randomly selected, for example if they come from a single family or are otherwise blood relatives, you might conclude in advance that there could be some dependence between them.

 

• The populations or groups from which the samples were obtained must be normally distributed and all populations must have the same variance.

 

As you have stated the requirement, this is not true. The errors from the fitted model must be normally distributed, not the raw data itself. This can be checked by examining the residuals. To test whether the groups all have the same variance, use the HOVTEST option of the MEANS statement in PROC GLM.
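A minimal sketch of the HOVTEST approach described above. The data set name work.mydata, the response variable y, and the grouping variable group are hypothetical placeholders, not names from the original question:

```sas
/* Assumed names: work.mydata (data set), y (response), group (factor) */
proc glm data=work.mydata;
   class group;                    /* one-way classification variable */
   model y = group;                /* one-way ANOVA model */
   means group / hovtest=levene;   /* Levene's test for homogeneity of variance */
run;
quit;
```

HOVTEST is only available for one-way models. A small p-value in the Levene table suggests the group variances differ.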

 

 

--
Paige Miller
laurenhosking
Quartz | Level 8

Thank you. Do you think I can do a goodness-of-fit test, as I did previously, to check the second assumption?

PaigeMiller
Diamond | Level 26

The second assumption is called a "compound statement" because there are two parts, and it's not clear which of the two parts you are talking about.

 

The populations or groups from which the samples were obtained must be normally distributed and all populations must have the same variance.

 

Are you talking about the normal distribution part, or are you talking about the same variance part?

 

--
Paige Miller
laurenhosking
Quartz | Level 8
I believe the normally distributed part
PaigeMiller
Diamond | Level 26

The diagnostic plots you get from PROC GLM (if that's what you are using) include a histogram of the residuals and a Q-Q plot, both of which can be used to assess normality of the residuals.
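If the residuals are needed outside the PROC GLM diagnostics panel, one possible sketch is to save them with an OUTPUT statement and pass them to PROC UNIVARIATE. The names work.mydata, y, group, work.resid, and residual are assumptions made for illustration:

```sas
/* Assumed names: work.mydata, y, group; work.resid and residual are created here */
proc glm data=work.mydata noprint;
   class group;
   model y = group;
   output out=work.resid r=residual;   /* save residuals from the fitted model */
run;
quit;

proc univariate data=work.resid normal;          /* NORMAL requests tests for normality */
   var residual;
   histogram residual / normal;                  /* histogram with fitted normal curve */
   qqplot residual / normal(mu=est sigma=est);   /* Q-Q plot with reference line */
run;
```

Note that this checks the residuals from the model, pooled across groups, rather than the raw data in each group.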

--
Paige Miller
Rick_SAS
SAS Super FREQ

I think PaigeMiller has answered your questions. I will merely add that a one-way ANOVA is equivalent to a linear regression model with a single categorical regressor. As such, you might want to review the article "On the assumptions (and misconceptions) of linear regression." That article was written for a continuous regressor, but most of the ideas are the same regardless of whether the regressor is discrete or continuous. In particular, the regression diagnostic plots (discussed in the last section of the article) can provide graphical evidence that can help you decide whether the assumptions are reasonable for your data.
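As a rough illustration of that equivalence, the SOLUTION option on the MODEL statement prints regression-style parameter estimates for the same one-way model, and PLOTS=DIAGNOSTICS requests the diagnostics panel. The names work.mydata, y, and group are hypothetical:

```sas
/* Assumed names: work.mydata, y, group */
ods graphics on;
proc glm data=work.mydata plots=diagnostics;
   class group;
   model y = group / solution;   /* print parameter estimates for the group effect */
run;
quit;
```

With the GLM parameterization, the estimate for the last level of group is set to zero, and the other estimates are differences from that reference level.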

PaigeMiller
Diamond | Level 26

@Rick_SAS wrote:

I think PaigeMiller has answered your questions. I will merely add that a one-way ANOVA is equivalent to a linear regression model with a single categorical regressor. As such, you might want to review the article "On the assumptions (and misconceptions) of linear regression." That article was written for a continuous regressor, but most of the ideas are the same regardless of whether the regressor is discrete or continuous. In particular, the regression diagnostic plots (discussed in the last section of the article) can provide graphical evidence that can help you decide whether the assumptions are reasonable for your data.


I'm going to add a little more commentary.

 

If you just want to do an ANOVA, the only part of the ANOVA that depends on normality of the errors is the F-tests performed to see whether the model terms are significantly different from zero; the rest of the computations do not depend on normality. And even if the errors have some non-normal distribution, the central limit theorem sometimes comes into play if you have enough data: the means estimated by the ANOVA are approximately normally distributed anyway, so the F-tests are approximately correct.
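One way to get a feel for this robustness is a small simulation. The sketch below is an illustration under assumed settings (3 groups, 50 observations per group, exponential errors, 1000 replicates, all chosen arbitrarily), not part of the original analysis. It generates skewed data with no true group differences and estimates how often the F-test rejects at the 0.05 level:

```sas
/* Simulate skewed data under the null hypothesis (all group means equal) */
data work.sim;
   call streaminit(12345);
   do rep = 1 to 1000;               /* number of simulated studies */
      do group = 1 to 3;
         do i = 1 to 50;             /* observations per group */
            y = rand('EXPONENTIAL'); /* non-normal errors */
            output;
         end;
      end;
   end;
run;

/* Fit the one-way ANOVA to each replicate and capture the F-test table */
ods graphics off;
ods exclude all;                      /* suppress printed output */
ods output ModelANOVA=work.anova;
proc glm data=work.sim;
   by rep;
   class group;
   model y = group / ss3;            /* keep only the Type III test */
run;
quit;
ods exclude none;

/* Proportion of replicates in which the F-test rejects at alpha = 0.05 */
data work.reject;
   set work.anova;
   reject = (ProbF < 0.05);
run;

proc means data=work.reject mean;
   var reject;
run;
```

If the F-test is approximately correct for these settings, the reported mean should be close to the nominal 0.05. Smaller group sizes or more extreme distributions can be tried by editing the DATA step.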

 

As far as independence and correlated errors go (as mentioned by @Rick_SAS), the test he links to is for one type of correlation, specifically autocorrelation, in other words correlation over time. There are types of correlation between the subjects in a study that are not correlation over time, which have to be assumed and which I don't think you can (easily) test for. The one I mentioned is a biological study where subjects are related to one another rather than randomly selected. Going back to the original statement in this thread, "The samples must be independent": there is no general test for lack of independence, although there is a test for autocorrelation.


All of this may be too much for the purposes of answering the original question.

--
Paige Miller
laurenhosking
Quartz | Level 8

So I did a goodness-of-fit test and, as mentioned, some of my samples aren't normally distributed and some are. Would I then use the F-test to see whether they are normally distributed in general? Or is there another way?

@Rick_SAS 

PaigeMiller
Diamond | Level 26

@laurenhosking wrote:

So I did a goodness-of-fit test and, as mentioned, some of my samples aren't normally distributed and some are. Would I then use the F-test to see whether they are normally distributed in general? Or is there another way?

@Rick_SAS 


The raw data does not have to be normally distributed. The errors from the fitted model have to be normally distributed. You test this by examining the histograms of the residuals and the Q-Q plot of the residuals. F-tests do not test to see if the data is normally distributed.

--
Paige Miller
laurenhosking
Quartz | Level 8

Thank you so much, I just figured out what I did wrong!
