cmcg1
Calcite | Level 5

Hello,

Is there any way to test for normality of just the random effects in PROC MIXED other than the graphs, such as a Shapiro-Wilk test? That is, is there any way to get a numerical measure of normality?

ACCEPTED SOLUTION

SteveDenham
Jade | Level 19

My first question is "Why do this?" All of the estimation methods in linear mixed models are based on the assumption that the random effects have a mean of zero, a positive variance, and perhaps some sort of covariance with the other random effects. So why test? And especially, why test with any of the readily available tests for normality, which are overpowered for large sample sizes and underpowered for small ones? Over 50 years ago, George Box said something like, "To test for normality before analysing the data is akin to going out in a row boat to see if the ocean is safe for an ocean liner." In particular, suppose you found that the variance component in question failed a normality test. Without some graphical aid to identify what the departure was attributable to, you would be left without a method for analysis. If you can identify what is causing the deviation, you could maybe add an additional factor or grouping to avoid the issue.

So if you truly wanted to do something in this area, you would need to get the BLUPs for every record. You can do this with an OUTPUT statement in GLIMMIX without too much difficulty: get the default linear predictor and subtract the marginal linear predictor. It is a bit different in MIXED: specify the OUTP= and OUTPM= options in the MODEL statement, and calculate the difference between the marginal raw residual and the conditional raw residual.
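
As a rough sketch of the two routes just described (not a definitive recipe), the code below assumes a hypothetical data set MYDATA with response Y, fixed effect TRT, and a single random effect BLOCK; none of these names come from the original question. It also assumes the OUTP= and OUTPM= data sets keep the input rows in the same order, so they can be merged one to one. The last step draws a normal Q-Q plot of one predicted value per BLOCK level, in keeping with the preference for graphics over a formal test.

```sas
/* Hypothetical names: data set MYDATA, response Y, fixed effect TRT,
   random effect BLOCK. Replace them with your own. */

/* Route 1: PROC MIXED. OUTP= holds conditional predictions and residuals,
   OUTPM= holds the marginal ones. */
proc mixed data=mydata;
  class trt block;
  model y = trt / outp=cond outpm=marg;
  random block;
run;

/* Marginal residual minus conditional residual leaves the predicted
   random-effect contribution for each record.
   Assumption: both data sets have the same row order. */
data zu;
  merge cond(keep=block y Resid rename=(Resid=resid_cond))
        marg(keep=Resid rename=(Resid=resid_marg));
  zu = resid_marg - resid_cond;
run;

/* Route 2: PROC GLIMMIX. Linear predictor with BLUPs minus the
   marginal linear predictor. */
proc glimmix data=mydata;
  class trt block;
  model y = trt;
  random block;
  output out=gmx pred(blup)=eta_cond pred(noblup)=eta_marg;
run;

data gmx;
  set gmx;
  zu = eta_cond - eta_marg;
run;

/* With one random effect, ZU is constant within BLOCK, so keep one row
   per level before plotting. */
proc sort data=zu out=zu_levels nodupkey;
  by block;
run;

proc univariate data=zu_levels;
  var zu;
  qqplot zu / normal(mu=est sigma=est);
run;
```

With more than one random effect, ZU is the sum of their contributions for each record, so the one-row-per-level step above no longer applies as written.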

Now you have an aggregated variability estimate for each record: the combined contribution of all of the random effects (the marginal residual itself also carries the residual error). You can partition it based on the relative size of the variance components (sort of like an intraclass correlation, but not quite), but that relies on assuming that the variances are additive, which in turn depends on the assumption of independent and identically scaled distributions.

So it is difficult, fraught with pitfalls, and liable to be misleading to "test" for normality of variance components. And I haven't even touched on the issue of "what is the expected distribution of a variance estimator?" (Answer: by Cochran's theorem, it is a scaled chi-squared distribution, not a normal distribution.)

SteveDenham

2 REPLIES
sbxkoenk
SAS Super FREQ

@SteveDenham might be able to help you out!

Thanks,

Koen