That's a huge topic, but three common reasons are:
- A misspecified model. The residuals show a systematic trend, such as a quadratic effect that might need to be included in the model.
- Heteroscedasticity. This often appears as a "fan-shaped" plot in which the residuals tend to be small on one side of the plot and large on the other.
- Correlated errors show up as a sequence of consecutive high or low values, rather than a "random scatter" of points.
Just FYI, you only need normality if you intend to use inferential statistics. The predicted values are valid regardless.
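If it helps, here is a minimal sketch of how you might request the residual diagnostics described above. The dataset and variable names (mydata, y, x1, x2) are placeholders; substitute your own.

```sas
/* Minimal sketch -- mydata, y, x1, x2 are placeholder names */
ods graphics on;

proc reg data=mydata
         plots(only)=(residualbypredicted  /* fan shape suggests heteroscedasticity      */
                      residuals            /* curvature suggests a missing term          */
                      qqplot);             /* departure from the line suggests non-normal errors */
   model y = x1 x2;
run;
quit;
```

If the data have a natural time or collection order, also plotting the residuals in that order can reveal the runs of consecutive high or low values that signal correlated errors.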
A fourth explanation for non-normal residuals is that the assumption that the errors are normally distributed is simply wrong for these data.
Paige Miller
It never hurts to show the regression procedure code that you used.
That may give folks like @PaigeMiller or @Rick_SAS some additional clues to look at. It also helps to include some of the model diagnostics/summaries, such as the number of observations.
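For example, posting something along these lines (a sketch with placeholder names; use the code you actually ran) gives responders the model statement, the number of observations read and used, and the fit summary from the default output in one go:

```sas
/* Sketch -- mydata, y, x1, x2 are placeholders for your actual names */
proc reg data=mydata;
   model y = x1 x2;                      /* the MODEL statement you actually ran   */
   output out=resids r=resid p=pred;     /* keep residuals for later checks        */
run;
quit;
```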
@ballardw makes a great point. If you are doing some sort of testing for normality, be aware that for large datasets even a minor deviation from normality will be found significant, while for smaller datasets a single point may drive significance. Remember that linear models are remarkably robust to departures from normality in the residuals. Consequently, if you must do testing, set your alpha at a smaller-than-usual level, say 0.001. Better still, follow @Rick_SAS's lead and examine plots of the residuals.
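If you do decide to run a formal test anyway, one sketch (assuming the residuals were saved to a dataset, for example via OUTPUT OUT=resids R=resid in PROC REG) is:

```sas
/* Sketch -- assumes residuals are in work.resids, variable name resid */
proc univariate data=resids normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);  /* visual check alongside the tests */
run;
```

Compare the Shapiro-Wilk (or, for larger samples, the other goodness-of-fit) p-values against the stricter alpha rather than 0.05, and give more weight to the Q-Q plot than to the p-value.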
SteveDenham