Statistical Procedures

Programming the statistical procedures from SAS
laurenhosking
Quartz | Level 8
I’m performing a multivariate regression and my residuals are not normally distributed. I thought I needed to log-transform y, but after doing so my residuals still aren’t normally distributed. Any tips on where I’ve gone wrong?
4 REPLIES
Rick_SAS
SAS Super FREQ

That's a huge topic, but three common reasons are

Just FYI, you only need normality if you intend to use inferential statistics. The predicted values are valid regardless.
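
To illustrate the point above, here is a minimal sketch in Python rather than SAS, using made-up numbers that are not from the original poster's data. The `fit_line` helper is hypothetical and simply applies the closed-form least-squares formulas for one predictor. Nothing in that arithmetic refers to the error distribution; normality only matters later, when you compute tests or confidence intervals from the fit. The errors are deliberately skewed and were chosen to sum to zero and be uncorrelated with x, so the fit recovers the true line exactly even though the residuals are clearly non-normal.

```python
def fit_line(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data: true line y = 1 + 2x plus strongly skewed errors.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
errors = [-1.0, -1.0, 4.0, -1.0, -1.0]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, errors)]

intercept, slope = fit_line(xs, ys)
predicted = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predicted)]

print("intercept:", intercept, "slope:", slope)
print("residuals:", residuals)
```

In general the estimates will not match the true line exactly as they do in this contrived case. The sketch only shows that the predicted values are computed without any normality assumption.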

PaigeMiller
Diamond | Level 26

A fourth explanation for non-normal residuals is that the assumption of the errors being normally distributed is just plain wrong in this data.

--
Paige Miller
ballardw
Super User

It never hurts to show the regression procedure code that you used.

That may give folks like @PaigeMiller or @Rick_SAS some additional clues to look at. It would also help to include some of the model diagnostics and summaries, such as the number of observations.

SteveDenham
Jade | Level 19

@ballardw makes a great point. If you are testing for normality, be aware that in large data sets even a minor deviation from normality will be found significant, while in smaller data sets a single point may lead to significance. Remember that linear models are remarkably robust to violations of the assumption that the residuals are normal. Consequently, if you must test, set your alpha at a smaller level than usual, say 0.001. Better to follow @Rick_SAS's lead and examine plots of the residuals.
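
As a rough sketch of what "examine plots of the residuals" involves, the Python snippet below (not SAS, and not anyone's actual code from this thread) computes the coordinates of a normal Q-Q plot by hand. The function names `qq_points` and `qq_correlation` are hypothetical, the plotting positions (i - 0.5) / n are one common convention among several, and the two example data sets are synthetic. The correlation between theoretical and observed quantiles is used here only as a quick numeric stand-in for how straight the plot looks, not as a formal test.

```python
import math
from statistics import NormalDist


def qq_points(residuals):
    """Return (theoretical quantile, ordered residual) pairs for a normal Q-Q plot."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    # Plotting positions (i - 0.5) / n; other conventions exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(residuals):
    """Pearson correlation of the Q-Q points; near 1 means a nearly straight plot."""
    pairs = qq_points(residuals)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic examples: idealized normal residuals and a strongly right-skewed set.
n = 50
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [math.exp(2.0 * z) for z in normal_like]

r_normal = qq_correlation(normal_like)
r_skewed = qq_correlation(skewed)
print("Q-Q correlation, normal-like residuals:", round(r_normal, 3))
print("Q-Q correlation, skewed residuals:", round(r_skewed, 3))
```

Plotting the pairs from `qq_points` and looking for systematic curvature is usually more informative than any single number, because the shape shows whether the problem is skewness, heavy tails, or a few isolated points.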

 

SteveDenham

