I’m performing a multivariate regression and my residuals are not normally distributed. I think I need to log-transform y, but after doing this my residuals still aren’t normally distributed. Any tips on where I’ve gone wrong?

That's a huge topic, but three common reasons are

Just FYI, you only need normality if you intend to use inferential statistics (hypothesis tests, confidence intervals, p-values). The predicted values are valid regardless.
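To illustrate that point, here is a small simulation sketch in plain Python (not SAS, and not the poster's model). The true intercept 2 and slope 3, the uniform x values, and the shifted-exponential errors are all assumptions made up for the example. The errors are strongly right-skewed but have mean zero, and the least-squares fit should still land close to the true coefficients.

```python
import random


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def simulate_skewed_fit(n=5000, seed=1):
    """Fit a line to data whose errors are skewed (hypothetical setup)."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    # Errors: exponential(1) shifted to mean zero, so clearly non-normal.
    y = [2.0 + 3.0 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]
    return fit_simple_ols(x, y)


intercept, slope = simulate_skewed_fit()
print(f"intercept={intercept:.3f}, slope={slope:.3f}")
```

This only shows that the point estimates hold up when the other assumptions (a linear mean and zero-mean errors) are satisfied. It says nothing about whether the standard errors or p-values can be trusted.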

A fourth explanation for non-normal residuals is that the assumption of the errors being normally distributed is just plain wrong in this data.

It never hurts to show the regression procedure code that you used.


That may give folks like @PaigeMiller or @Rick_SAS some additional clues to look at. And maybe include some of the model diagnostics/summaries, such as the number of observations.

@ballardw makes a great point. If you are doing some sort of testing for normality, be aware that for large datasets even a minor deviation from normality will be found to be significant, and for smaller datasets, single points may lead to significance. Remember that linear models are remarkably robust to the assumption of the normality of residuals. Consequently, if you must do testing, set your alpha at a smaller than usual level, say 0.001. Better to follow @Rick_SAS's lead and examine plots of the residuals.
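A rough sketch of the sample-size effect, in plain Python rather than SAS. It uses the Jarque-Bera statistic as a stand-in because it is easy to compute by hand; it is not one of the tests a SAS procedure would necessarily report, and its chi-square approximation is poor for small samples. The mildly skewed "residuals" (a normal plus half of a centered exponential) and the sample sizes are assumptions invented for the illustration.

```python
import math
import random


def jarque_bera(values):
    """Return (statistic, asymptotic p-value) of the Jarque-Bera test."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    stat = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 df has survival function exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


def mildly_skewed_sample(n, rng):
    """Hypothetical residuals: mostly normal with a slight right skew."""
    return [rng.gauss(0.0, 1.0) + 0.5 * (rng.expovariate(1.0) - 1.0)
            for _ in range(n)]


rng = random.Random(2024)
for n in (50, 100_000):
    stat, p = jarque_bera(mildly_skewed_sample(n, rng))
    print(f"n={n}: JB={stat:.2f}, p={p:.3g}")
```

The same mild departure from normality is sampled both times; only n changes. With the large sample the test rejects overwhelmingly, even though a deviation this small would rarely matter for the regression itself, which is why the residual plots are the better guide.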



