Rhodochrosite | Level 12

## Non normal distribution in regression

Could someone shed some light on which statistical test to conduct when the errors in a regression don't follow a normal distribution?

SAS Super FREQ

## Re: Non normal distribution in regression

Look at the documentation for the GENMOD procedure, which includes sections about goodness-of-fit tests and related statistics. The doc for PROC GENMOD also explains estimates and contrasts.

If you provide more information about your model, more can be said.

Rhodochrosite | Level 12

## Re: Non normal distribution in regression

Thank you for your response Rick.

It is a general question that I came across, so I was hoping for a simple answer in layman's terms rather than textbook language.

SAS Super FREQ

## Re: Non normal distribution in regression

You might enjoy this graphical comparison of the assumptions for error distributions in linear and nonlinear models:

Rhodochrosite | Level 12

## Re: Non normal distribution in regression

Thanks again Rick.

So can I assume that the answer to my question is 'proc genmod'?

I also have another novice question: I know how to check whether data follow a normal distribution, but I don't know how to check whether the errors follow a normal distribution. Could you guide me on this?

SAS Super FREQ

## Re: Non normal distribution in regression

PROC GENMOD is a good place to start for fitting models of this type.  There are alternatives, especially if you think the errors are correlated (as in a time series), but I don't want to overwhelm you with too many options.

I recommend that you do an internet search for "SAS" and "regression diagnostics" or "diagnostic plots".  This is a deep area that is worth learning about.

A (very) short answer is that when the response variable is continuous, you can examine the error distribution by fitting a model and then plotting the distribution of the raw residuals. Most SAS procedures, including GENMOD, have an OUTPUT statement that enables you to write the residual values to a data set. The simplest plot is a histogram of the residuals. Does the histogram look approximately "bell shaped"?
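To make the idea concrete, here is a minimal sketch of "fit a model, then look at the residuals". It is written in plain Python rather than SAS purely for illustration, and it uses simulated data; the helper names `fit_line`, `raw_residuals`, and `bin_counts` are hypothetical and not part of any SAS procedure.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def raw_residuals(x, y):
    """Observed response minus fitted value for every observation."""
    intercept, slope = fit_line(x, y)
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def bin_counts(values, bins=9):
    """Count how many values fall into each of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

# Simulated data (an assumption for this sketch): a straight line plus normal noise.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(500)]
y = [2.0 + 3.0 * a + rng.gauss(0, 1) for a in x]

resid = raw_residuals(x, y)
counts = bin_counts(resid)

# Crude text histogram: one '#' per five observations in the bin.
for i, c in enumerate(counts):
    print(f"bin {i}: " + "#" * (c // 5))
```

With normal errors, the bars should be tallest near the middle and taper off toward both ends. In SAS you would get the residuals from the OUTPUT statement instead of computing them by hand.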

You can also plot the raw residuals versus each of the explanatory variables. If any of the plots look "fan shaped" (the size of the residuals depends on an X), that indicates that the model is not capturing the variation in the data. If so, many practitioners try to fit a more sophisticated model.
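A numeric stand-in for eyeballing a "fan shape" is to compare the spread of the residuals at small and large values of X. The sketch below is plain Python on simulated data, not SAS; `ols_residuals` and `spread_ratio` are hypothetical helper names, and splitting at the median of X is an arbitrary choice made for this example.

```python
import random
import statistics

def ols_residuals(x, y):
    """Residuals from a one-predictor least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

def spread_ratio(x, resid):
    """SD of residuals where x is above its median, divided by SD where it is not."""
    cut = statistics.median(x)
    low = [r for a, r in zip(x, resid) if a <= cut]
    high = [r for a, r in zip(x, resid) if a > cut]
    return statistics.stdev(high) / statistics.stdev(low)

rng = random.Random(7)
x = [rng.uniform(1, 10) for _ in range(1000)]

# Constant error variance: residual spread should not depend on x.
y_const = [1 + 2 * a + rng.gauss(0, 1) for a in x]
# Error SD grows with x: this produces the "fan" in a residual plot.
y_fan = [1 + 2 * a + rng.gauss(0, 0.3 * a) for a in x]

ratio_const = spread_ratio(x, ols_residuals(x, y_const))
ratio_fan = spread_ratio(x, ols_residuals(x, y_fan))
print(f"constant-variance ratio: {ratio_const:.2f}")
print(f"fan-shaped ratio:        {ratio_fan:.2f}")
```

A ratio near 1 is consistent with constant variance; a ratio well above (or below) 1 is the numeric counterpart of a fan-shaped plot. This is only a rough check and does not replace looking at the plot itself.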

Super User

## Re: Non normal distribution in regression

Error is data...you should be able to isolate your error terms if you run a regression model.

@Babloo wrote:

I know how to check whether data follow a normal distribution, but I don't know how to check whether the errors follow a normal distribution.

Super User

## Re: Non normal distribution in regression

Just a thought.

You can use PROC UNIVARIATE to check whether X and Y are each normally distributed. If they both are, then you can say the residual term is normally distributed.

Lapis Lazuli | Level 10

## Re: Non normal distribution in regression

If the dependent variable is continuous but the OLS regression assumption of normal residuals is not met, then I suggest PROC ROBUSTREG and PROC QUANTREG, both of which relax that assumption. I've written papers on these for last year's SGF.

Rhodochrosite | Level 12

## Re: Non normal distribution in regression

Are you saying that if my data follow a normal distribution, then the errors will also follow a normal distribution? If not, could you write some simple SAS code to demonstrate a normal distribution for the errors?

Super User

## Re: Non normal distribution in regression

Yes. According to statistical theory, any linear combination of normal variables is also normally distributed. Therefore,

Y - X = epsilon: if Y and X both follow a normal distribution, then epsilon is also normal.

Otherwise, you could use a robust regression method, as others have suggested.

This is just my two cents.

Lapis Lazuli | Level 10

## Re: Non normal distribution in regression

But the converse isn't true. That is, you can have Y be non-normal and still have normal residuals.
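A small simulation can illustrate this point. The sketch below is plain Python (not SAS) with made-up data: X takes only two values, so Y is strongly bimodal, yet the errors are normal by construction. Excess kurtosis is used here as a rough normality indicator (about 0 for a normal distribution); that choice, and the helper names, are assumptions of this example.

```python
import random
import statistics

def excess_kurtosis(values):
    """Fourth standardized moment minus 3; roughly 0 for normal data."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / (m2 ** 2) - 3.0

def ols_residuals(x, y):
    """Residuals from a one-predictor least-squares fit."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

rng = random.Random(11)
# X is either 0 or 10, so Y clusters around 0 and around 50 (two humps).
x = [rng.choice([0.0, 10.0]) for _ in range(2000)]
y = [5.0 * a + rng.gauss(0, 1) for a in x]

kurt_y = excess_kurtosis(y)
kurt_resid = excess_kurtosis(ols_residuals(x, y))
print(f"excess kurtosis of Y:         {kurt_y:.2f}")
print(f"excess kurtosis of residuals: {kurt_resid:.2f}")
```

Y on its own looks nothing like a bell curve, while the residuals after accounting for X do. That is why the check belongs on the residuals rather than on the raw response.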

Rhodochrosite | Level 12

## Re: Non normal distribution in regression

Look at it this way. If Y is dependent (conditional) on X, then it is irrelevant to test whether Y is normally distributed (independent of X). That is, using PROC UNIVARIATE to assess normality of Y is meaningless. You want to check the normality of the residuals, or better, the normality of the studentized residuals. This is done automatically in graphic form by several procedures.
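For readers wondering what "studentized" means in practice, here is a rough sketch for a one-predictor model. It is plain Python on simulated data, not SAS output, and it computes internally studentized residuals: each raw residual divided by its own estimated standard error, which depends on the leverage of the observation. The function name `studentized_residuals` is hypothetical.

```python
import math
import random
import statistics

def studentized_residuals(x, y):
    """Internally studentized residuals and leverages for y = a + b*x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (a - mx) ** 2 / sxx for a in x]
    stud = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, leverage)]
    return stud, leverage

# Simulated data (an assumption for this sketch).
rng = random.Random(3)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [4.0 - 1.5 * a + rng.gauss(0, 2) for a in x]

stud, leverage = studentized_residuals(x, y)
outside = sum(1 for r in stud if abs(r) > 2)
print(f"{outside} of {len(stud)} studentized residuals exceed 2 in absolute value")
```

Because studentized residuals are on a common scale (roughly unit variance), they are easier to compare against a standard normal reference than raw residuals are. If the errors are normal, only about 5% of them should fall outside plus or minus 2.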

Super User

## Re: Non normal distribution in regression

Look at the Fit Diagnostics panel from Proc Reg. I think it's produced by default these days.

These charts help assess normality of the residuals. You could also extract the residuals from PROC REG and pass them to PROC UNIVARIATE, whose NORMAL option provides several tests for normality.

Super User

## Re: Non normal distribution in regression

But OLS doesn't assume that X and Y are normally distributed, only the errors. It is more an assumption that the errors are random rather than systematic.
