Programming the statistical procedures from SAS

Non normal distribution in regression

Super Contributor
Posts: 399

Non normal distribution in regression

Could someone shed some light on which statistical tests to use when the errors in a regression don't follow a normal distribution?

SAS Super FREQ
Posts: 3,304

Re: Non normal distribution in regression

Look at the documentation for the GENMOD procedure, which includes sections about goodness-of-fit tests and related statistics. The doc for PROC GENMOD also explains estimates and contrasts.

 

If you provide more information about your model, more can be said.

Super Contributor
Posts: 399

Re: Non normal distribution in regression

Thank you for your response, Rick.

 

It is a general question that I came across, so I was hoping for a simple answer in layman's terms rather than textbook language.

SAS Super FREQ
Posts: 3,304

Re: Non normal distribution in regression

You might enjoy this graphical comparison of the assumptions for error distributions in linear and nonlinear models:

 

Super Contributor
Posts: 399

Re: Non normal distribution in regression

Thanks again, Rick.

 

So can I assume that the answer to my question is 'proc genmod'?

 

I also have another novice question: I know how to check whether data follow a normal distribution, but I don't know how to check whether the errors follow a normal distribution. Could you guide me on this?

SAS Super FREQ
Posts: 3,304

Re: Non normal distribution in regression

PROC GENMOD is a good place to start for fitting models of this type.  There are alternatives, especially if you think the errors are correlated (as in a time series), but I don't want to overwhelm you with too many options. 

 

I recommend that you do an internet search for "SAS" and "regression diagnostics" or "diagnostic plots".  This is a deep area that is worth learning about.

 

A (very) short answer is that when the response variable is continuous, you can examine the error distribution by fitting a model and then plotting the distribution of the raw residuals. Most SAS procedures, including GENMOD, have an OUTPUT statement that enables you to write the residual values to a data set. The simplest plot is a histogram of the residuals. Does the histogram look approximately "bell shaped"?

 

You can also plot the raw residuals versus each of the explanatory variables. If any of the plots look "fan shaped" (the size of the residuals depends on an X), that indicates that the model is not capturing the variation in the data. If so, many practitioners try to fit a more sophisticated model.
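
A minimal sketch of those steps, assuming the SASHELP.CLASS sample data set and the output names resid_out, pred, and resid (none of these come from the thread; substitute your own data and variables). DIST=NORMAL with LINK=IDENTITY is used here only to show the OUTPUT statement on an ordinary linear model:

```sas
/* Fit a linear model and write the raw residuals to a data set */
proc genmod data=sashelp.class;
   model weight = height age / dist=normal link=identity;
   output out=work.resid_out pred=pred resraw=resid;
run;

/* Histogram of the residuals with a normal curve overlaid:
   does it look approximately bell shaped? */
proc sgplot data=work.resid_out;
   histogram resid;
   density resid / type=normal;
run;

/* Residuals versus one explanatory variable:
   look for a fan shape around the zero line */
proc sgplot data=work.resid_out;
   scatter x=height y=resid;
   refline 0 / axis=y;
run;
```

Repeat the last step for each explanatory variable in the model.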

 

Grand Advisor
Posts: 16,850

Re: Non normal distribution in regression

Error is data...you should be able to isolate your error terms if you run a regression model.


Babloo wrote:

I know how to check whether data follow a normal distribution, but I don't know how to check whether the errors follow a normal distribution.




Grand Advisor
Posts: 9,444

Re: Non normal distribution in regression


Just a thought.

You can use PROC UNIVARIATE to check whether X and Y are each normally distributed. If they both are, then you can say that the residual term is normally distributed.

Frequent Contributor
Posts: 140

Re: Non normal distribution in regression

If the dependent variable is continuous but the assumptions of OLS regression are not met regarding normality of residuals, then I suggest PROC ROBUSTREG and PROC QUANTREG, both of which relax those assumptions. I've written papers on these for last year's SGF.
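
For orientation, a minimal sketch of calling both procedures, assuming the SASHELP.CLASS sample data set and an illustrative model (neither is from the papers mentioned above):

```sas
/* Robust regression using M estimation, which down-weights outlying observations */
proc robustreg data=sashelp.class method=m;
   model weight = height age;
run;

/* Median (0.5 quantile) regression, which does not assume normal errors */
proc quantreg data=sashelp.class;
   model weight = height age / quantile=0.5;
run;
```

Comparing these coefficients with the OLS estimates is one informal way to see how much the normality assumption matters for a given data set.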

Super Contributor
Posts: 399

Re: Non normal distribution in regression

Are you saying that if my data follow a normal distribution then the errors will also follow a normal distribution? If not, could you write some simple SAS code to demonstrate a normal distribution for the errors?

Grand Advisor
Posts: 9,444

Re: Non normal distribution in regression

Yes. According to statistical theory, any linear combination of normal variables is also normally distributed. Therefore,

Y - X = epsilon: if Y and X both follow a normal distribution, then epsilon is also normal.

Otherwise, you could use a robust regression method, as others suggest.

This is just my two cents.

Frequent Contributor
Posts: 140

Re: Non normal distribution in regression

But the converse isn't true. That is, you can have Y be non-normal and still have normal residuals.
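
A small simulation can illustrate this. The sketch below assumes a skewed (exponential) predictor and normal errors; the coefficients, seed, sample size, and data set names are arbitrary choices for illustration only:

```sas
/* Simulate Y = 2 + 3*X + eps with a skewed X and normal errors */
data work.sim;
   call streaminit(12345);
   do i = 1 to 500;
      x   = rand('exponential');     /* right-skewed predictor */
      eps = rand('normal', 0, 1);    /* normal error term */
      y   = 2 + 3*x + eps;
      output;
   end;
run;

/* Fit the regression and save the residuals */
proc reg data=work.sim plots=none;
   model y = x;
   output out=work.sim_out residual=resid;
run;
quit;

/* Compare the distribution of Y with the distribution of the residuals */
proc univariate data=work.sim_out normal;
   var y resid;
   histogram y resid / normal;
run;
```

With this setup, the histogram of Y should typically look right-skewed, while the histogram of the residuals should look roughly bell shaped.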

Valued Guide
Posts: 673

Re: Non normal distribution in regression

Look at it this way. If Y is dependent (conditional) on X, then it is irrelevant to test whether Y is normally distributed (independent of X). That is, using PROC UNIVARIATE to assess normality of Y is meaningless. You want to check the normality of the residuals, or better, the normality of the studentized residuals. This is automatically done in graphic form by several procedures.
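
As one example of such a procedure, here is a minimal PROC REG sketch, assuming the SASHELP.CLASS sample data set and the output names reg_out, pred, resid, and rstud (all illustrative, not from the thread):

```sas
ods graphics on;

/* Request the diagnostics panel and the residual plots,
   and save raw and studentized residuals for further checks */
proc reg data=sashelp.class plots(only)=(diagnostics residuals);
   model weight = height age;
   output out=work.reg_out predicted=pred residual=resid rstudent=rstud;
run;
quit;
```

The diagnostics panel includes a histogram and a Q-Q plot of the residuals, which are the usual graphical checks of normality.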

Grand Advisor
Posts: 16,850

Re: Non normal distribution in regression

Look at the Fit Diagnostics panel from Proc Reg. I think it's produced by default these days. 

These charts help assess normality of the residuals. You could also extract the residuals from PROC REG and pass them to PROC UNIVARIATE with the NORMAL option, which reports several tests for normality.

 

[Image: Fit Diagnostics panel from PROC REG]
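
One way to run formal normality tests on extracted residuals is PROC UNIVARIATE with the NORMAL option. A minimal sketch, assuming the SASHELP.CLASS sample data set and the illustrative names reg_resid and resid:

```sas
/* Extract the residuals from PROC REG */
proc reg data=sashelp.class plots=none;
   model weight = height age;
   output out=work.reg_resid residual=resid;
run;
quit;

/* NORMAL requests tests for normality (for example Shapiro-Wilk
   and Kolmogorov-Smirnov); QQPLOT adds a normal Q-Q plot */
proc univariate data=work.reg_resid normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);
run;
```

Small p-values in the "Tests for Normality" table suggest the residuals depart from normality, though with large samples even minor departures can be flagged, so the plots are worth reading alongside the tests.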

Grand Advisor
Posts: 16,850

Re: Non normal distribution in regression

But OLS doesn't have an assumption that the X and Y are normally distributed, only the errors. More an assumption that they're random rather than systematic.
