Babloo
Rhodochrosite | Level 12

Could someone shed some light on which statistical test to use when the errors in a regression don't follow a normal distribution?

16 REPLIES
Rick_SAS
SAS Super FREQ

Look at the documentation for the GENMOD procedure, which includes sections about goodness-of-fit tests and related statistics. The doc for PROC GENMOD also explains estimates and contrasts.

 

If you provide more information about your model, more can be said.

Babloo
Rhodochrosite | Level 12

Thank you for your response Rick.

 

It is a general question that I came across, so I was hoping for a simple answer in layman's terms rather than textbook language.

Rick_SAS
SAS Super FREQ

You might enjoy this graphical comparison of the assumptions for error distributions in linear and nonlinear models:

 

Babloo
Rhodochrosite | Level 12

Thanks again Rick.

 

So can I assume that the answer to my question is PROC GENMOD?

 

I also have another novice question: I know how to check whether data follow a normal distribution, but I don't know how to check whether the errors follow a normal distribution. Could you guide me on this?

Rick_SAS
SAS Super FREQ

PROC GENMOD is a good place to start for fitting models of this type.  There are alternatives, especially if you think the errors are correlated (as in a time series), but I don't want to overwhelm you with too many options. 

 

I recommend that you do an internet search for "SAS" and "regression diagnostics" or "diagnostic plots".  This is a deep area that is worth learning about.

 

A (very) short answer is that when the response variable is continuous, you can examine the error distribution by fitting a model and then plotting the distribution of the raw residuals. Most SAS procedures, including GENMOD, have an OUTPUT statement that enables you to write the residual values to a data set. The simplest plot is a histogram of the residuals.  Does the histogram look approximately "bell shaped"?
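To make the "fit, extract residuals, look at the histogram" idea concrete, here is a minimal language-neutral sketch in Python rather than SAS. It assumes simulated data with one predictor; the helper `fit_line`, the coefficients, and the bin width are all hypothetical choices made for illustration, and the text histogram merely stands in for a real plot.

```python
import random
from collections import Counter

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

random.seed(1)  # simulated data, purely illustrative
x = [random.uniform(0, 10) for _ in range(2000)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Fit the model, then compute raw residuals (observed minus predicted).
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Text histogram: bin residuals to the nearest 0.5, one '#' per 10 points.
bins = Counter(round(r * 2) / 2 for r in residuals)
for center in sorted(bins):
    print(f"{center:5.1f} {'#' * (bins[center] // 10)}")
```

Because the simulated errors are normal, the printed bars should rise toward zero and fall away symmetrically, which is the "bell shape" to look for.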

 

You can also plot the raw residuals versus each of the explanatory variables.  If any of the plots look "fan shaped" (the size of the residuals depends on an X), that indicates that the model is not capturing the variation in the data.  If so, many practitioners try to fit a more sophisticated model.
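A rough numeric stand-in for spotting a "fan shape" is to compare the spread of the residuals at low and high values of X. The sketch below is in Python rather than SAS and assumes simulated data whose error standard deviation grows with X; `fit_line` and `spread_ratio` are hypothetical names, and a plot remains the better diagnostic in practice.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

random.seed(2)  # simulated data, purely illustrative
x = [random.uniform(1, 10) for _ in range(2000)]
# Assumed data-generating process: the error SD is proportional to x.
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5 * xi) for xi in x]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Split the residuals at the median of x and compare their spread.
cut = statistics.median(x)
low = [r for xi, r in zip(x, residuals) if xi <= cut]
high = [r for xi, r in zip(x, residuals) if xi > cut]
spread_ratio = statistics.stdev(high) / statistics.stdev(low)
print(f"residual SD ratio (high x / low x): {spread_ratio:.2f}")
```

A ratio well above 1 means the residuals widen as X increases, which is what a fan-shaped residual plot shows visually.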

 

Reeza
Super User

Error is data...you should be able to isolate your error terms if you run a regression model.


@Babloo wrote:

I know how to check whether data follow a normal distribution, but I don't know how to check whether the errors follow a normal distribution.




Ksharp
Super User

Just a thought.

You can use PROC UNIVARIATE to check whether X and Y are each normally distributed. If both conform to a normal distribution, then you can say the residual term is normally distributed.

plf515
Lapis Lazuli | Level 10

If the dependent variable is continuous but the assumptions of OLS regression regarding normality of the residuals are not met, then I suggest PROC ROBUSTREG and PROC QUANTREG, both of which relax those assumptions.  I've written papers on these for last year's SGF.

Babloo
Rhodochrosite | Level 12

Are you saying that if my data follow a normal distribution, then the errors will also follow a normal distribution? If not, could you write some simple SAS code to demonstrate a normal distribution for errors?

Ksharp
Super User

Yes. According to statistical theory, any linear combination of normal variables is also normally distributed. Therefore,

Y - X = epsilon, so if Y and X both conform to a normal distribution, then epsilon is also normal.

Otherwise, you could use a robust regression method, as others suggest.

This is just my two cents.

plf515
Lapis Lazuli | Level 10

But the converse isn't true.  That is, you can have Y be non-normal and still have normal residuals.
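A small simulation can illustrate this point. The sketch below is in Python rather than SAS and assumes a skewed (exponential) predictor with normal errors; the function `skewness` and all coefficients are hypothetical choices for illustration. Sample skewness is used only as a crude symmetry check, not as a formal normality test.

```python
import random

def skewness(v):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(v)
    m = sum(v) / n
    m2 = sum((vi - m) ** 2 for vi in v) / n
    m3 = sum((vi - m) ** 3 for vi in v) / n
    return m3 / m2 ** 1.5

random.seed(3)  # simulated data, purely illustrative
n = 5000
x = [random.expovariate(1.0) for _ in range(n)]         # strongly right-skewed X
y = [1.0 + 3.0 * xi + random.gauss(0, 1) for xi in x]   # normal errors by construction

# Ordinary least squares fit of y = a + b*x.
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

y_skew = skewness(y)
resid_skew = skewness(residuals)
print(f"skewness of Y:         {y_skew:.2f}")
print(f"skewness of residuals: {resid_skew:.2f}")
```

Here Y inherits the skewness of X and is clearly non-normal, while the residuals stay close to symmetric because the errors were generated as normal.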

lvm
Rhodochrosite | Level 12

Look at it this way. If Y is dependent (conditional) on X, then it is irrelevant to test whether Y is normally distributed (independent of X). That is, using PROC UNIVARIATE to assess normality of Y is meaningless. You want to check the normality of the residuals, or better, the normality of the studentized residuals. This is automatically done in graphic form by several procedures.
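For readers unsure what studentized residuals are, the following sketch computes internally studentized residuals by hand for a one-predictor model. It is written in Python rather than SAS, uses simulated data, and every variable name is a hypothetical choice; SAS procedures produce these values for you, so this only shows the arithmetic behind them.

```python
import math
import random

random.seed(4)  # simulated data, purely illustrative
n = 2000
x = [random.uniform(0, 10) for _ in range(n)]
y = [4.0 - 1.5 * xi + random.gauss(0, 2) for xi in x]

# Ordinary least squares fit of y = a + b*x.
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx
raw = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Leverage of each point, and the residual standard error on n - 2 df.
leverage = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
s = math.sqrt(sum(e * e for e in raw) / (n - 2))

# Internally studentized residual: raw residual over its estimated SD.
studentized = [e / (s * math.sqrt(1 - h)) for e, h in zip(raw, leverage)]

# Under normal errors, roughly 5% of these should exceed 2 in absolute value.
share_beyond_2 = sum(abs(r) > 2 for r in studentized) / n
print(f"share of |studentized residual| > 2: {share_beyond_2:.3f}")
```

Dividing by the estimated standard deviation puts every residual on a common scale, so points with high leverage are not judged by a different yardstick than the rest.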

Reeza
Super User

Look at the Fit Diagnostics panel from PROC REG. I think it's produced by default these days.

These charts help assess normality of the residuals. You could also write the residuals from PROC REG to a data set and pass them to PROC UNIVARIATE, whose NORMAL option reports several tests for normality.

 

[Image: Fit Diagnostics panel from PROC REG]

Reeza
Super User

But OLS doesn't assume that X and Y are normally distributed, only the errors. It's more an assumption that the errors are random rather than systematic.
