Quartz | Level 8

## GLM with data that doesn't follow the normality assumption

Dear all,

I wish to model a continuous dependent variable Y against a categorical independent variable X, along with some covariates. I want to produce lsmeans and test for differences between all pairs of categories of X.

Y doesn't follow the normality assumption. Some values are 0 so I can't use the log transformation. I used the square root instead. It has improved things a bit, but it is still not normally distributed.

Is there a way in PROC GLM to get something like robust standard errors? Is there an alternative, a procedure that is robust and produces clean lsmeans with tests?

Thank you!

Diamond | Level 26

## Re: GLM with data that doesn't follow the normality assumption

Y doesn't have to follow a normal distribution to meet the assumptions of PROC GLM. It is the errors in Y that have to follow a normal distribution. Does that condition hold for your data?

--
Paige Miller
Super User

## Re: GLM with data that doesn't follow the normality assumption

I'm not sure. You could check other procedures, such as PROC ROBUSTREG.

Quartz | Level 8

## Re: GLM with data that doesn't follow the normality assumption

If the errors follow the normal distribution, doesn't it mean that Y will also be normal? It is easier to check Y.

Does PROC ROBUSTREG support something like LSMEANS?

Diamond | Level 26

## Re: GLM with data that doesn't follow the normality assumption

@BlueNose wrote:

> If the errors follow the normal distribution, doesn't it mean that Y will also be normal?

No, it does not mean that.

> It is easier to check Y.

Yes, it's easier, but it doesn't provide useful information about whether PROC GLM is appropriate.

By the way, the hypothesis tests in PROC GLM require the errors to be normally distributed (which is an assumption you can make about your data, or not make, depending on your understanding of the problem). You can still run GLM and compute LSMEANS even if that assumption is not correct, because the assumption only affects the hypothesis tests.
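
As a rough sketch of the analysis described in the original question, the following fits the model and requests LS-means with all pairwise comparisons. The data set name `mydata` and the covariate names `cov1` and `cov2` are placeholders, not names from this thread, and the Tukey adjustment is just one possible choice.

```sas
/* Hypothetical names: mydata, cov1 and cov2 are placeholders */
proc glm data=mydata;
   class x;                            /* X is the categorical predictor */
   model y = x cov1 cov2 / solution;   /* covariates enter as continuous effects */
   lsmeans x / pdiff cl adjust=tukey;  /* LS-means, all pairwise differences, adjusted p-values */
run;
quit;
```

The LS-means are computed either way; the p-values and confidence limits are the parts that lean on the error assumption.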

--
Paige Miller
Super User

## Re: GLM with data that doesn't follow the normality assumption

The normality assumption usually refers to the residuals, not to Y, and not to X either.

@Rick_SAS wrote a blog post about this recently. Maybe he could give you some guidance.

SAS Super FREQ

## Re: GLM with data that doesn't follow the normality assumption

As PaigeMiller has said, you are mis-remembering the assumption. It is the errors that need to be normally distributed. If the model is correctly specified, you can use diagnostic plots such as histograms and Q-Q plots to assess the normality of the residuals. For details and an example, see "On the assumptions (and misconceptions) of linear regression." The article shows an example of a response variable that is not normally distributed, yet satisfies the assumptions for GLM.
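
A small simulation may make the point concrete. It is only a sketch: the data set names `sim` and `resid_out`, the seed, and the group means are made up for illustration and are not taken from the article. The response is generated with normal errors around three well-separated group means, so Y pooled over the groups is clearly non-normal (trimodal), while the residuals from the model should look normal.

```sas
ods graphics on;

/* Simulated data: normal errors around three well-separated group means */
data sim;
   call streaminit(12345);             /* arbitrary seed for reproducibility */
   do x = 1 to 3;
      do i = 1 to 200;
         y = 10*x + rand("Normal", 0, 1);
         output;
      end;
   end;
run;

/* Fit the one-way model and save the residuals */
proc glm data=sim plots=diagnostics;
   class x;
   model y = x;
   output out=resid_out r=resid;
run;
quit;

/* Compare the distribution of Y with the distribution of the residuals */
proc univariate data=resid_out normal;
   var y resid;
   histogram y resid / normal;
   qqplot y resid / normal(mu=est sigma=est);
run;
```

With real data, the same `output ... r=` step followed by PROC UNIVARIATE on the residuals is one way to check the assumption that matters, instead of checking Y itself.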
