Statistical Procedures

GLM with data that doesn't follow the normality assumption

BlueNose · Posted 09-08-2018 04:59 AM

Dear all,

I wish to model a continuous dependent variable Y against a categorical independent variable X, along with some covariates. I want to produce lsmeans and test for differences between all pairs of categories of X.

Y doesn't follow the normality assumption. Some values are 0 so I can't use the log transformation. I used the square root instead. It has improved things a bit, but it is still not normally distributed.

Is there a way in PROC GLM to get something like robust standard errors? Is there an alternative, a procedure that is robust and produces clean lsmeans with tests?

Thank you !

PaigeMiller · Posted 09-08-2018 06:29 AM

Y doesn't have to follow a normal distribution to meet the assumptions of PROC GLM. It is the errors in Y that have to follow a normal distribution in order to meet the assumptions of PROC GLM. Does that condition hold for your data?

--
Paige Miller

Ksharp · Posted 09-08-2018 07:05 AM

I'm not sure. You could check other procedures, such as PROC ROBUSTREG.

BlueNose · Posted 09-08-2018 12:02 PM

If the errors follow the normal distribution, doesn't it mean that Y will also be normal? It is easier to check Y.

Does PROC ROBUSTREG support something like LSMEANS?

PaigeMiller · Posted 09-08-2018 12:08 PM

@BlueNose wrote: "If the errors follow the normal distribution, doesn't it mean that Y will also be normal?"

No, it does not mean that.

@BlueNose wrote: "It is easier to check Y."

Yes, it's easier, but it doesn't provide useful information with respect to the issue of using PROC GLM.

By the way, the hypothesis tests in PROC GLM require the errors to be normally distributed (an assumption you can make about your data, or not make, depending on your understanding of the problem). You can still run GLM and compute LSMEANS even if that assumption is not correct, because it only affects the hypothesis tests.

--
Paige Miller

Ksharp · Posted 09-09-2018 06:02 AM

The normality assumption usually refers to the residuals, not to Y, and not to X either.

@Rick_SAS wrote a blog post about it recently. Maybe he could give you some guidance.

Rick_SAS · Posted 09-09-2018 06:32 AM

As PaigeMiller has said, you are misremembering the assumption. It is the errors that need to be normally distributed. If the model is correctly specified, you can use diagnostic plots such as histograms and Q-Q plots to assess the normality of the residuals. For details and an example, see "On the assumptions (and misconceptions) of linear regression." The article shows an example of a response variable that is not normally distributed, yet satisfies the assumptions for GLM.
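
The point about Y versus the errors can be checked with a small simulation. The following is only a minimal sketch in Python, not SAS, and it is not taken from the article mentioned above. The group means, the sample size, and the helper `excess_kurtosis` are made up for illustration. It builds a Y whose categories have well-separated means and normal errors, fits the one-way model by taking category means, and compares a simple shape measure (excess kurtosis, which is 0 for a normal distribution) for Y and for the residuals.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical setup: three categories of X with well-separated means,
# and independent N(0, 1) errors around each category mean.
group_means = {"A": 0.0, "B": 10.0, "C": 20.0}
n_per_group = 2000

x = []
y = []
for label, mu in group_means.items():
    for _ in range(n_per_group):
        x.append(label)
        y.append(mu + random.gauss(0.0, 1.0))


def excess_kurtosis(values):
    """Excess kurtosis (0 for a normal distribution), population formula."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    fourth = statistics.fmean([(v - mean) ** 4 for v in values])
    return fourth / var ** 2 - 3.0


# Fit the one-way model Y = (category mean) + error: the fitted value
# for each observation is simply the sample mean of its category.
fitted = {
    label: statistics.fmean([yi for xi, yi in zip(x, y) if xi == label])
    for label in group_means
}
residuals = [yi - fitted[xi] for xi, yi in zip(x, y)]

kurt_y = excess_kurtosis(y)
kurt_resid = excess_kurtosis(residuals)

print(f"excess kurtosis of Y:         {kurt_y:.2f}")
print(f"excess kurtosis of residuals: {kurt_resid:.2f}")
```

In this made-up setup, Y is trimodal and its excess kurtosis is far from 0, while the residuals have excess kurtosis close to 0. That is why checking Y directly says little about whether the GLM assumption holds. In SAS, the analogous check would be done on the residuals from the fitted model rather than on Y.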
