BlueNose
Quartz | Level 8

Dear all,

I wish to model a continuous dependent variable Y against a categorical independent variable X, along with some covariates. I want to produce LSMEANS and test for differences between all pairs of categories of X.

Y doesn't follow the normality assumption. Some values are 0, so I can't use the log transformation; I used the square root instead. It has improved things a bit, but Y is still not normally distributed.

Is there a way in PROC GLM to get something like robust standard errors? Alternatively, is there a robust procedure that produces clean LSMEANS with tests?

Thank you!

PaigeMiller
Diamond | Level 26

Y doesn't have to follow a normal distribution to meet the assumptions of PROC GLM. It is the errors in Y that have to follow a normal distribution in order to meet the assumptions of PROC GLM. Does that condition hold for your data?

--
Paige Miller
Ksharp
Super User

I'm not sure. You could check other procedures, such as PROC ROBUSTREG.

BlueNose
Quartz | Level 8

If the errors follow the normal distribution, doesn't it mean that Y will also be normal? It is easier to check Y.

Does PROC ROBUSTREG support something like LSMEANS?

PaigeMiller
Diamond | Level 26

@BlueNose wrote:

If the errors follow the normal distribution, doesn't it mean that Y will also be normal?


No, it does not mean that.

It is easier to check Y.

Yes, it's easier, but it doesn't provide useful information with respect to the issue of using PROC GLM.

By the way, the hypothesis tests in PROC GLM require the errors to be normally distributed (which is an assumption you can make about your data, or not make, depending on your understanding of the problem), but you can run GLM and compute LSMEANS even if that assumption is not correct, as that assumption only affects the hypothesis tests.
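To illustrate that point: LS-means come from the least-squares fit itself, so computing them never touches the normality assumption. Below is a minimal Python sketch (not SAS code) for the simplest case, one CLASS variable plus one covariate with a common slope. The function `ancova_lsmeans` and the toy data are hypothetical. It assumes, as I understand PROC GLM's default, that each LS-mean is the group's fitted value with the covariate held at its overall mean.

```python
import statistics


def ancova_lsmeans(groups, x, y):
    """LS-means for the model y = a_g + b*x (one CLASS factor, one covariate).

    Only ordinary least squares is used; no normality assumption enters.
    Each LS-mean is a_g + b * mean(x), i.e. the covariate is held at its
    overall mean.
    """
    levels = sorted(set(groups))
    xbar = {g: statistics.fmean([xi for gi, xi in zip(groups, x) if gi == g]) for g in levels}
    ybar = {g: statistics.fmean([yi for gi, yi in zip(groups, y) if gi == g]) for g in levels}
    # The OLS estimate of the common slope is the pooled within-group slope.
    sxy = sum((xi - xbar[g]) * (yi - ybar[g]) for g, xi, yi in zip(groups, x, y))
    sxx = sum((xi - xbar[g]) ** 2 for g, xi in zip(groups, x))
    b = sxy / sxx
    # Group intercepts a_g = ybar_g - b * xbar_g, evaluated at the overall mean of x.
    x_mean = statistics.fmean(x)
    return {g: ybar[g] - b * xbar[g] + b * x_mean for g in levels}, b


# Hypothetical toy data: group B sits 1 unit above group A at any fixed x,
# but B also has larger x values, which inflates its raw mean.
groups = ["A", "A", "A", "B", "B", "B"]
x = [1.0, 2.0, 3.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 7.0, 9.0, 11.0]

lsmeans, slope = ancova_lsmeans(groups, x, y)
print(slope)    # 2.0
print(lsmeans)  # {'A': 6.0, 'B': 7.0}; the raw means are 4.0 and 9.0
```

The LS-means differ by 1.0 (the covariate-adjusted group effect), while the raw means differ by 5.0. Nothing in that arithmetic depends on how the errors are distributed.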

--
Paige Miller
Ksharp
Super User

Normality is usually a requirement on the residuals, not on Y, and not on X either.
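A quick simulation makes the distinction concrete. This is a hedged Python sketch, not SAS code; `simulate` and `kurtosis` are hypothetical helpers. It generates Y as a group mean plus exactly normal errors for a two-level X whose group means are far apart. Y is then bimodal, with kurtosis far below the normal value of 3 (about 1.15 in theory for these settings), while the residuals from the group means have kurtosis close to 3.

```python
import random
import statistics


def simulate(n_per_group=5000, means=(0.0, 10.0), sigma=1.0, seed=1):
    """Simulate Y = group mean + N(0, sigma^2) error for each level of X.

    Returns Y and the residuals from the fitted one-way model
    (each Y minus the sample mean of its own group).
    """
    rng = random.Random(seed)
    y, resid = [], []
    for mu in means:
        group_y = [mu + rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        group_mean = statistics.fmean(group_y)
        y.extend(group_y)
        resid.extend(v - group_mean for v in group_y)
    return y, resid


def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2; about 3 for normal data."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2


y, resid = simulate()
print(f"kurtosis of Y:         {kurtosis(y):.2f}")  # well below 3: Y is bimodal
print(f"kurtosis of residuals: {kurtosis(resid):.2f}")  # close to 3
```

A normality check on Y alone would reject this model even though its error assumption holds exactly.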

@Rick_SAS recently wrote a blog post about this; maybe he can give you some guidance.

Rick_SAS
SAS Super FREQ

As PaigeMiller has said, you are misremembering the assumption. It is the errors that need to be normally distributed. If the model is correctly specified, you can use diagnostic plots, such as histograms and Q-Q plots of the residuals, to assess their normality. For details and an example, see "On the assumptions (and misconceptions) of linear regression." The article shows an example of a response variable that is not normally distributed, yet satisfies the assumptions for GLM.
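For readers curious about what a normal Q-Q plot actually compares, here is a minimal Python sketch (not SAS's own implementation; `qq_points` and `qq_correlation` are hypothetical names). It pairs the sorted residuals with standard normal quantiles at plotting positions (i - 0.5)/n, an assumed convention, and summarizes the plot by the correlation of those pairs: values close to 1 mean the points fall near a straight line. The residuals here are simulated stand-ins; in practice they would come from the fitted model.

```python
import random
import statistics


def qq_points(residuals):
    """(theoretical N(0,1) quantile, ordered residual) pairs for a normal Q-Q plot.

    Uses plotting positions (i - 0.5) / n; other conventions differ slightly.
    """
    n = len(residuals)
    std_normal = statistics.NormalDist()
    ordered = sorted(residuals)
    return [(std_normal.inv_cdf((i - 0.5) / n), r) for i, r in enumerate(ordered, start=1)]


def qq_correlation(residuals):
    """Pearson correlation of the Q-Q pairs; near 1 means the points lie close to a line."""
    pts = qq_points(residuals)
    theo = [t for t, _ in pts]
    obs = [r for _, r in pts]
    mt, mo = statistics.fmean(theo), statistics.fmean(obs)
    cov = sum((t - mt) * (r - mo) for t, r in zip(theo, obs))
    var_t = sum((t - mt) ** 2 for t in theo)
    var_o = sum((r - mo) ** 2 for r in obs)
    return cov / (var_t * var_o) ** 0.5


# Simulated stand-ins for model residuals: one normal set, one right-skewed set.
rng = random.Random(7)
normal_resid = [rng.gauss(0.0, 1.0) for _ in range(500)]
skewed_resid = [rng.expovariate(1.0) - 1.0 for _ in range(500)]
print(f"normal residuals: r = {qq_correlation(normal_resid):.3f}")
print(f"skewed residuals: r = {qq_correlation(skewed_resid):.3f}")
```

The skewed set bends away from the line in its upper tail, which lowers its correlation. On a plot you would see that curvature directly.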

