Statistical Procedures

Programming the statistical procedures from SAS
BlueNose
Quartz | Level 8

Dear all,

 

I wish to model a continuous dependent variable Y against a categorical independent variable X, along with some covariates. I want to produce lsmeans and test for differences between all pairs of categories of X.

 

Y doesn't follow the normality assumption. Some values are 0 so I can't use the log transformation. I used the square root instead. It has improved things a bit, but it is still not normally distributed.

 

Is there a way in PROC GLM to get something like robust standard errors? Is there an alternative, a procedure that is robust and produces clean lsmeans with tests?

 

Thank you !

PaigeMiller
Diamond | Level 26

Y doesn't have to follow a normal distribution to meet the assumptions of PROC GLM. It is the errors in Y that have to follow a normal distribution in order to meet the assumptions of PROC GLM. Does that condition hold for your data?

--
Paige Miller
Ksharp
Super User

I'm not sure. You could check other procedures, such as PROC ROBUSTREG.

BlueNose
Quartz | Level 8

If the errors follow the normal distribution, doesn't it mean that Y will also be normal? It is easier to check Y.

 

Does PROC ROBUSTREG support something like LSMEANS?

PaigeMiller
Diamond | Level 26

@BlueNose wrote:

"If the errors follow the normal distribution, doesn't it mean that Y will also be normal?"

No, it does not mean that.

"It is easier to check Y."

Yes, it's easier, but it doesn't provide useful information with respect to the issue of using PROC GLM.

 

By the way, the hypothesis tests in PROC GLM require the errors to be normally distributed (which is an assumption you can make about your data, or not make, depending on your understanding of the problem), but you can run GLM and compute LSMEANS even if that assumption is not correct, as that assumption only affects the hypothesis tests.

--
Paige Miller
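
PaigeMiller's point that LS-means can be computed whether or not the errors are normal can be illustrated with a small simulation. This is only a sketch, written in plain Python rather than SAS, under assumptions that are not from the thread: a single covariate, a common slope across groups, made-up intercepts, and skewed (shifted exponential) errors. It computes each group's LS-mean as the predicted response with the covariate set to its overall mean, which is the default behaviour of LSMEANS for covariates.

```python
import random
import statistics

random.seed(2)

# Hypothetical ANCOVA setup: Y = intercept[group] + slope * covariate + error,
# where the error is exponential shifted to mean 0 (skewed, clearly not normal).
intercepts = {"A": 1.0, "B": 2.0, "C": 4.0}
slope = 0.5
rows = []
for g, a in intercepts.items():
    for _ in range(2000):
        c = random.uniform(0.0, 10.0)
        err = random.expovariate(1.0) - 1.0
        rows.append((g, c, a + slope * c + err))

# Per-group means of the covariate and of the response.
c_bar = {g: statistics.fmean(c for gg, c, _ in rows if gg == g) for g in intercepts}
y_bar = {g: statistics.fmean(y for gg, _, y in rows if gg == g) for g in intercepts}

# Pooled within-group least-squares slope (one common slope for all groups).
sxy = sum((c - c_bar[g]) * (y - y_bar[g]) for g, c, y in rows)
sxx = sum((c - c_bar[g]) ** 2 for g, c, _ in rows)
b_hat = sxy / sxx

# LS-mean of a group: predicted Y with the covariate set to its overall mean.
c_overall = statistics.fmean(c for _, c, _ in rows)
lsmeans = {g: y_bar[g] + b_hat * (c_overall - c_bar[g]) for g in intercepts}

for g, value in lsmeans.items():
    print(g, round(value, 2))
```

The estimates come out close to the values implied by the simulated model even though the errors are far from normal. What the non-normality puts in doubt is the p-values and confidence limits attached to the pairwise differences, not the LS-means themselves.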
Ksharp
Super User

Normality is usually assumed for the residuals, not for Y, and not for X either.

@Rick_SAS wrote a blog post about it recently. Maybe he could give you some guidance.

Rick_SAS
SAS Super FREQ

As PaigeMiller has said, you are misremembering the assumption. It is the errors that need to be normally distributed. If the model is correctly specified, you can use diagnostic plots such as histograms and Q-Q plots to assess the normality of the residuals. For details and an example, see "On the assumptions (and misconceptions) of linear regression." The article shows an example of a response variable that is not normally distributed, yet satisfies the assumptions for GLM.
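
The distinction between the distribution of Y and the distribution of the errors can be seen in a toy simulation. The sketch below is not taken from the linked article. It is plain Python with made-up group means, and it uses sample excess kurtosis as a crude numerical stand-in for the histogram and Q-Q plot one would inspect in practice. The errors are generated as normal, but because the three categories of X have well-separated means, the pooled Y is strongly multimodal.

```python
import random
import statistics

random.seed(1)

# Hypothetical setup: three categories of X with well-separated means,
# plus normal errors with standard deviation 1.
group_means = {"A": 0.0, "B": 10.0, "C": 20.0}
n_per_group = 2000

data = [(g, mu + random.gauss(0.0, 1.0))
        for g, mu in group_means.items()
        for _ in range(n_per_group)]

# Fit the one-way model: each observation's fitted value is its group mean.
fitted = {g: statistics.fmean(y for gg, y in data if gg == g) for g in group_means}
residuals = [y - fitted[g] for g, y in data]
y_values = [y for _, y in data]


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m4 = statistics.fmean((v - m) ** 4 for v in values)
    return m4 / m2 ** 2 - 3.0


print("excess kurtosis of Y:        ", round(excess_kurtosis(y_values), 2))
print("excess kurtosis of residuals:", round(excess_kurtosis(residuals), 2))
```

The pooled Y has a clearly non-normal shape (strongly negative excess kurtosis), while the residuals from the fitted model look normal. This is why checking Y directly, although easier, says little about whether the PROC GLM assumption holds.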
