Re: GLM with data that doesn't follow the normality assumption


Posted 09-08-2018 04:59 AM

Dear all,

I wish to model a continuous dependent variable Y against a categorical independent variable X, along with some covariates. I want to produce lsmeans and test for differences between all pairs of categories of X.

Y doesn't follow the normality assumption. Some values are 0 so I can't use the log transformation. I used the square root instead. It has improved things a bit, but it is still not normally distributed.

Is there a way in PROC GLM to get something like robust standard errors? Is there an alternative, a procedure that is robust and produces clean lsmeans with tests?

Thank you!

6 REPLIES


Y doesn't have to follow a normal distribution to meet the assumptions of PROC GLM. It is the errors in Y that have to follow a normal distribution. Does that condition hold for your data?
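
To make "the errors in Y" concrete, the model described in the question can be sketched as follows. The notation is an assumption for illustration and does not come from the thread: $\alpha_i$ is the effect of category $i$ of X, $z_{ij}$ is the covariate vector for observation $j$ in that category, and $\varepsilon_{ij}$ is the error term.

$$
Y_{ij} = \mu + \alpha_i + \beta^\top z_{ij} + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)
$$

Under this sketch, the normality condition concerns $\varepsilon_{ij}$, that is, Y conditional on X and the covariates. In practice it is examined through the residuals of the fitted model.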

--

Paige Miller



I'm not sure. You could check other procedures, such as PROC ROBUSTREG.


If the errors follow the normal distribution, doesn't it mean that Y will also be normal? It is easier to check Y.

Does PROC ROBUSTREG support something like LSMEANS?


@BlueNose wrote:

If the errors follow the normal distribution, doesn't it mean that Y will also be normal?

No, it does not mean that.

It is easier to check Y.

Yes, it's easier, but it doesn't provide useful information with respect to the issue of using PROC GLM.

By the way, the hypothesis tests in PROC GLM require the errors to be normally distributed (an assumption you can make about your data, or not, depending on your understanding of the problem). However, you can run PROC GLM and compute LSMEANS even if that assumption is not correct, because it only affects the hypothesis tests.
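
A small simulation can illustrate why normal errors do not make Y itself normal. The sketch below is not from the thread and is written in Python rather than SAS. The two categories, their means (0 and 10), the error standard deviation, and the sample size are all made-up values. It uses excess kurtosis (0 for a normal distribution) as a rough normality indicator, and it treats the group mean as the fitted value, which is the least-squares fit for a model with a single categorical factor and no covariates.

```python
import random
import statistics


def excess_kurtosis(values):
    """Sample excess kurtosis; close to 0 for normally distributed data."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


def simulate(n_per_group=2000, seed=1):
    """Return (excess kurtosis of pooled Y, excess kurtosis of residuals)."""
    rng = random.Random(seed)
    # Hypothetical category means for X; the errors are N(0, 1) in both groups.
    group_means = {"A": 0.0, "B": 10.0}
    y, residuals = [], []
    for mu in group_means.values():
        ys = [mu + rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        fitted = statistics.fmean(ys)  # fitted value = sample mean of the group
        y.extend(ys)
        residuals.extend(v - fitted for v in ys)
    return excess_kurtosis(y), excess_kurtosis(residuals)


kurt_y, kurt_resid = simulate()
print(f"excess kurtosis of pooled Y:  {kurt_y:.2f}")
print(f"excess kurtosis of residuals: {kurt_resid:.2f}")
```

With these settings the pooled Y is bimodal, so its excess kurtosis is strongly negative (the theoretical value for this mixture is about -1.85), while the residuals stay close to 0. This is why a normality check belongs on the residuals of the fitted model rather than on raw Y.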

--

Paige Miller



The normality assumption usually refers to the residuals, not to Y or to X.

@Rick_SAS wrote a blog post about it recently. Maybe he could give you some guidance.
