Solution for Non-normally distributed data

Occasional Contributor
Posts: 10

I am using SAS Enterprise Guide version 6.100 (6.100.0.2870) (64-bit) ODA.

I am analyzing which variables influence the length of stay in hospital.

The dependent variable is DaysOfStay. This variable does not have a normal distribution.

Is there a way in SAS Enterprise Guide to normalize the distribution?

Mean:      55.80348
Median:    42.50000
Mode:      15.00000
Skewness:   4.529
Kurtosis:  27.100

Thanks

Best Regards

Agate

Super Contributor
Posts: 418

Re: Solution for Non-normally distributed data

Why do you need the variable to have a normal distribution to determine whether other variables influence it? Two variables can have any distribution and still influence each other.

Can you give further explanation?

Grand Advisor
Posts: 17,313

Re: Solution for Non-normally distributed data

The assumption of normality for regression is for the errors, not the variables, though the assumption of normality matters for other tests.

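1. You can transform a variable to bring it closer to a normal distribution. Note that standardizing alone only rescales the variable and does not change the shape of its distribution.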

2. You can use non-parametric methods if your data don't meet the assumptions (e.g., normality).

3. I would also look at a histogram of the data to assess normality, not just the summary statistics. You may have an outlier problem that you want to deal with.
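As a minimal sketch of point 3, the histogram and the usual normality tests can be requested from PROC UNIVARIATE. The data set name FYP.LOS_OUTLIERS_LOG is taken from the code posted later in this thread and is assumed here to hold DaysOfStay.

```sas
/* Sketch: histogram with a fitted normal curve, a Q-Q plot, and
   normality tests for the raw length-of-stay variable.
   Data set name is assumed from later in the thread. */
proc univariate data=FYP.LOS_OUTLIERS_LOG normal;
   var DaysOfStay;
   histogram DaysOfStay / normal;
   qqplot DaysOfStay / normal(mu=est sigma=est);
run;
```

Extreme values show up as isolated bars in the right tail of the histogram and as points far off the reference line in the Q-Q plot.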

Testing the assumptions of linear regression

Occasional Contributor
Posts: 10

Re: Solution for Non-normally distributed data

Thank you Reeza for the reply.

I do have to achieve normality for other tests.

Could you advise how to standardize the data in order to get a normal distribution?

And which non-parametric methods could I use?

Thanks

Kind Regards

Agate

Contributor
Posts: 62

Re: Solution for Non-normally distributed data

Hi,

You can use a Box-Cox transformation with PROC TRANSREG in SAS to achieve normality. Judging by the summary statistics, though, "log" may be a good transformation for your data. One of the main problems with transformations is the interpretation of the results.
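A minimal sketch of the Box-Cox approach, assuming the data set FYP.LOS_OUTLIERS_LOG from this thread and a hypothetical numeric predictor named Age (not mentioned in the original posts). The response must be strictly positive for Box-Cox.

```sas
/* Sketch: search lambda from -2 to 2 for the Box-Cox transformation of
   DaysOfStay. Age is a hypothetical predictor used for illustration. */
proc transreg data=FYP.LOS_OUTLIERS_LOG;
   model boxcox(DaysOfStay / lambda=-2 to 2 by 0.25 convenient)
         = identity(Age);
run;
```

With the CONVENIENT option, PROC TRANSREG prefers a "convenient" lambda (such as 0, which corresponds to the log transformation) when it lies inside the confidence interval for the best lambda.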

Non-parametric methods are also an option, although they have lower power.
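Two common non-parametric choices can be sketched as follows. The variables Gender (a grouping variable) and Age (a numeric predictor) are hypothetical names used only for illustration.

```sas
/* Sketch: Wilcoxon rank-sum / Kruskal-Wallis test comparing DaysOfStay
   across the levels of a hypothetical grouping variable Gender. */
proc npar1way data=FYP.LOS_OUTLIERS_LOG wilcoxon;
   class Gender;
   var DaysOfStay;
run;

/* Sketch: Spearman rank correlation between DaysOfStay and a
   hypothetical numeric variable Age. */
proc corr data=FYP.LOS_OUTLIERS_LOG spearman;
   var DaysOfStay Age;
run;
```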

Alternatively, you can fit a generalized linear model (GLM). Length of stay in hospital is often modeled with a GLM approach.

Check the following paper to learn more about modeling length of stay:

http://www.ncbi.nlm.nih.gov/pubmed/9630132

The full PDF of the above paper can be found by searching on Google. SAS EG has a very good menu for GLMs.
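For those who prefer code to the menu, here is a minimal sketch of one GLM that is often tried for right-skewed, strictly positive outcomes: a gamma distribution with a log link. This is an illustrative choice and not necessarily the model used in the paper above. Age and Gender are hypothetical predictors.

```sas
/* Sketch: gamma GLM with log link for length of stay.
   Requires DaysOfStay > 0. Age and Gender are hypothetical predictors. */
proc genmod data=FYP.LOS_OUTLIERS_LOG;
   class Gender;
   model DaysOfStay = Age Gender / dist=gamma link=log;
run;
```

Because the model works on the original scale of DaysOfStay, exponentiated coefficients can be read as multiplicative effects on the mean length of stay, which avoids back-transforming a log-transformed response.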

Occasional Contributor
Posts: 10

Re: Solution for Non-normally distributed data

Hello,

As I mentioned earlier, the variable LOS is positively skewed.

I was trying to solve the problem by applying a log transformation. The reason why I need to normalize the variable is to meet the assumptions of multiple linear regression...

I ran the following code:

data FYP.LOS_OUTLIERS_LOG;
   set FYP.LOS_OUTLIERS_LOG;
   /* Natural logarithm (base e); the result is missing if DaysOfStay <= 0 */
   lny = log(DaysOfStay);
run;

but the new variable still seems to be skewed.

I wonder if the code I ran is correct?
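One quick way to check whether the transformation helped is to compare the shape statistics of the original and the log-transformed variable side by side. This is a sketch using only the names from the code above.

```sas
/* Sketch: compare skewness and kurtosis before and after the log
   transformation. NMISS reveals values lost to log() of a value <= 0. */
proc means data=FYP.LOS_OUTLIERS_LOG n nmiss mean median skewness kurtosis;
   var DaysOfStay lny;
run;
```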

Thanks

Best Regards

Agate

Grand Advisor
Posts: 17,313

Re: Solution for Non-normally distributed data

Testing the assumptions of linear regression

Which one of the 4 assumptions are you violating?

Occasional Contributor
Posts: 10

Re: Solution for Non-normally distributed data

Hello,

I am violating the assumption of normally distributed errors.
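Since this assumption concerns the residuals rather than DaysOfStay itself, it can be checked by saving the residuals from the fitted model and examining them. A minimal sketch, assuming the log-transformed response lny from the earlier code and a hypothetical numeric predictor Age; the output data set work.resid is also an illustrative name.

```sas
/* Sketch: fit the regression on the log scale and save the residuals.
   Age is a hypothetical predictor; replace it with the real ones. */
proc reg data=FYP.LOS_OUTLIERS_LOG;
   model lny = Age;
   output out=work.resid r=residual p=predicted;
run;
quit;

/* Sketch: normality tests and Q-Q plot for the saved residuals. */
proc univariate data=work.resid normal;
   var residual;
   qqplot residual / normal(mu=est sigma=est);
run;
```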
