11-27-2013 02:50 PM
I am using SAS Enterprise Guide version 6.100 (126.96.36.19970) (64-bit) ODA.
I am analyzing which variables influence the length of stay in hospital.
The dependent variable is DaysOfStay. This variable does not have a normal distribution.
Is there a way in SAS Enterprise Guide to normalize the distribution?
11-27-2013 02:55 PM
Why do you need the variable to have a normal distribution to determine whether other variables influence it? Two variables can have any distribution and still influence each other.
Can you give further explanation?
11-27-2013 03:18 PM
The normality assumption in regression applies to the errors (residuals), not to the variables themselves, though normality does matter for some other tests.
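A minimal sketch of that point, assuming a data set named WORK.STAYS with DaysOfStay and a predictor AGE (both names beyond DaysOfStay are hypothetical, and the first DATA step only simulates a right-skewed stand-in): fit the regression first, save the residuals with the OUTPUT statement of PROC REG, and run the normality checks on the residuals rather than on DaysOfStay.

```sas
/* Hypothetical simulated stand-in: replace WORK.STAYS with your own data */
data work.stays;
  call streaminit(2013);
  do id = 1 to 500;
    age  = 40 + 30*rand('uniform');
    ward = 1 + floor(3*rand('uniform'));
    DaysOfStay = ceil(exp(0.5 + 0.02*age + 0.2*ward + 0.6*rand('normal')));
    output;
  end;
run;

/* Fit the linear regression and keep the residuals in WORK.RESID */
proc reg data=work.stays;
  model DaysOfStay = age;
  output out=work.resid r=res;
run;
quit;

/* Normality tests, histogram and Q-Q plot on the residuals, not on DaysOfStay */
proc univariate data=work.resid normal;
  var res;
  histogram res / normal;
  qqplot res / normal(mu=est sigma=est);
run;
```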
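1. You can standardize or transform a variable to get closer to a normal distribution. Note that standardizing alone (converting to z-scores) only shifts and rescales the data; it is a nonlinear transformation such as the log that changes the shape.
2. You can use non-parametric methods if your data don't meet the assumptions (e.g. normality).
3. I would also look at a histogram of the data to judge normality, not just the statistics; you may have an outlier problem you want to deal with.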
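A sketch of points 1 and 3 on hypothetical simulated data (WORK.STAYS, AGE and WARD are made-up names): it standardizes a copy of DaysOfStay with PROC STDIZE (METHOD=STD subtracts the mean and divides by the standard deviation), then compares skewness and histograms, which shows that the z-scores keep exactly the same shape as the raw variable.

```sas
/* Hypothetical simulated stand-in: replace WORK.STAYS with your own data */
data work.stays;
  call streaminit(2013);
  do id = 1 to 500;
    age  = 40 + 30*rand('uniform');
    ward = 1 + floor(3*rand('uniform'));
    DaysOfStay = ceil(exp(0.5 + 0.02*age + 0.2*ward + 0.6*rand('normal')));
    output;
  end;
run;

/* Copy DaysOfStay into Z; PROC STDIZE will replace Z with its z-score */
data work.stays2;
  set work.stays;
  z = DaysOfStay;
run;

proc stdize data=work.stays2 out=work.std method=std;
  var z;
run;

/* Skewness is identical for the raw and the standardized variable */
proc means data=work.std n mean std skewness;
  var DaysOfStay z;
  output out=work.chk skewness(DaysOfStay)=sk_raw skewness(z)=sk_z;
run;

/* Histograms with a fitted normal curve: same shape, different axis */
proc univariate data=work.std;
  var DaysOfStay z;
  histogram DaysOfStay z / normal;
run;
```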
11-29-2013 05:53 AM
Thank you, Reeza, for the reply.
I do need normality for other tests.
Could you advise how to standardize the data in order to get a normal distribution?
And which non-parametric methods could I use?
11-29-2013 08:41 AM
You can use a Box-Cox transformation (PROC TRANSREG in SAS) to get closer to normality. Judging by the summary statistics, a log transformation may work well for your data. One of the main problems with transformations, however, is interpretation: the model's coefficients then refer to the transformed scale, not to days.
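A minimal Box-Cox sketch with PROC TRANSREG, again on hypothetical simulated data (WORK.STAYS, AGE and WARD are made-up names). The BOXCOX transformation searches a grid of λ values, and λ = 0 corresponds to the natural log, so an estimate close to 0 supports the log transformation. Box-Cox needs strictly positive values, which holds here because every simulated stay is at least one day.

```sas
/* Hypothetical simulated stand-in: replace WORK.STAYS with your own data */
data work.stays;
  call streaminit(2013);
  do id = 1 to 500;
    age  = 40 + 30*rand('uniform');
    ward = 1 + floor(3*rand('uniform'));
    DaysOfStay = ceil(exp(0.5 + 0.02*age + 0.2*ward + 0.6*rand('normal')));
    output;
  end;
run;

/* Search lambda from -2 to 2; lambda = 0 is the log transformation */
proc transreg data=work.stays;
  model boxcox(DaysOfStay / lambda=-2 to 2 by 0.05) = identity(age);
  output out=work.bc;   /* transformed response is named TDaysOfStay by default */
run;

/* Look at the shape of the transformed response */
proc univariate data=work.bc;
  var TDaysOfStay;
  histogram TDaysOfStay / normal;
run;
```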
Non-parametric methods are another option, although they generally have lower statistical power.
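To make the non-parametric option concrete, a sketch on hypothetical simulated data (WARD is a made-up grouping variable, AGE a made-up predictor): PROC NPAR1WAY with the WILCOXON option gives the Wilcoxon rank-sum test for two groups and the Kruskal-Wallis test for three or more, and PROC CORR with SPEARMAN gives a rank correlation with a continuous predictor. Neither assumes normality.

```sas
/* Hypothetical simulated stand-in: replace WORK.STAYS with your own data */
data work.stays;
  call streaminit(2013);
  do id = 1 to 500;
    age  = 40 + 30*rand('uniform');
    ward = 1 + floor(3*rand('uniform'));
    DaysOfStay = ceil(exp(0.5 + 0.02*age + 0.2*ward + 0.6*rand('normal')));
    output;
  end;
run;

/* Three wards, so WILCOXON produces the Kruskal-Wallis test */
proc npar1way data=work.stays wilcoxon;
  class ward;
  var DaysOfStay;
run;

/* Spearman rank correlation between length of stay and age */
proc corr data=work.stays spearman outs=work.spearman;
  var DaysOfStay age;
run;
```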
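Alternatively, you can fit the model as a generalized linear model (GLM), which does not require the response to be normally distributed; in SAS this is PROC GENMOD (PROC GLM, despite its name, fits general linear models). Length of stay in hospital is often modeled with a GLM approach.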
Check the following paper to learn more about modeling length of stay:
The full PDF of that paper is available by searching on Google. SAS EG has a very good menu for GLMs.
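A minimal GLM sketch with PROC GENMOD on hypothetical simulated data (WORK.STAYS, AGE and WARD are made-up names). The negative binomial distribution with a log link is one common choice for a count of days, and DIST=GAMMA is an alternative for a positive, right-skewed outcome; which one suits your data is an assumption to check, not something established here. With a log link, exp(coefficient) is the multiplicative change in the expected length of stay, which keeps interpretation on the days scale.

```sas
/* Hypothetical simulated stand-in: replace WORK.STAYS with your own data */
data work.stays;
  call streaminit(2013);
  do id = 1 to 500;
    age  = 40 + 30*rand('uniform');
    ward = 1 + floor(3*rand('uniform'));
    DaysOfStay = ceil(exp(0.5 + 0.02*age + 0.2*ward + 0.6*rand('normal')));
    output;
  end;
run;

/* Negative binomial GLM with log link; TYPE3 adds a likelihood-ratio test per effect */
proc genmod data=work.stays;
  class ward;
  model DaysOfStay = age ward / dist=negbin link=log type3;
  output out=work.pred p=pred_days;   /* predicted mean length of stay */
run;
```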
01-27-2014 04:13 PM
As I mentioned earlier, the variable LOS is positively skewed.
I tried to solve the problem by applying a log transformation. The reason I need to normalize the variable is to meet the assumptions of multiple linear regression.
I ran the following code:
lny = log(DaysOfStay); /* The natural logarithm (base e) */
but the new variable still seems to be skewed.
Is the code I ran correct?
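The log statement itself is valid SAS; it just needs to sit inside a DATA step. A log transformation reduces right skew but need not remove it, and for multiple linear regression the more relevant check is on the residuals. A sketch that compares skewness before and after the transformation, on hypothetical simulated data (WORK.STAYS, AGE and WARD are made-up names); LNY1, based on log(DaysOfStay + 1), is only a hypothetical fallback for data with zero-day stays, since LOG(0) returns a missing value.

```sas
/* Hypothetical simulated stand-in: replace WORK.STAYS with your own data */
data work.stays;
  call streaminit(2013);
  do id = 1 to 500;
    age  = 40 + 30*rand('uniform');
    ward = 1 + floor(3*rand('uniform'));
    DaysOfStay = ceil(exp(0.5 + 0.02*age + 0.2*ward + 0.6*rand('normal')));
    output;
  end;
run;

data work.stays_log;
  set work.stays;
  lny  = log(DaysOfStay);       /* natural log (base e); missing if DaysOfStay <= 0 */
  lny1 = log(DaysOfStay + 1);   /* hypothetical fallback when zero-day stays occur */
run;

/* Compare skewness and missing counts before and after the transformation */
proc means data=work.stays_log n nmiss mean median skewness;
  var DaysOfStay lny lny1;
run;

proc univariate data=work.stays_log;
  var lny;
  histogram lny / normal;
run;
```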