09-23-2014 03:24 PM
Standard logistic regression works for predicting a binary 0/1 outcome. Now I have a data set where the value to predict lies between 0 and 1, such as 0.2, 0.5, .... How can I use logistic regression to build a model for it? Thanks.
09-23-2014 03:50 PM
For example, I have a sample with information on 100 pools. The independent variables are pool-average characteristics, such as loan size and loan coupon. The target variable is the pool average of a 0/1 indicator, so its values are 0.115, 0.234, .... I want to build a logistic regression model to predict these values (0.115, 0.234, ...) from the pool-average characteristics, such as pool-average loan size. If the target were binary (0, 0, 1, 0, 1, 1, ...), this would be easy. But for a continuous variable I cannot use PROC LOGISTIC. Is there a way out? Thanks.
09-23-2014 07:42 PM
1. You can take the logarithm of variables, and create a model using those new variables.
2. If you want to model a proportion, maybe you could use PROC GENMOD with DIST=GAMMA
Do I understand correctly: individual data is not available, only aggregated?
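If the answer is yes and the number of loans in each pool is known, one possible route is the events/trials syntax of PROC LOGISTIC, which fits an ordinary logistic model to aggregated counts. A minimal sketch, assuming a data set named pools with hypothetical variables rate (the pool average of the 0/1 outcome), n_loans (loans per pool), loan_size and coupon:

```sas
/* Hypothetical data set and variable names; adjust to your data. */
data pools_et;
   set pools;
   /* Recover the number of 1s in each pool from the pool average. */
   events = round(rate * n_loans);
run;

/* Events/trials syntax: each pool contributes n_loans Bernoulli trials. */
proc logistic data=pools_et;
   model events / n_loans = loan_size coupon;
   output out=pred p=rate_hat;   /* predicted pool-level probability */
run;
```

If the pool sizes are not available, this approach does not apply, and the target has to be modeled as a continuous proportion instead.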
09-24-2014 07:27 AM
1. The logarithm is not a linear transformation, so the optimal fit for the logarithm of the variables is not the optimal fit for the variables themselves. I have proved this.
2. Can you explain more about how to use DIST=GAMMA? Normally I use d=b.
09-24-2014 08:06 AM
1. You are right. But it also depends on how you define "optimal", what your model assumptions are, and how you transform your predicted variable "back" (if you transformed the target variable).
2. Sorry, it is not GAMMA, it is BETA. But the beta distribution is not available in GENMOD.
So what about this?
proc glimmix data=input;
model target=x1 / d=beta;
run;
With GLIMMIX you might find the appropriate model you need (with, or without variable transformations).
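As a slightly fuller sketch of the same idea: the beta distribution requires the target to lie strictly between 0 and 1, and with a logit link the coefficients read much like those of a logistic regression. The names x2, pred and p_hat below are placeholders, not taken from the question.

```sas
/* Sketch only: x2, pred and p_hat are hypothetical names. */
proc glimmix data=input;
   /* Beta response with logit link; target must satisfy 0 < target < 1. */
   model target = x1 x2 / dist=beta link=logit solution;
   /* PRED(ILINK) writes predictions on the original (0,1) scale. */
   output out=pred pred(ilink)=p_hat;
run;
```

Pools whose average is exactly 0 or 1 would have to be handled separately, since the beta density is not defined there.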
09-24-2014 09:52 AM
When you transform predictions using exp(log_target) you will get the unbiased prediction of the median on the original scale.
To get an unbiased prediction of the mean on the original scale use: exp(log_target+0.5*std**2)
std is the prediction standard deviation.
This holds if your log-transformed variable is (conditionally) normally distributed, i.e. the original variable is lognormal.
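The reasoning behind the two formulas can be written out as follows. Here \mu and \sigma are notation introduced for this note: the conditional mean and standard deviation of the log-transformed target.

```latex
\log Y \mid x \sim \mathcal{N}(\mu, \sigma^2)
\quad\Longrightarrow\quad
\operatorname{median}(Y \mid x) = e^{\mu},
\qquad
\mathbb{E}[Y \mid x] = e^{\mu + \sigma^2/2}.
```

In practice \mu and \sigma are replaced by their estimates from the fitted model, so the back-transformed values are approximations of the true median and mean.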
09-24-2014 10:48 AM
Unbiased estimation of the median.
On the log scale the prediction of the mean is at the same time the prediction of the median. (Remember, on the log scale we have a normally distributed variable, which is a symmetric distribution.)
Now, if you apply a monotonic function (like exp()) to the prediction, the median still sits in the middle of the distribution.
I don't know what happens to the geometric mean.
09-24-2014 10:23 AM
Cool. How can I get the prediction standard deviation? From the model fit I get the standard errors of the fitted parameters; how do I obtain the prediction std from these? Thanks.
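One possible way, sketched with PROC REG and assuming an ordinary linear regression on the log scale: the OUTPUT statement can write, for each row, the standard error of an individual prediction (STDI=) and of the predicted mean (STDP=). The names pred, pred_orig, log_pred, std_ind, std_mean, median_hat and mean_hat are placeholders.

```sas
/* Sketch only: output data set and variable names are hypothetical. */
proc reg data=input;
   model log_target = x1;
   /* P= predicted value, STDI= std error of an individual prediction,
      STDP= std error of the predicted mean */
   output out=pred p=log_pred stdi=std_ind stdp=std_mean;
run;
quit;

/* Back-transform to the original scale under the lognormal assumption. */
data pred_orig;
   set pred;
   median_hat = exp(log_pred);
   mean_hat   = exp(log_pred + 0.5*std_ind**2);
run;
```

Note that the textbook lognormal correction uses the residual variance (the square of Root MSE). STDI also includes the uncertainty of the estimated parameters, so the two versions differ slightly, and which one to use is a judgment call.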