07-12-2011 10:51 AM
We are building a logistic model and are having issues with the predicted probabilities being very small. We fit two different models to the same data (138,000 obs). Model 1 has 3 variables and 3 interactions (one variable to the 2nd, 3rd and 4th power); the intercept is -460, the Hosmer-Lemeshow p-value is .0001, and the range of the probabilities is .034 to .80. Model 2 has 16 variables (11 are a date used in the CLASS statement) and 3 interactions (one variable to the 2nd, 3rd and 4th power); the intercept is -709, the Hosmer-Lemeshow p-value is .5865, and the range of the probabilities is 4.12E-67 to 9.47E-17. We ran a correlation between the probabilities and the response. In Model 1 it is positive, as you would expect; in Model 2 it is negative.
We have done many iterations of the two models and this is the best we can get. We would like to use Model 2 but are concerned about the probabilities being so low. Why are the probabilities so low, why are they negatively correlated with the response, and what can we do to fix it?
07-12-2011 02:03 PM
Sounds like the second model is a total failure. The predicted probabilities are essentially all 0. There have been a few recent postings by Ruth on modeling rare events. Have a look at those to see if they give you some technical ideas.
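To get a feel for how extreme those fitted values are, it can help to put the reported probability ranges back on the log-odds (linear predictor) scale. Here is a minimal sketch in Python rather than SAS. The function names `logit` and `inv_logit` are just illustrative helpers, and the only inputs are the four probabilities quoted in the original post.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1); log1p keeps tiny p accurate."""
    return math.log(p) - math.log1p(-p)

def inv_logit(eta):
    """Logistic function, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)

# Probability ranges as quoted in the original post.
model1_range = (0.034, 0.80)
model2_range = (4.12e-67, 9.47e-17)

# Convert each end of each range to the linear-predictor scale.
model1_eta = tuple(logit(p) for p in model1_range)
model2_eta = tuple(logit(p) for p in model2_range)

print("Model 1 linear predictor range:", model1_eta)
print("Model 2 linear predictor range:", model2_eta)
```

On this scale Model 1 spans ordinary log-odds (a little below -3 up to a little above 1), while every observation in Model 2 sits somewhere between about -153 and -37. That is what "essentially all 0" means here: not one of the 138,000 observations gets a linear predictor anywhere near 0.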