Hi all,

I am trying to build a logistic regression model that uses 7 variables (see below) to predict college enrollment (Enroll vs. Not-Enroll).

When I put all seven variables in the model, the Hosmer and Lemeshow Goodness-of-Fit Test is significant, which I think suggests that the model does not fit the data well.

I then tried variable selection and allowed two-way interactions to enter the selection. Some of the interaction terms were selected for the final model, but the Hosmer and Lemeshow Goodness-of-Fit Test is still significant.

I also tried variable selection without interaction terms. Four of the seven variables were selected for the final model (race, SAT score, legacy and year), and the Hosmer and Lemeshow Goodness-of-Fit Test is significant there as well.

Does anyone have suggestions on what I should do next?

A general question: Is the Hosmer and Lemeshow Goodness-of-Fit Test a good way to evaluate model fit? What do you usually use to evaluate model fit for logistic regression?

Any suggestions are appreciated.

Yanmin

Variables:
Gender (Female vs. Male)
Race (Non-US vs. Minority vs. White)
SAT scores (numerical variable)
Academic Interest (Sciences vs. Interdisciplinary vs. Humanities vs. Undecided vs. Social Sciences)
Legacy (Yes vs. No)
First generation (Yes vs. No)
Year (2012 vs. 2013 vs. 2014)
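
For anyone who wants to see what the test in question actually computes, here is a minimal pure-Python sketch of the Hosmer and Lemeshow statistic. It is an illustration only, not the output or the implementation of the software used for the models above. The function names `hosmer_lemeshow` and `chi2_sf`, the default of 10 equal-size groups, and the sample values at the bottom are all assumptions made for this example. It expects a list of 0/1 outcomes and the matching predicted probabilities from a model that has already been fitted.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df, p_value) for 0/1 outcomes y and predicted probabilities p.

    Assumes len(y) >= groups and that no group has a mean predicted
    probability of exactly 0 or 1 (that would divide by zero).
    """
    pairs = sorted(zip(p, y), key=lambda t: t[0])  # order by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)   # observed events in the group
        expected = sum(pi for pi, _ in chunk)   # expected events in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical values for illustration only (not real enrollment data).
example_p = [0.2] * 10 + [0.4] * 10 + [0.6] * 10 + [0.8] * 10
example_y = ([1] * 2 + [0] * 8) + ([1] * 4 + [0] * 6) + ([1] * 6 + [0] * 4) + ([1] * 8 + [0] * 2)
stat, df, p_value = hosmer_lemeshow(example_y, example_p, groups=4)
```

A small p-value means the observed and expected event counts disagree within the risk groups. The grouping is a design choice: the statistic depends on the number of groups and on how ties in the predicted probabilities are split, so different software can report somewhat different values for the same model.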