Posted 11-12-2010 08:27 AM
(2229 views)
I am running a logistic regression on agent data with 15 variables and 5,000 observations.
After the data treatment steps, the logistic regression shows 10 variables as significant with p < 0.001, whereas the intercept has a p-value of 0.3150.
What would be the reason for the intercept being non-significant?
The Hosmer and Lemeshow goodness-of-fit test is significant, so the overall model is not a good fit.
Please advise.
5 REPLIES
Intercepts are not usually of interest for hypothesis testing. Is there any particular reason that you care whether the null hypothesis H0:Intercept=0 is rejected in favor of the alternative hypothesis HA:Intercept^=0?
Keep in mind, too, that when modeling a binary response using a logit link function (which is the default link function when you fit a logistic regression model), a zero value for the intercept would indicate that the probability of the response is 0.5 when all of the predictor variables are zero. So, a test of H0:Intercept=0 is testing whether the probability of the response is 0.5 (given zero values for all of the predictors).
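To make the link between the intercept and the baseline probability concrete, here is a minimal Python sketch of the inverse logit. The function name `inv_logit` and the nonzero sample intercepts are illustrative assumptions, not values from any fitted model.

```python
import math

def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# With every predictor at zero, the linear predictor equals the intercept.
for intercept in (-1.0, 0.0, 1.0):  # hypothetical intercept values
    print(f"intercept={intercept:+.1f} -> P(event)={inv_logit(intercept):.4f}")
```

An intercept of exactly 0 maps to a probability of 0.5, which is the value that the test of H0:Intercept=0 is checking against.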
Thank you, Dale.
The estimate for the intercept is 0.2970 and it is not significant, so I thought I would drop it, as I have not made any hypothesis about the intercept.
I reran the model without the intercept using the NOINT option in the MODEL statement of PROC LOGISTIC. All 10 variables are significant. However, concordance is 0.992, which is good but very high ;-( i.e., the area under the ROC curve is 99.2%,
and the Hosmer and Lemeshow goodness-of-fit test has a chi-square value of 4982.9039 with p < .0001.
How can I get it to be non-significant?
A couple of thoughts.
-- to follow up on Dale's comment: removing the intercept from the model says that you are "sure" that the true intercept is 0 on the logit scale, that is, a probability of 0.5 when all of the predictors are zero. Unless there are theoretical reasons for doing that, I generally leave it in.
-- with an AUC of 99+%, I would worry about over-fitting the data. (A "good" AUC is somewhat dependent on the research discipline.)
-- with 5000 observations, most of the tests will be significant (including H&L). The p-value really doesn't tell you very much. You have to look at the results and determine if they are also meaningful.
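The point about sample size can be illustrated with a small Python sketch. It uses a plain one-sample z-test for a proportion, not the Hosmer and Lemeshow test itself, and the 0.52 versus 0.50 discrepancy and the helper names are hypothetical choices made only for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def proportion_z(p_obs, p_null, n):
    """z statistic for a one-sample test of a proportion."""
    se = math.sqrt(p_null * (1.0 - p_null) / n)
    return (p_obs - p_null) / se

# Same hypothetical discrepancy (0.52 observed vs 0.50 expected) at each n.
for n in (100, 1000, 5000):
    z = proportion_z(0.52, 0.50, n)
    print(f"n={n:5d}  z={z:.2f}  p={two_sided_p(z):.4f}")
```

The size of the discrepancy never changes, yet it is far from significant at n=100 and falls below 0.05 at n=5000. That is why the p-value alone says little about whether a difference is meaningful.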
Nonsignificance is NOT a good reason to drop an intercept from a model. Leave it in unless you have a good physical (mechanistic) reason why it must be 0 (on a logit scale, or 0.5 on a probability scale).
I'd go farther than LVM and say that significance/nonsignificance isn't really a good reason for including or dropping ANY variable, although it is one that is often used. I've had bosses/clients insist on it. Effect size is more important. With a large N, as has been said, it's easy to get significance. And, as has also been said, the intercept is usually not of interest, but that is a reason for leaving it IN, not taking it OUT.
With 5,000 cases and 15 IVs, unless the DV is very unevenly distributed, you have plenty of cases per variable. Why not include ALL the IVs? How did you choose them in the first place? Are they substantively important?
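As a rough check on the cases-per-variable point, the sketch below counts cases in the rarer outcome class per IV. The 5,000 cases and 15 IVs come from this thread; the event rates and the function name `events_per_variable` are hypothetical, since the actual distribution of the DV was not posted.

```python
def events_per_variable(n_obs, event_rate, n_predictors):
    """Cases in the rarer outcome class divided by the number of predictors."""
    n_events = n_obs * event_rate
    n_rarer = min(n_events, n_obs - n_events)
    return n_rarer / n_predictors

# 5000 observations and 15 IVs come from the thread; event rates are hypothetical.
for rate in (0.50, 0.10, 0.01):
    print(f"event rate={rate:.2f} -> {events_per_variable(5000, rate, 15):.1f} per IV")
```

With a balanced DV there are well over a hundred cases per IV, but a very rare outcome leaves only a handful, which is the situation the "unevenly distributed" caveat refers to.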