Logistic regression

11-12-2010 08:27 AM

I am running a logistic regression on agent data. I have 15 variables and 5000 observations.

After the data treatment steps, the logistic regression shows 10 variables as significant with p < 0.001, whereas the intercept has a p-value of 0.3150.

What would be the reason for the intercept being nonsignificant?

The Hosmer and Lemeshow Goodness-of-Fit Test is showing significance, so the overall model is not a good fit.

Please advise.


Posted in reply to samHT

11-12-2010 12:45 PM

Intercepts are not usually of interest for hypothesis testing. Is there any particular reason that you care whether the null hypothesis H0:Intercept=0 is rejected in favor of the alternative hypothesis HA:Intercept^=0?

Keep in mind, too, that when modeling a binary response using a logit link function (which is the default link function when you fit a logistic regression model), a zero value for the intercept would indicate that the probability of the response is 0.5 when all of the predictor variables are zero. So, a test of H0:Intercept=0 is testing whether the probability of the response is 0.5 (given zero values for all of the predictors).
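Written out as a short derivation, assuming the usual linear predictor with intercept \beta_0 and slopes \beta_1, ..., \beta_k for predictors x_1, ..., x_k (these symbols are introduced here for illustration and do not appear in the original posts):

```latex
\operatorname{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \sum_{j=1}^{k} \beta_j x_j
\quad\Longrightarrow\quad
p \,\Big|_{x_1 = \dots = x_k = 0} = \frac{1}{1 + e^{-\beta_0}},
\qquad
\beta_0 = 0 \;\Longrightarrow\; p = \tfrac{1}{2}.
```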


Posted in reply to samHT

11-15-2010 03:38 AM

Thank you, Dale.

The estimate for the intercept is 0.2970 and it is not significant, so I thought I would drop it, as I had not made any hypothesis about the intercept.

I reran the model without the intercept using the "noint" option in the MODEL statement of PROC LOGISTIC. All 10 variables are significant. However, concordance is 0.992 (i.e., the area under the ROC curve is 99.2%), which is good but very high ;-(

The Hosmer and Lemeshow Goodness-of-Fit Test has a chi-square value of 4982.9039 with p < .0001.

How can I get it to be nonsignificant?


Posted in reply to samHT

11-15-2010 09:40 AM

A couple of thoughts.

-- To follow up on Dale's comment: removing the intercept from the model says that you are "sure" that the true intercept is 0 on the logit scale, i.e., that the probability of the response is 0.5 when all predictors are zero. Unless there are theoretical reasons for doing that, I generally leave it in.

-- with an AUC of 99+%, I would worry about over-fitting the data. (A "good" AUC is somewhat dependent on the research discipline.)

-- with 5000 observations, most of the tests will be significant (including H&L). The p-value really doesn't tell you very much. You have to look at the results and determine if they are also meaningful.
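To illustrate the last point, here is a minimal Python sketch (not SAS output, and not the poster's data) of how the Hosmer-Lemeshow statistic behaves when the same small miscalibration is observed at two sample sizes. The ten risk groups, the 2-percentage-point shift, and the helper names are all hypothetical; the critical value 15.507 is the upper 5% point of a chi-square distribution with 8 degrees of freedom.

```python
def hosmer_lemeshow_statistic(groups):
    """H-L chi-square from (n, observed_events, mean_predicted_prob) per group."""
    stat = 0.0
    for n, observed, p_hat in groups:
        expected = n * p_hat
        # (O - E)^2 / (n * p * (1 - p)) combines the event and non-event
        # cells of the usual 2 x g table for this group.
        stat += (observed - expected) ** 2 / (n * p_hat * (1.0 - p_hat))
    return stat


def make_groups(n_per_group, shift):
    """Ten hypothetical risk groups whose observed rate exceeds the prediction by `shift`."""
    predicted = [0.05 + 0.10 * k for k in range(10)]
    return [(n_per_group, n_per_group * (p + shift), p) for p in predicted]


CRITICAL_CHI2_DF8 = 15.507  # 5% critical value, 10 groups - 2 = 8 df

stat_small = hosmer_lemeshow_statistic(make_groups(50, 0.02))   # N = 500
stat_large = hosmer_lemeshow_statistic(make_groups(500, 0.02))  # N = 5000

for label, stat in (("N=500", stat_small), ("N=5000", stat_large)):
    verdict = "rejects" if stat > CRITICAL_CHI2_DF8 else "does not reject"
    print(f"{label}: H-L chi-square = {stat:.2f}, {verdict} fit at the 5% level")
```

With the miscalibration held fixed, the statistic grows in proportion to N, so the same practical lack of fit goes from nonsignificant to significant purely because of the sample size.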


Posted in reply to samHT

11-19-2010 09:36 AM

Nonsignificance is NOT a good reason to drop an intercept from a model. Leave it in unless you have a good physical (mechanistic) reason why it must be 0 (on a logit scale, or 0.5 on a probability scale).
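As a rough numerical illustration (a Python sketch, not SAS output), the snippet below converts the intercept estimate reported earlier in this thread, 0.2970, to the probability scale and compares it with the value a no-intercept fit imposes. The function name `inv_logit` is just a label chosen for this example.

```python
import math


def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


intercept_estimate = 0.2970  # value reported earlier in this thread

p_with_intercept = inv_logit(intercept_estimate)
p_noint = inv_logit(0.0)  # what fitting without an intercept imposes

print(f"estimated intercept   -> baseline probability {p_with_intercept:.3f}")
print(f"intercept forced to 0 -> baseline probability {p_noint:.3f}")
```

The estimated baseline comes out at roughly 0.57 rather than 0.5. Whether that gap matters is a subject-matter question, but dropping the intercept makes the decision for you.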


Posted in reply to samHT

11-23-2010 06:54 AM

I'd go further than LVM and say that significance/nonsignificance isn't really a good reason for including or dropping ANY variable, although it is one that is often used. I've had bosses/clients insist on it. Effect size is more important. With a large N, as has been said, it's easy to get significance. And, as has also been said, the intercept is usually not of interest, but that is a reason for leaving it IN, not taking it OUT.

With 5,000 cases and 15 IVs, unless the DV is very unevenly distributed, you have plenty of cases per variable. Why not include ALL the IVs? How did you choose them in the first place? Are they substantively important?
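A quick way to check the "cases per variable" point is to count events per variable, using the rarer outcome. The Python sketch below uses the 5,000 cases and 15 IVs from the original post; the event rates are hypothetical, since the thread does not report the actual one, and the often-quoted guideline of about 10 events per variable is a rule of thumb rather than a hard requirement.

```python
def events_per_variable(n_obs, event_rate, n_predictors):
    """Count of the rarer outcome divided by the number of candidate predictors."""
    events = n_obs * event_rate
    non_events = n_obs - events
    return min(events, non_events) / n_predictors


N_OBS = 5000       # from the original post
N_PREDICTORS = 15  # from the original post

# Hypothetical event rates; the actual rate is not given in the thread.
for rate in (0.50, 0.10, 0.02):
    epv = events_per_variable(N_OBS, rate, N_PREDICTORS)
    print(f"event rate {rate:.0%}: about {epv:.1f} events per variable")
```

Only when the DV is very unevenly distributed (the 2% case here) does the count drop below the usual guideline.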
