
Logistic regression

Occasional Contributor
Posts: 8

Logistic regression

I am running a logistic regression on agent data. I have 15 variables and 5,000 observations.

After the data treatment steps, the logistic regression gives me 10 significant variables with p < 0.001, whereas the intercept has a p-value of 0.3150.

What could be the reason for the intercept being nonsignificant?

The Hosmer and Lemeshow goodness-of-fit test is significant, so the overall model does not fit well.

Please advise.
Regular Contributor
Posts: 171

Re: Logistic regression

Intercepts are not usually of interest for hypothesis testing. Is there any particular reason that you care whether the null hypothesis H0:Intercept=0 is rejected in favor of the alternative hypothesis HA:Intercept^=0?

Keep in mind, too, that when modeling a binary response using a logit link function (which is the default link function when you fit a logistic regression model), a zero value for the intercept would indicate that the probability of the response is 0.5 when all of the predictor variables are zero. So, a test of H0:Intercept=0 is testing whether the probability of the response is 0.5 (given zero values for all of the predictors).
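A minimal Python sketch of the point above. Python is used only for illustration, since the thread itself is about SAS PROC LOGISTIC, and the helper name `inverse_logit` is hypothetical. When every predictor is zero, the linear predictor equals the intercept, so the inverse logit of the intercept is the baseline probability. The second call uses the intercept estimate of 0.2970 that the original poster reports.

```python
import math


def inverse_logit(eta: float) -> float:
    """Map a value on the logit (log-odds) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# With all predictors at zero, the linear predictor is just the intercept.
print(inverse_logit(0.0))                # intercept of 0 -> probability 0.5
print(round(inverse_logit(0.2970), 3))   # intercept estimate from this thread
```

So the test of H0:Intercept=0 asks whether that baseline probability differs from 0.5, which is rarely a question of interest.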
Occasional Contributor
Posts: 8

Re: Logistic regression

Thank you, Dale.

The estimate for the intercept is 0.2970 and it is not significant, so I thought I would drop it, since I have no hypothesis about the intercept.
I reran the model without the intercept, using the NOINT option in the MODEL statement of PROC LOGISTIC. All 10 variables are significant. However, the concordance is 0.992, which is good but very high ;-( that is, the area under the ROC curve is 99.2%.
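The concordance (c) statistic reported by PROC LOGISTIC equals the area under the ROC curve, which is why the two figures in this post coincide. A brute-force Python sketch, under simplifying assumptions: the function name `c_statistic` and the toy data are hypothetical, and SAS may bin predicted probabilities internally, so its reported value can differ slightly from this exact pairwise count.

```python
from itertools import product


def c_statistic(y, p):
    """Concordance index (ROC area) for a 0/1 outcome y and predicted probabilities p.

    Every (event, nonevent) pair is compared: concordant if the event has the
    higher predicted probability, tied if equal. Ties count as one half.
    """
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    concordant = tied = 0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            concordant += 1
        elif pe == pn:
            tied += 1
    return (concordant + 0.5 * tied) / (len(events) * len(nonevents))


# Hypothetical toy data: 3 events and 3 nonevents.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.8, 0.4, 0.4, 0.3, 0.1]
print(c_statistic(y, p))
```

Because every event is compared with every nonevent, the cost grows with the product of the two class sizes; with 5,000 observations that is at most about 6.25 million comparisons.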

In addition, the Hosmer and Lemeshow goodness-of-fit test has a chi-square value of 4982.9039 with p < .0001.

How can I get it to be nonsignificant?
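For reference, a simplified Python sketch of how the Hosmer and Lemeshow statistic is built: sort observations by predicted probability, split them into groups, compare observed and expected event counts in each group, and refer the sum to a chi-square distribution with (groups - 2) degrees of freedom. The function names are hypothetical, and the equal-size grouping is an assumption: SAS's LACKFIT option applies its own grouping rules, so values will not match exactly. The p-value helper uses the exact closed form that only holds for even degrees of freedom, which covers the usual 10 groups (8 df).

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Simplified H-L statistic with equal-size groups; needs len(p) >= groups.

    For each group g: (O_g - E_g)^2 / (n_g * pbar_g * (1 - pbar_g)), where
    O_g = observed events, E_g = sum of predicted probabilities, pbar_g = E_g / n_g.
    Returns (statistic, degrees_of_freedom).
    """
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(p[i] for i in idx)
        pbar = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * pbar * (1 - pbar))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form valid for even df only."""
    if df % 2:
        raise ValueError("this shortcut only handles even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# The chi-square value reported above, on 8 df (10 groups).
print(chi2_sf_even_df(4982.9039, 8))
```

The printed tail probability underflows to 0.0, consistent with p < .0001. Each group term is roughly proportional to the group size times the squared gap between observed and expected proportions, so a fixed, small miscalibration produces a larger statistic as N grows.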
Trusted Advisor
Posts: 2,116

Re: Logistic regression

A couple of thoughts.

-- to follow up on Dale's comment: removing the intercept from the model says that you are "sure" that the true intercept is 0 (that is, a probability of 0.5 when all predictors are zero). Unless there are theoretical reasons for doing that, I generally leave it in.

-- with an AUC of 99+%, I would worry about over-fitting the data. (A "good" AUC is somewhat dependent on the research discipline.)

-- with 5000 observations, most of the tests will be significant (including H&L). The p-value really doesn't tell you very much. You have to look at the results and determine if they are also meaningful.
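A small Python sketch of the sample-size point. It uses a pooled two-proportion z-test rather than H&L because its dependence on N is easy to see; the 50% vs 52% proportions, the sample sizes, and the helper name are hypothetical. The difference stays fixed while only N changes.

```python
import math


def two_proportion_p_value(p1, p2, n_per_group):
    """Two-sided p-value of a pooled two-proportion z-test with equal group sizes."""
    pooled = (p1 + p2) / 2
    se = math.sqrt(pooled * (1 - pooled) * 2 / n_per_group)
    z = abs(p1 - p2) / se
    # Two-sided normal tail: P(|Z| > z) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2))


# The same small difference (50% vs 52%) at increasing sample sizes.
for n in (100, 1000, 5000, 50000):
    print(n, two_proportion_p_value(0.50, 0.52, n))
```

The p-value drops from well above 0.05 at n = 100 to far below 0.001 at n = 50,000 although the effect never changes, which is why the size of the effect has to be judged separately from its p-value.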
Valued Guide
Posts: 684

Re: Logistic regression

Nonsignificance is NOT a good reason to drop an intercept from a model. Leave it in unless you have a good physical (mechanistic) reason why it must be 0 (on a logit scale, or 0.5 on a probability scale).
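One concrete cost of dropping the intercept: when a logistic model includes an intercept, the maximum-likelihood score equation for the intercept forces the average predicted probability to equal the observed event rate, and a NOINT fit loses that guarantee. A pure-Python Newton-Raphson sketch on simulated data; the true coefficients (1.5 and 0.8), the seed, the sample size, and all function names are hypothetical choices for illustration, not anything from the original analysis.

```python
import math
import random


def sigmoid(t):
    """Inverse logit: log-odds -> probability."""
    return 1.0 / (1.0 + math.exp(-t))


def fit_with_intercept(x, y, iters=50):
    """Newton-Raphson for logit(p) = a + b*x. Returns (a, b)."""
    a = b = 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1 - p)
            g_a += yi - p           # score for the intercept
            g_b += (yi - p) * xi    # score for the slope
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (X'WX) * delta = score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def fit_without_intercept(x, y, iters=50):
    """Newton-Raphson for logit(p) = b*x, i.e. the intercept forced to 0."""
    b = 0.0
    for _ in range(iters):
        g = h = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(b * xi)
            g += (yi - p) * xi
            h += p * (1 - p) * xi * xi
        b += g / h
    return b


# Simulated data from a hypothetical model with a nonzero intercept.
rng = random.Random(2024)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [1 if rng.random() < sigmoid(1.5 + 0.8 * xi) else 0 for xi in x]

a, b = fit_with_intercept(x, y)
b0 = fit_without_intercept(x, y)

observed_rate = sum(y) / len(y)
mean_p_int = sum(sigmoid(a + b * xi) for xi in x) / len(x)
mean_p_noint = sum(sigmoid(b0 * xi) for xi in x) / len(x)
print(f"observed event rate:        {observed_rate:.3f}")
print(f"mean p-hat, with intercept: {mean_p_int:.3f}")
print(f"mean p-hat, NOINT model:    {mean_p_noint:.3f}")
```

With the intercept, the first two numbers agree; without it, because the simulated predictor is symmetric around zero, the mean predicted probability is pulled toward 0.5 even though most simulated outcomes are events.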
Frequent Contributor
Posts: 140

Re: Logistic regression

I'd go further than LVM and say that significance/nonsignificance isn't really a good reason for including or dropping ANY variable, although it is one that is often used. I've had bosses/clients insist on it. Effect size is more important. With a large N, as has been said, it's easy to get significance. And, as has also been said, the intercept is usually not of interest, but that is a reason for leaving it IN, not taking it OUT.
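In logistic regression, a common effect-size scale is the odds ratio: exp(beta) for a one-unit change in a predictor, or exp(beta * delta) for a delta-unit change. A minimal Python sketch; the helper name and the coefficient values are hypothetical.

```python
import math


def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Odds ratio for a `delta`-unit increase in a predictor with logit coefficient `beta`."""
    return math.exp(beta * delta)


print(odds_ratio(0.01))        # hypothetical coefficient, per one-unit change
print(odds_ratio(0.01, 100))   # same coefficient, per 100-unit change
```

A coefficient of 0.01 raises the odds by only about 1% per unit, however small its p-value; whether that matters depends on the predictor's units and the substantive question.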

With 5,000 cases and 15 IVs, unless the DV is very unevenly distributed, you have plenty of cases per variable. Why not include ALL the IVs? How did you choose them in the first place? Are they substantively important?
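The cases-per-variable check above is often expressed as events per variable (EPV): the count of the less frequent outcome divided by the number of candidate predictors, with roughly 10 or more as a commonly quoted rule of thumb. A Python sketch; the helper name is hypothetical, and since the thread does not give the outcome split, both splits below are hypothetical.

```python
def events_per_variable(n_events: int, n_nonevents: int, n_predictors: int) -> float:
    """Size of the smaller outcome class divided by the number of predictors."""
    return min(n_events, n_nonevents) / n_predictors


print(round(events_per_variable(2500, 2500, 15), 1))  # hypothetical balanced outcome
print(round(events_per_variable(100, 4900, 15), 1))   # hypothetical rare outcome
```

A balanced split of 5,000 observations gives about 167 events per predictor, while 100 events would give fewer than 7, which is the "very unevenly distributed" case.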