samHT
Calcite | Level 5
I am running a logistic regression on agent data. I have 15 variables and 5000 observations.

After the data treatment steps, the logistic regression shows 10 variables as significant with p < 0.001, whereas the intercept has a p-value of 0.3150.

What would be the reason for the intercept being non-significant?

The Hosmer and Lemeshow Goodness-of-Fit Test is significant, so the overall model is not a good fit.

Please advise me.
5 REPLIES
Dale
Pyrite | Level 9
Intercepts are not usually of interest for hypothesis testing. Is there any particular reason that you care whether the null hypothesis H0:Intercept=0 is rejected in favor of the alternative hypothesis HA:Intercept^=0?

Keep in mind, too, that when modeling a binary response using a logit link function (which is the default link function when you fit a logistic regression model), a zero value for the intercept would indicate that the probability of the response is 0.5 when all of the predictor variables are zero. So, a test of H0:Intercept=0 is testing whether the probability of the response is 0.5 (given zero values for all of the predictors).
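To make the link between the intercept and the baseline probability concrete, here is a minimal Python sketch of the inverse logit. The function name `inv_logit` and the non-zero intercept values are illustrative assumptions, not output from any fitted model.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# An intercept of 0 means a response probability of 0.5
# when every predictor equals zero.
p_zero = inv_logit(0.0)

# Hypothetical non-zero intercepts, shown only for comparison.
p_pos = inv_logit(1.0)
p_neg = inv_logit(-1.0)

print(p_zero, p_pos, p_neg)
```

A positive intercept pushes the baseline probability above 0.5 and a negative one pushes it below, so the test of H0:Intercept=0 only asks whether that baseline differs from 0.5.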
samHT
Calcite | Level 5
Thank you, Dale.

The estimate for the intercept is 0.2970 and it is not significant, so I thought I would drop it, as I have not made any hypothesis about the intercept.
I reran the model without the intercept using the "noint" option in the model statement of proc logistic. All 10 variables are significant. However, the concordance is 0.992, which is good but very high ;-( i.e. the area under the ROC curve is 99.2%.

The Hosmer and Lemeshow Goodness-of-Fit Test has a chi-square value of 4982.9039 with p < .0001.

How do I get it to be non-significant?
Doc_Duke
Rhodochrosite | Level 12
A couple of thoughts.

-- To follow up on Dale's comment: removing the intercept from the model says that you are "sure" that the true intercept is 0, i.e. that the probability of the response is 0.5 when all predictors are zero. Unless there are theoretical reasons for doing that, I generally leave it in.

-- with an AUC of 99+%, I would worry about over-fitting the data. (A "good" AUC is somewhat dependent on the research discipline.)

-- with 5000 observations, most of the tests will be significant (including H&L). The p-value really doesn't tell you very much. You have to look at the results and determine if they are also meaningful.
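The point about sample size can be sketched with a simple one-sample z-test for a proportion. This is only an illustration of how the same small effect becomes "significant" as N grows; it is not the Hosmer and Lemeshow test itself, and the proportions 0.52 and 0.50 are made-up values.

```python
import math

def two_sided_p(p_hat: float, p0: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test for a proportion."""
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# Same hypothetical effect (0.52 observed vs 0.50 expected), two sample sizes.
p_small = two_sided_p(0.52, 0.50, 100)
p_large = two_sided_p(0.52, 0.50, 5000)

print(f"n=100:  p = {p_small:.4f}")
print(f"n=5000: p = {p_large:.4f}")
```

The effect is identical in both calls; only the sample size changes. With n=100 the p-value is far from conventional thresholds, while with n=5000 it falls below 0.01, which is why a p-value alone says little about whether a difference is meaningful.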
lvm
Rhodochrosite | Level 12
Nonsignificance is NOT a good reason to drop an intercept from a model. Leave it in unless you have a good physical (mechanistic) reason why it must be 0 (on a logit scale, or 0.5 on a probability scale).
plf515
Lapis Lazuli | Level 10
I'd go farther than LVM and say that significance/nonsignificance isn't really a good reason for including or dropping ANY variable, although it is one that is often used. I've had bosses/clients insist on it. Effect size is more important. With a large N, as has been said, it's easy to get significance. And, as has also been said, the intercept is usually not of interest, but that is a reason for leaving it IN, not taking it OUT.

With 5,000 cases and 15 IVs, unless the DV is very unevenly distributed, you have plenty of cases per variable. Why not include ALL the IVs? How did you choose them in the first place? Are they substantively important?
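As a rough check of the "cases per variable" point, the sketch below counts the rarer outcome per predictor. The event rates are hypothetical, since the thread does not give the DV distribution, and the helper name `events_per_variable` is an assumption for illustration.

```python
def events_per_variable(n_obs: int, event_rate: float, n_predictors: int) -> float:
    """Count of the rarer outcome divided by the number of predictors."""
    events = n_obs * event_rate
    rarer = min(events, n_obs - events)
    return rarer / n_predictors

# 5000 observations and 15 IVs come from the question; the rates are made up.
epv_balanced = events_per_variable(5000, 0.50, 15)
epv_rare = events_per_variable(5000, 0.02, 15)

print(f"balanced DV: {epv_balanced:.1f} events per variable")
print(f"rare DV:     {epv_rare:.1f} events per variable")
```

With a roughly balanced DV there are well over 100 events per IV, but with a 2% event rate the figure drops below the commonly cited rule of thumb of about 10, which is why the distribution of the DV matters.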
