Hello all!
I need to fit a logistic regression model and am wondering which model-selection method would be best. I have been advised to stay away from forward/backward/stepwise regression. All-possible-regressions (best subsets) seems attractive, but I must admit I'm a little lost on AIC/BIC/Cp/etc. and exactly how I would go about picking the best model...
I have a binary response variable, a categorical predictor, 10 categorical covariates, and 2 continuous covariates.
Thank you in advance!
Search for "model selection method" on here...this topic comes up frequently, and there is no 'CORRECT' answer, but some answers are more valid than others 😉
Unfortunately I've been all over the boards and haven't found anything useful. I've also read several papers - I just can't seem to locate the syntax for an all-possible-regressions selection. In addition, I was hoping someone could break it down for me in less technical language so I can really understand AIC/Cp/etc...
Hi @chelsealutz
After finding the potential factors/variables for inclusion in the model (for both quantitative and categorical variables and interaction terms) using any of:
- selection=stepwise slentry=0.15 slstay=0.15
- selection=forward slentry=0.15
- selection=backward slstay=0.15
- selection=score
you can compare the candidate models based on the following criteria (a code sketch follows the list):
- -2LogL
- The value itself is not important; it is used to compare two nested models, and the model with the smaller -2LogL is better. The difference in -2LogL between two nested models is approximately chi-square distributed, with degrees of freedom equal to the difference in the number of parameters.
- AIC (Akaike Information Criterion)
- AIC is used to compare non-nested models fit to the same sample. The AIC value itself is not meaningful, but the model with the smallest AIC is considered the best.
- SC (Schwarz Criterion)
- The model with the smallest SC is most desirable, but the value itself is not meaningful. Like AIC, it is appropriate for non-nested models; SC (also known as BIC) penalizes model complexity more heavily than AIC.
- ROC Area (c statistic)
- The area under the ROC curve is a measure of the model's ability to discriminate between events and non-events:
- Large values are desirable (they reflect predictive accuracy over (event, non-event) pairs).
- ROC = 0.5: no discrimination (no better than a coin toss)
- 0.7 <= ROC < 0.8: acceptable discrimination
- 0.8 <= ROC < 0.9: excellent discrimination
- ROC >= 0.9: outstanding discrimination
- Brier Score
- The mean squared difference between the observed 0/1 outcome and the predicted probability; small values are desirable.
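Here is a minimal sketch of what this looks like in PROC LOGISTIC, assuming a data set named mydata with a 0/1 response y, a categorical predictor catpred, categorical covariates c1-c10, and continuous covariates x1 and x2 (all of these names are placeholders for your own variables). The Model Fit Statistics table reports AIC, SC, and -2 Log L, and the c statistic (the ROC area) appears in the Association of Predicted Probabilities and Observed Responses table; the Brier score is not printed directly, so it is computed from the predicted probabilities afterwards.

```
/* Stepwise selection; swap in selection=forward, selection=backward, or selection=score as needed */
proc logistic data=mydata;
   class catpred c1-c10 / param=ref;
   model y(event='1') = catpred c1-c10 x1 x2
         / selection=stepwise slentry=0.15 slstay=0.15;
   output out=preds p=phat;   /* predicted probabilities, used below for the Brier score */
run;

/* Brier score: mean of (y - phat)**2, assuming y is coded 0/1 with 1 = event */
data brier;
   set preds;
   sqdiff = (y - phat)**2;
run;

proc means data=brier mean;
   var sqdiff;   /* the reported mean is the Brier score */
run;
```

To compare several candidate models, fit each one and put their AIC/SC/c/Brier values side by side; -2LogL comparisons are only meaningful between nested models.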
Btw, there is also the LASSO method, which can be as good as stepwise selection.
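If you want to try the LASSO for a binary response, one place it is available is PROC HPGENSELECT in SAS/STAT. The sketch below uses the same placeholder names as above; exactly how CLASS effects are penalized can vary by release, so check the documentation for your version.

```
/* LASSO selection for a logistic model (mydata, y, catpred, c1-c10, x1, x2 are placeholders) */
proc hpgenselect data=mydata;
   class catpred c1-c10;
   model y = catpred c1-c10 x1 x2 / distribution=binary link=logit;
   selection method=lasso;   /* check the output to see which response level is being modeled */
run;
```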
Keep in mind that forward/backward/stepwise selection all fit an unconditional logistic regression. After identifying the most influential variables, to get a better fit you could also try exact logistic regression, conditional logistic regression, or penalized (Firth) logistic regression (add the FIRTH option to the MODEL statement). A sketch of each is below.
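A minimal sketch of each of these in PROC LOGISTIC, again with placeholder data set and variable names (matchid is a hypothetical matching/stratum variable). Note that exact logistic regression is computationally expensive, so it is usually requested only for a few key effects.

```
/* Penalized (Firth) logistic regression: add the FIRTH option to the MODEL statement */
proc logistic data=mydata;
   class catpred c1-c10 / param=ref;
   model y(event='1') = catpred c1-c10 x1 x2 / firth;
run;

/* Exact logistic regression: request exact tests and estimates for selected effects */
proc logistic data=mydata;
   class catpred / param=ref;
   model y(event='1') = catpred x1;
   exact catpred x1 / estimate=both;
run;

/* Conditional logistic regression: condition on strata (e.g., matched sets) */
proc logistic data=mydata;
   strata matchid;
   model y(event='1') = catpred x1 x2;
run;
```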