chelsealutz
Fluorite | Level 6

Hello all!

I need to fit a logistic regression model and am wondering which model-selection method would be best. I have been advised to stay away from forward/backward/stepwise selection. All-possible-regressions seems attractive, but I must admit I'm a little lost on AIC/BIC/Cp/etc. and exactly how I would go about picking the best model...

I have a binary response variable, a categorical predictor, 10 categorical covariates, and 2 continuous covariates.

Thank you in advance!

5 REPLIES
Reeza
Super User

Search "Model Selection Method" on here... this topic comes up frequently, and there is no 'CORRECT' answer, but some answers are more valid than others 😉

chelsealutz
Fluorite | Level 6

Unfortunately I've been all over the boards and haven't found anything useful. I've also read several papers - I just can't seem to locate the syntax for an all-possible-regressions run. In addition, I was hoping someone could break it down for me in less technical language so I could really understand AIC/Cp/etc...

samnan
Quartz | Level 8

Hi @chelsealutz

After finding the candidate variables for inclusion in the model using any of these MODEL statement options:

- selection = stepwise slentry = 0.15 slstay = 0.15

- selection = forward slentry = 0.15

- selection = backward slstay = 0.15

- selection = score (best subsets)

for the quantitative and categorical variables and any interaction terms, you can compare the models on the following criteria:

 

  • -2LogL
    • The value itself is not important. It is used to compare two nested models: the larger model always has a -2LogL at least as small, and the difference in -2LogL between the two is approximately chi-square distributed, with degrees of freedom equal to the number of extra parameters.
  • AIC (Akaike Information Criterion)
    • AIC can be used to compare models fit to the same sample, including non-nested ones. The value itself is not meaningful, but the model with the smallest AIC is considered the best.
  • SC (Schwarz Criterion, also known as BIC)
    • The model with the smallest SC is most desirable, but the value itself is not meaningful. Like AIC, it is appropriate for non-nested models.
  • ROC Area
    • The area under the ROC curve is a measure of the model’s ability to discriminate between event and non-event. Large values are desirable (predictive accuracy for (event, non-event) pairs).
    • ROC = 0.5: no discrimination (no better than a coin toss)
    • 0.7 <= ROC < 0.8: acceptable discrimination
    • 0.8 <= ROC < 0.9: excellent discrimination
    • ROC >= 0.9: outstanding discrimination
  • Brier’s Score
    • The mean squared difference between the predicted probability and the observed 0/1 outcome. Small values are desirable.
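To make the criteria above a bit more concrete, here is a minimal Python sketch (not SAS, and not taken from any reply in this thread) of how AIC, SC, the area under the ROC curve, and the Brier score can be computed by hand. AIC and SC are written in the usual "-2LogL plus a penalty" form. The function names, the two fitted models, and all the numbers are made up for illustration.

```python
import math


def aic(neg2_log_l, n_params):
    # AIC = -2LogL + 2k, where k counts estimated parameters (intercept included)
    return neg2_log_l + 2 * n_params


def sc(neg2_log_l, n_params, n_obs):
    # SC (also called BIC) = -2LogL + k * ln(n); penalizes extra
    # parameters more heavily than AIC once n is 8 or more
    return neg2_log_l + n_params * math.log(n_obs)


def auc(y, p):
    # Share of (event, non-event) pairs in which the event received the
    # higher predicted probability; tied pairs count as one half
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    non_events = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                score += 1.0
            elif e == ne:
                score += 0.5
    return score / (len(events) * len(non_events))


def brier(y, p):
    # Mean squared difference between predicted probability and 0/1 outcome
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


# Hypothetical fit statistics for two nested models on the same 200 observations
n_obs = 200
small = {"neg2_log_l": 250.0, "n_params": 3}
big = {"neg2_log_l": 246.0, "n_params": 6}

for name, m in (("small", small), ("big", big)):
    print(name,
          "AIC:", aic(m["neg2_log_l"], m["n_params"]),
          "SC:", round(sc(m["neg2_log_l"], m["n_params"], n_obs), 2))

# Likelihood ratio comparison of the nested pair: the drop in -2LogL is
# referred to a chi-square with df = number of extra parameters
lr_stat = small["neg2_log_l"] - big["neg2_log_l"]
lr_df = big["n_params"] - small["n_params"]
print("LR chi-square:", lr_stat, "on", lr_df, "df")

# Hypothetical observed outcomes and predicted probabilities
y = [1, 1, 0, 0, 1, 0]
p = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
print("AUC:", round(auc(y, p), 3), "Brier:", round(brier(y, p), 3))
```

With these made-up numbers the smaller model has both the lower AIC and the lower SC, and the drop in -2LogL (4.0 on 3 extra parameters) is below the 0.05 chi-square critical value of 7.815, so the extra terms would not look worthwhile on any of the three criteria.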

 

 

 

JacobSimonsen
Barite | Level 11
I think stepwise selection has a better chance of giving the model with the best fit (compared to forward/backward). This is because it can go both forward and backward until no further step improves the fit. Backward selection only removes variables, and forward selection only adds them.
Btw, there is also the LASSO method, which can be as good as stepwise selection.
Ksharp
Super User

You gotta know that forward/backward/stepwise selection are all doing unconditional logistic regression.

After getting the most influential variables, to get the best fit you'd better try exact logistic regression, conditional logistic regression, or penalized logistic regression (add the FIRTH option to the MODEL statement).
