deleted_user
Not applicable
I tried to build a logistic model, using the AIC output to assess the fit of the models during the model-building process. The underlying data set was exactly the same in each step. The AIC was shown for the intercept-only model and for the intercept-and-covariates model as standard output from SAS PROC LOGISTIC. I assumed that the AIC for the intercept-only model would be exactly the same in each step, and that the AIC for the intercept-and-covariates model would differ depending on the covariates included in the model (the lower the AIC, the better the fit of the model). However, what I found out was that the AIC for intercept only model changed in each step even though the underlying data set is the same. I don't understand why this happened. Also, doesn't that mean that the AIC for the intercept-and-covariates model cannot be trusted? If so, how do I get the true AIC statistic? Thanks
sfleming
Calcite | Level 5
Did you check that the number of observations used was the same at each step? This information is output by default. You won't be able to compare models by AIC unless the exact same observations are used when fitting each model. For example, missing values on some of the covariates would cause PROC LOGISTIC to use a different set of observations.
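
To illustrate the point about missing covariate values, here is a minimal Python sketch of listwise deletion, in which a row is kept only if the response and every covariate in the model are non-missing. The data and the variable names (`age`, `income`) are hypothetical and are not taken from the original poster's data set; this only imitates the row-dropping behaviour described above and does not call SAS.

```python
# Hypothetical toy data: None marks a missing covariate value.
rows = [
    {"y": 1, "age": 34,   "income": 52.0},
    {"y": 0, "age": None, "income": 41.5},
    {"y": 0, "age": 58,   "income": None},
    {"y": 1, "age": 45,   "income": 63.2},
    {"y": 0, "age": 29,   "income": 38.9},
]

def complete_cases(data, variables):
    """Return the rows with no missing value in any of the listed variables."""
    return [r for r in data if all(r[v] is not None for v in variables)]

# The same data set yields a different analysis sample per covariate set.
for covariates in (["age"], ["income"], ["age", "income"]):
    used = complete_cases(rows, ["y"] + covariates)
    print(covariates, "->", len(used), "of", len(rows), "observations used")
```

Comparing the count of observations used against the count read, for every model, is the quickest way to see whether this is happening.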

In particular, I don't understand your statement:

"However, what I found out was that the AIC for intercept only model changed in each step even though the underlying data set is the same. "

What does it mean for the intercept only model to change at each step? Did the model syntax change?

It would be helpful if you could post simplified SAS code, log, and output that demonstrate the issue.
deleted_user
Not applicable
Thanks for the response. I did pay attention to the number of valid observations in the models. However, I didn't notice any null or missing values in any of the models, i.e. the modeling sample is exactly the same. What changed was the number of covariates and their corresponding functional forms (dummies vs. splines), which I believe should not change the AIC for the intercept-only model. The following are the statistics for the first 4 models:

Model  Criterion  Intercept Only  Intercept And Covariates
1      AIC        119146.84       95157.03
1      SC         119157.56       95639.37
1      -2 Log L   119144.84       95067.03
2      AIC        449674.77       361109.87
2      SC         449685.49       361324.25
2      -2 Log L   449672.77       361069.87
3      AIC        457288.30       359520.98
3      SC         457299.02       359788.95
3      -2 Log L   457286.30       359470.98
4      AIC        462250.20       356308.08
4      SC         462260.92       356640.36
4      -2 Log L   462248.20       356246.08

As you can see, the AIC for the intercept-only model changed significantly from model 1 to model 2. The only difference between model 1 and model 2 was that the functional forms of the covariates changed from dummy variables to splines. Such a change in functional form could change the AIC for the intercept-and-covariates model, because both the fit and the degrees of freedom change, but it should not change the AIC for the intercept-only model. Moreover, when I used the stepwise selection option in PROC LOGISTIC, the AIC for the intercept-only model remained the same, which makes perfect sense. I am sure I missed something here. Thanks in advance for the help.
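
The posted figures can be cross-checked with a little arithmetic. The sketch below assumes the usual definitions AIC = -2 Log L + 2k and SC = -2 Log L + k log(n), where k is the number of estimated parameters and n is the number of observations (or sum of frequencies) used in the fit. Under that assumption, k and n can be backed out of the table for each model. The function names are made up for illustration.

```python
import math

# Posted values per model: (-2 Log L, AIC, SC).
intercept_only = {
    1: (119144.84, 119146.84, 119157.56),
    2: (449672.77, 449674.77, 449685.49),
    3: (457286.30, 457288.30, 457299.02),
    4: (462248.20, 462250.20, 462260.92),
}
with_covariates = {
    1: (95067.03, 95157.03, 95639.37),
    2: (361069.87, 361109.87, 361324.25),
    3: (359470.98, 359520.98, 359788.95),
    4: (356246.08, 356308.08, 356640.36),
}

def implied_k(m2ll, aic):
    """Number of estimated parameters, from AIC = -2 Log L + 2k."""
    return round((aic - m2ll) / 2)

def implied_n(m2ll, sc, k):
    """Sample size, from SC = -2 Log L + k * log(n)."""
    return math.exp((sc - m2ll) / k)

for model in sorted(intercept_only):
    for label, table in (("intercept only", intercept_only),
                         ("with covariates", with_covariates)):
        m2ll, aic, sc = table[model]
        k = implied_k(m2ll, aic)
        n = implied_n(m2ll, sc, k)
        print(f"model {model}, {label}: k = {k}, implied n = {n:,.0f}")
```

Because the posted values are rounded to two decimals, the implied n is only approximate. On these figures it comes out at roughly 334,000 for every fit, with k = 1 for the intercept-only rows and k = 45, 20, 25 and 31 for models 1 to 4. If the assumed definitions apply, that suggests the number of observations did not change between model 1 and model 2 even though the intercept-only -2 Log L did. The arithmetic does not say what else differed between the runs.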
sfleming
Calcite | Level 5
My background is in the use of PROC MIXED, and in that context, I have read warnings against using Information Criteria to compare models unless the exact same observations are included in each.

My best guess is that there are missing values in the covariates and that SAS drops all observations with a missing value on a covariate even to fit the Intercept Only model. To me, this makes sense because the 2 models cannot be compared unless they are fit on the exact same observations. When you fit a different model, different observations are dropped because the included covariates have a different pattern of missing data.
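
The mechanism guessed at above can be sketched in Python under stated assumptions. For an intercept-only logistic model the fitted probability is just the observed event rate, so -2 Log L depends only on which rows are kept. If rows with a missing covariate are dropped before the intercept-only fit, the intercept-only AIC will move with the covariate set. The data and the names `x1` and `x2` are hypothetical; this is an illustration of the idea, not a reproduction of what PROC LOGISTIC did in the original poster's runs.

```python
import math

# Hypothetical records: (y, x1, x2); None marks a missing covariate value.
records = [
    (1, 0.3, None),
    (0, None, 1.2),
    (0, 0.8, 0.7),
    (1, 1.1, 0.4),
    (0, 0.5, 2.0),
    (1, 0.9, 1.5),
    (0, None, 0.6),
    (0, 0.2, 1.1),
]

def responses_used(data, covariate_index):
    """Responses from rows whose chosen covariate is non-missing."""
    return [row[0] for row in data if row[covariate_index] is not None]

def intercept_only_m2ll(y):
    """-2 log-likelihood of an intercept-only logistic model.

    The maximum-likelihood fitted probability is the observed event rate.
    """
    n, events = len(y), sum(y)
    p = events / n
    if p in (0.0, 1.0):
        return 0.0  # degenerate case: likelihood is 1
    return -2.0 * (events * math.log(p) + (n - events) * math.log(1.0 - p))

for name, idx in (("x1", 1), ("x2", 2)):
    y = responses_used(records, idx)
    m2ll = intercept_only_m2ll(y)
    aic = m2ll + 2  # one estimated parameter: the intercept
    print(f"covariate {name}: n = {len(y)}, -2 Log L = {m2ll:.4f}, AIC = {aic:.4f}")
```

Here the same eight records give two different intercept-only AIC values, because modelling `x1` keeps six rows and modelling `x2` keeps seven.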

During a quick check, I could not find this issue addressed in the SAS Help or on the Support site, so take this opinion with a grain of salt. Hopefully someone more knowledgeable will chime in.
