I tried to build a logistic model, using the AIC output to assess model fit during the model-building process. The underlying data set was exactly the same at each step. The AIC is shown for both the intercept-only model and the intercept-with-covariates model as standard output from SAS PROC LOGISTIC. I assumed that the AIC for the intercept-only model would be exactly the same at each step, while the AIC for the intercept-with-covariates model would differ depending on the covariates included in the model (the lower the AIC, the better the fit). However, what I found out was that the AIC for intercept only model changed in each step even though the underlying data set is the same. I don't understand why this happened. Also, doesn't that mean the AIC for the intercept-with-covariates model is not trustworthy? If so, how do I get the true AIC statistic? Thanks
Did you check that the number of observations used was the same at each step? This information is output by default. You won't be able to compare models by AIC unless exactly the same observations are used when fitting each model. For example, missing values on some of the covariates would cause PROC LOGISTIC to use a different set of observations.
In particular, I don't understand your statement:
"However, what I found out was that the AIC for intercept only model changed in each step even though the underlying data set is the same. "
What does it mean for the intercept only model to change at each step? Did the model syntax change?
It would be helpful if you could post simplified SAS code, log, and output that demonstrate the issue.
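For what it's worth, the intercept-only fit statistics depend on the data alone: with n observations and m events, the fitted intercept is the sample log odds, so -2 Log L = -2[m·ln(m/n) + (n-m)·ln(1-m/n)], and AIC adds 2 for the single parameter. A minimal sketch in Python (toy counts, not your data) showing that this AIC can only move if n or m moves:

```python
import math

def intercept_only_aic(n, m):
    """AIC of an intercept-only logistic model fit to n observations
    with m events: the MLE of the intercept gives fitted probability
    m/n, and the model has k = 1 parameter."""
    p = m / n
    neg2logl = -2 * (m * math.log(p) + (n - m) * math.log(1 - p))
    return neg2logl + 2

# Same n and event count -> same intercept-only AIC, no matter which
# covariates the other models use
print(intercept_only_aic(1000, 100))

# Dropping even a few observations changes it
print(intercept_only_aic(990, 95))
```

So if the intercept-only AIC changes between steps, either the number of observations or the event count must have changed.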
Thanks for the response. I did pay attention to the number of valid observations in the models, and I didn't notice any null or missing values between the models, i.e. the modeling sample is exactly the same. What changed was the number of covariates and their corresponding functional forms (dummies vs. splines), which I believe should not change the AIC for the intercept-only model. The following are the statistics for the first 4 models:
Model  Criterion  Intercept Only  Intercept And Covariates
  1    AIC           119146.84        95157.03
  1    SC            119157.56        95639.37
  1    -2 Log L      119144.84        95067.03
  2    AIC           449674.77       361109.87
  2    SC            449685.49       361324.25
  2    -2 Log L      449672.77       361069.87
  3    AIC           457288.30       359520.98
  3    SC            457299.02       359788.95
  3    -2 Log L      457286.30       359470.98
  4    AIC           462250.20       356308.08
  4    SC            462260.92       356640.36
  4    -2 Log L      462248.20       356246.08
As you can see, the AIC for the intercept-only model changed significantly from model 1 to model 2. The only difference between model 1 and model 2 was that the functional forms of the covariates changed from dummy variables to splines. Such a change in functional form could change the AIC for the intercept-and-covariates model, through both the fit and the change in degrees of freedom, but it should not change the AIC for the intercept-only model. Moreover, when I used the stepwise selection option in PROC LOGISTIC, the AIC for the intercept-only model remained the same, which makes perfect sense. I am sure I missed something here. Thanks in advance for the help.
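A sanity check on the numbers posted above may help narrow this down. PROC LOGISTIC's fit statistics satisfy AIC = -2 Log L + 2k and SC = -2 Log L + k·log(n), where k is the number of estimated parameters and n the number of observations used, so k and n can be recovered from the table. A rough Python sketch using the model 1 values:

```python
import math

# Model 1 values copied from the table above
neg2logl_int, aic_int, sc_int = 119144.84, 119146.84, 119157.56
neg2logl_cov, aic_cov, sc_cov = 95067.03, 95157.03, 95639.37

# Intercept only: k = 1, so AIC should be exactly -2 Log L + 2
assert abs(aic_int - neg2logl_int - 2) < 1e-6

# Covariate model: recover k from AIC = -2 Log L + 2k
k = round((aic_cov - neg2logl_cov) / 2)

# Recover n from SC = -2 Log L + k * log(n)
n = math.exp((sc_cov - neg2logl_cov) / k)
print(k, round(n))  # parameter count and approximate n actually used
```

One thing this reveals: in the intercept-only column, SC - (-2 Log L) is 12.72 in all four steps, so log(n) (and hence, up to the two-decimal rounding, n itself) is roughly constant; it is the log likelihood that moves. Comparing the recovered n for each model against the "Number of Observations Used" in your output should tell you whether observations are being dropped.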
My background is in the use of PROC MIXED, and in that context I have read warnings against using information criteria to compare models unless exactly the same observations are included in each.
My best guess is that there are missing values in the covariates and that SAS drops all observations with a missing value on any covariate even when fitting the intercept-only model. To me this makes sense, because the two models cannot be compared unless they are fit on exactly the same observations. When you fit a different model, different observations are dropped because the included covariates have a different pattern of missing data.
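To illustrate that guess (in Python/pandas rather than SAS, with made-up column names): complete-case deletion keeps a different set of rows depending on which covariates the model references, so each model can end up fit to a different n.

```python
import numpy as np
import pandas as pd

# Toy data with missingness scattered across covariates
df = pd.DataFrame({
    "y":  [0, 1, 1, 0, 1],
    "x1": [1.0, np.nan, 2.0, 3.0, 4.0],
    "x2": [np.nan, 1.0, 2.0, 3.0, 4.0],
})

# Complete cases depend on which covariate subset the model uses
n_x1   = len(df.dropna(subset=["y", "x1"]))        # 4 rows for y ~ x1
n_x2   = len(df.dropna(subset=["y", "x2"]))        # 4 rows for y ~ x2
n_both = len(df.dropna(subset=["y", "x1", "x2"]))  # 3 rows for y ~ x1 + x2
print(n_x1, n_x2, n_both)
```

If the guess is right, the intercept-only statistics reported alongside each model would be computed on that model's complete cases, which would explain why they drift from step to step.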
During a quick check, I could not find this issue addressed in the SAS Help or on the Support site, so take this opinion with a grain of salt. Hopefully someone more knowledgeable will chime in.