Hello,
How can I get the AIC values for every model in a stepwise regression? With the code below, I only get the result for the final model:
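/* Simulate 140 observations: Y depends only on X1 and X2 plus normal noise; */
/* X3-X500 are pure noise variables                                          */
Data Input (Drop=i j);
   Array X{*} X1-X500;
   Do j=1 To 140;
      X1=Rannor(1);
      X2=Rannor(1);
      Y=2+X1*3-X2*4+Rannor(1)-0.5;
      /* Fill the remaining 498 predictors with random noise */
      Do i=3 To 500;
         X{i}=Rannor(1);
      End;
      Output;
   End;
Run;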
Proc Reg Data=Input OutEst=Result;
Model Y = X1-X500 / Selection=Forward AIC BIC;
Run;
Thanks & kind regards
Here's sample code for PROC GLMSELECT:
proc glmselect data=input;
model y = x1-x5 / selection=forward(select=sl) stats=bic details=all;
run;
The sub-option SELECT=SL bases variable selection on the significance level of the F statistic, as PROC REG does (without it, the GLMSELECT default criterion is SBC). Option STATS=BIC adds the BIC to the output; AIC is included by default. DETAILS=ALL requests fit statistics and many other details about the model at each step of the variable selection process.
I've reduced the number of independent variables to 5 just for demonstration. Having more explanatory effects (500) than observations (140) in the analysis dataset would not be sensible for the general linear model anyway.
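If you need the per-step AIC values in a data set rather than only in the printed output, ODS OUTPUT can capture the selection summary table (one row per step). A minimal sketch, assuming the INPUT data set from the question; STEPS_AIC is a hypothetical data set name. With STATS=(AIC BIC), both criteria should appear as columns of the selection summary, which GLMSELECT names SelectionSummary (run ODS TRACE ON first if you want to confirm the table name in your release):

```sas
/* Capture the selection summary (one row per step) in a data set */
ods output SelectionSummary=steps_aic;
proc glmselect data=input;
   /* Request AIC and BIC for the model at each step */
   model y = x1-x5 / selection=forward(select=sl) stats=(aic bic);
run;

/* Inspect the AIC and BIC of every step */
proc print data=steps_aic;
run;
```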
You can try PROC HPGENSELECT:
proc hpgenselect data=input;
Model Y = X1-X50/dist=normal link=id;
Selection method=forward(choose=aic);
Run;
It shows the AIC of the model at each step and also chooses the final model by AIC (marked with an asterisk in the output below). Unfortunately, I couldn't get it to work with 500 variables (it failed with an error about insufficient resources), so I only included the first 50 variables.
Step  Effect Entered  Number Effects In  AIC        p Value
0     Intercept       1                  877.2496   .
-----------------------------------------------------------
1     x2              2                  760.4677   <.0001
2     x1              3                  413.6365   <.0001
3     x39             4                  406.7601   0.0034
4     x47             5                  400.3529   0.0043
5     x17             6                  395.3228*  0.0088
To save an output table to a data set, use an ODS OUTPUT statement, for example:
ods output SelParmEst=SelParmEst;
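To keep the per-step AIC values from HPGENSELECT in a data set, you first need the name of the ODS table that holds them. ODS TRACE writes the name of every table a procedure creates to the log. The sketch below assumes that the per-step table is called SelectionSummary (check the log and substitute the reported name if it differs); STEPS_AIC is a hypothetical data set name:

```sas
/* Write the names of all ODS tables HPGENSELECT creates to the log */
ods trace on;
proc hpgenselect data=input;
   model y = x1-x50 / dist=normal link=id;
   selection method=forward(choose=aic);
run;
ods trace off;

/* Assumption: the per-step table is named SelectionSummary;  */
/* replace it with the name reported in the log if different */
ods output SelectionSummary=steps_aic;
proc hpgenselect data=input;
   model y = x1-x50 / dist=normal link=id;
   selection method=forward(choose=aic);
run;

proc print data=steps_aic;
run;
```

If the table name is wrong, ODS OUTPUT only writes a warning to the log that the output object was not created, so the trace step is a cheap way to get it right.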
You shouldn't use stepwise selection to build models: the p-values, standard errors and parameter estimates it produces are biased (see, e.g., "Stopping Stepwise").
However, if you still want to do this, you can use PROC GLMSELECT with the DETAILS=FITSTATISTICS option on the MODEL statement.