Hi,
I am trying to understand exactly how stepwise selection works in PROC REG.
According to the documentation's explanation of stepwise selection (my questions are inserted in parentheses):
"The stepwise method is a modification of the forward-selection technique and differs in that variables already in the model do not necessarily stay there. As in the forward-selection method, variables are added one by one to the model, and the F statistic for a variable to be added must be significant (greater than SLENTRY, or less than?) at the SLENTRY= level (Was the F statistic calculated independently, or was it calculated after keeping it with the dependent variable?).
After a variable is added, however, the stepwise method looks at all the variables already included in the model and deletes any variable that does not produce an F statistic significant at the SLSTAY= level.
Only after this check is made and the necessary deletions are accomplished can another variable be added to the model.
The stepwise process ends when none of the variables outside the model (What is meant by outside the model?) has an F statistic significant (less or more?) at the SLENTRY= level and every variable in the model is significant at the SLSTAY= level, or when the variable to be added to the model is the one just deleted from it."
Please explain this concept as you would to a 10-year-old. I am just trying to understand what exactly SLENTRY and SLSTAY do here, and how many times the F statistic is calculated. (How does a variable go from consideration to acceptance or rejection?)
Link to the documentation: https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect030...
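For context, here is a minimal sketch of the kind of PROC REG call the question is about. The data set (sashelp.class), the variables, and the 0.15 thresholds are assumptions chosen only for illustration; 0.15 is the documented default for both options under SELECTION=STEPWISE. SLENTRY= and SLSTAY= are significance levels, so the comparison is made on the p-value of each F statistic, not on the F value itself: the most significant candidate enters only if its p-value is below SLENTRY=, and a variable already in the model is removed if its p-value is above SLSTAY=. The F statistic for a candidate is computed with the variables currently in the model also present, and it is recomputed at every step.

```sas
/* Illustrative only: sashelp.class and these variables are assumed for the example */
proc reg data=sashelp.class;
   /* A candidate enters only if its partial F test has p-value < SLENTRY=;  */
   /* a variable in the model is dropped if its p-value rises above SLSTAY=. */
   model weight = height age
         / selection=stepwise slentry=0.15 slstay=0.15 details=all;
run;
quit;
```

With DETAILS=ALL, the output should include the entry and removal statistics for each step, which is one way to see how often the F statistics are recalculated.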
IMHO, you are asking the wrong question. Stepwise regression was added to PROC REG decades ago. The more modern offering in SAS, which has the traditional stepwise methods as well as newer and better methods, is PROC GLMSELECT. That is the better procedure to use.
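A minimal sketch of what the same kind of stepwise request might look like in PROC GLMSELECT. The data set, variables, and thresholds are assumptions for illustration only. SELECT=SL is specified so that entry and removal are based on significance levels, as in PROC REG; without it, PROC GLMSELECT uses a different default selection criterion.

```sas
/* Illustrative only: data set, variables, and thresholds are assumed */
proc glmselect data=sashelp.class;
   /* SELECT=SL asks for significance-level-based entry and removal; */
   /* SLE= and SLS= play the roles of SLENTRY= and SLSTAY=.          */
   model weight = height age
         / selection=stepwise(select=sl sle=0.15 sls=0.15) details=steps;
run;
```

Other values of SELECTION= give access to the newer methods mentioned above.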
In addition to the advice by @WarrenKuhfeld, you could avoid the issue of selecting input variables entirely by using Partial Least Squares Regression (PROC PLS). At least one study shows that the mean squared error of the predictions is better using PLS than using OLS and better than using stepwise methods.
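A minimal sketch of a PROC PLS call, assuming the same illustrative data set and variables as a stand-in for real data. No variable selection is requested: all predictors are passed in, and cross validation (here leave-one-out, via CV=ONE) is used to choose the number of extracted factors.

```sas
/* Illustrative only: data set and variables are assumed */
proc pls data=sashelp.class method=pls cv=one;
   /* All candidate predictors are listed; PLS extracts factors from them */
   /* instead of adding or dropping individual variables.                 */
   model weight = height age;
run;
```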