Using PROC REG and PROC GLMSELECT, I am running into a problem: the variables the procedure selects are no longer significant when I re-run the model using only those variables. Example code:
proc glmselect data=indata;
   model y = /* candidate effects */ / selection=stepwise choose=validate;
run;

proc glm data=indata;
   model y = /* effects selected by GLMSELECT */ ;
run;
quit;
I have tried varying the SELECTION= method (stepwise, lasso, etc.) and the CHOOSE= option, listing the variables in the order in which they were selected, and looking at both SS1 and SS3, all to no avail. Any insights would be greatly appreciated. Thank you!
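Stepwise selection procedures do not necessarily select only terms that are "significant" in the classical sense. In fact, significance tests within stepwise procedures are quite controversial, and their p-values do not have their usual meaning: the same data are tested over and over while the model is being built, and that extreme multiple testing causes all kinds of problems. Many stepwise methods are also deliberately liberal, so that candidate models are not discarded too readily; PROC REG's stepwise method, for example, uses SLENTRY=0.15 and SLSTAY=0.15 by default. PROC GLMSELECT, by default, does not use p-values at all to add and remove effects (it uses SBC), and CHOOSE=VALIDATE picks the final model by validation error. As a result, some of the selected terms may not be significant when you fit them in a single model in a separate procedure, as if no stepwise selection had taken place. You can change the selection criteria in the various procedures to make it harder for terms to enter.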
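A minimal sketch of tightening the entry and stay criteria, assuming a hypothetical candidate list x1-x10 (substitute your own variables). The PROC REG options are standard MODEL statement options; for PROC GLMSELECT the sketch assumes the SELECT=SL, SLENTRY= and SLSTAY= suboptions of SELECTION=, so please check the GLMSELECT documentation for your SAS release before relying on it.

```sas
/* Hypothetical candidate effects x1-x10; replace with your own list. */

/* PROC REG: stepwise defaults are SLENTRY=0.15 and SLSTAY=0.15.   */
/* Lowering both makes it harder for an effect to enter or remain. */
proc reg data=indata;
   model y = x1-x10 / selection=stepwise slentry=0.05 slstay=0.05;
run;
quit;

/* PROC GLMSELECT: switch the step criterion from the default SBC  */
/* to significance level, with stricter entry and stay thresholds. */
proc glmselect data=indata;
   model y = x1-x10 / selection=stepwise(select=sl slentry=0.05 slstay=0.05);
run;
```

Even with stricter thresholds, p-values from refitting the selected model on the same data remain optimistic, because those data were already used to choose the model; a fresh sample is a more honest check of whether the terms hold up.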