10-25-2013 07:31 AM
Greetings all. I'm looking for a selection option in PROC REG that will only include models in which every regressor is significant. My data has 21 variables, and when I use the / selection=adjrsq option, the results do show the models with the highest adjusted R-square, but some of them contain regressors that are not significant at the .05 level, even though that model might have the highest adjusted R-square. I found this document (http://www.math.wpi.edu/saspdf/stat/chap55.pdf) very helpful, but I can't find anything that will limit the models to those with all significant regressors. Is this even possible? Thank you.
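For a handful of candidate regressors, the filter asked about here can be sketched outside SAS: fit every subset, discard any model that has a slope whose two-sided t-test p-value is 0.05 or more, and rank the survivors by adjusted R-square. This is plain Python (standard library only), not SAS; `best_all_significant`, `ols`, `t_two_sided_p`, and the simulated data are hypothetical illustrations, not PROC REG features. With 21 regressors there are 2^21 - 1 (about 2.1 million) subsets, so this brute-force loop is only practical for small problems.

```python
import itertools
import math
import random

TINY = 1e-300  # keeps the continued fraction away from division by zero


def _nz(v):
    return v if abs(v) > TINY else TINY


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def t_two_sided_p(t, df):
    # P(|T| > |t|) for Student's t with df degrees of freedom = I_x(df/2, 1/2), x = df/(df+t^2).
    a, b, x = df / 2.0, 0.5, df / (df + t * t)
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def _invert(m):
    # Gauss-Jordan inversion with partial pivoting.
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear regressors)")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            f = a[r][col]
            if r != col and f != 0.0:
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def ols(columns, y):
    # Least squares with an intercept: returns (coefficients, slope p-values, adjusted R^2).
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])  # number of parameters, intercept included
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    df = n - k
    s2 = sse / df  # residual mean square
    pvals = [t_two_sided_p(beta[a] / math.sqrt(s2 * inv[a][a]), df) for a in range(1, k)]
    adj_r2 = 1.0 - s2 / (sst / (n - 1))
    return beta, pvals, adj_r2


def best_all_significant(columns, y, alpha=0.05, top=5):
    # Fit every non-empty subset; keep only models whose slopes all have p < alpha.
    kept = []
    for size in range(1, len(columns) + 1):
        for subset in itertools.combinations(range(len(columns)), size):
            _, pvals, adj_r2 = ols([columns[j] for j in subset], y)
            if all(p < alpha for p in pvals):
                kept.append((adj_r2, subset, pvals))
    kept.sort(key=lambda item: item[0], reverse=True)
    return kept[:top]


# Hypothetical demo data: y depends on x1 and x2; x3 is pure noise.
rng = random.Random(2013)
n = 40
x1 = [rng.random() for _ in range(n)]
x2 = [rng.random() for _ in range(n)]
x3 = [rng.random() for _ in range(n)]
y = [1 + 3 * a + 2 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
for adj_r2, subset, pvals in best_all_significant([x1, x2, x3], y):
    names = ["x%d" % (j + 1) for j in subset]
    print(f"adj R^2={adj_r2:.4f}  vars={names}  max p={max(pvals):.3g}")
```

The p-values used in the filter are the usual ones computed as if each subset had been chosen in advance; after a search over many subsets they overstate significance, which is the concern raised in the replies below.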
10-25-2013 09:14 AM
I suppose it is possible, but is it desirable? Because of collinearity, masking, and similar effects, the best fit will not necessarily involve only significant regressors. I would strongly suggest looking at PROC GLMSELECT and using SELECTION=LAR or SELECTION=LASSO if you are looking to create a predictive model. If the objective is an explanatory model for existing data only, do a Google search on "variable selection Flom Cassell" to find a paper outlining what goes wrong with model selection. In addition, try to obtain Frank Harrell's Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. It is (or should be) required reading for anyone attempting to use regression techniques.
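In SAS the lasso is requested with SELECTION=LASSO in PROC GLMSELECT. To show what the penalty does, here is a minimal plain-Python sketch of the lasso solved by cyclic coordinate descent, assuming standardized predictors; it is not necessarily the algorithm GLMSELECT uses, and `lasso`, `soft_threshold`, and the demo data are hypothetical. A large enough penalty `lam` sets a coefficient exactly to zero, so selection and shrinkage happen together instead of through a sequence of significance tests.

```python
import math
import random


def soft_threshold(z, lam):
    # S(z, lam) = sign(z) * max(|z| - lam, 0)
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso(columns, y, lam, max_sweeps=1000, tol=1e-10):
    """Minimize (1/2n)*||y - b0 - Z b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Each column is standardized to mean 0 and (1/n)*sum(z^2) = 1 (constant
    columns are not allowed), so the returned slopes are on the standardized scale.
    """
    n = len(y)
    Z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        Z.append([(v - mean) / sd for v in col])
    ybar = sum(y) / n
    resid = [v - ybar for v in y]  # residuals when every slope is 0
    b = [0.0] * len(Z)
    for _ in range(max_sweeps):
        biggest_step = 0.0
        for j, zj in enumerate(Z):
            # Inner product (divided by n) of z_j with the residual that excludes z_j's own term.
            rho = sum(zi * ri for zi, ri in zip(zj, resid)) / n + b[j]
            new = soft_threshold(rho, lam)
            step = new - b[j]
            if step != 0.0:
                resid = [ri - step * zi for ri, zi in zip(resid, zj)]
                b[j] = new
                biggest_step = max(biggest_step, abs(step))
        if biggest_step < tol:
            break
    return b


# Hypothetical demo: y depends on x1 and x2; x3 and x4 are noise.
rng = random.Random(7)
n = 60
xs = [[rng.random() for _ in range(n)] for _ in range(4)]
y = [1 + 3 * a + 2 * b + rng.gauss(0, 0.5) for a, b in zip(xs[0], xs[1])]
for lam in (0.5, 0.2, 0.05, 0.0):
    coefs = lasso(xs, y, lam)
    print(f"lambda={lam:<5} " + "  ".join(f"x{j + 1}={c:+.3f}" for j, c in enumerate(coefs)))
```

As `lam` decreases toward 0 the slopes approach the ordinary least squares fit on the standardized predictors; in practice the penalty is chosen by cross-validation or an information criterion rather than by a p-value cutoff.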
12-14-2013 11:53 PM
Yes, the selected model can retain a few variables that are not significant at the 0.05 level, and one reason can be interactions. If the interaction of a variable with another variable is significant, then both variables that make up the interaction would not be eliminated from the best-fitting model, even though they are not significant at the 0.05 level on their own.
12-16-2013 04:59 PM
If you choose PROC REG, you can limit the model to significant independent variables by using forward selection, backward elimination, or stepwise selection. Note that you have to specify the significance level you want (SLENTRY= and SLSTAY= on the MODEL statement); otherwise PROC REG uses its defaults, which are SLENTRY=0.50 for FORWARD and 0.15 for STEPWISE, and SLSTAY=0.10 for BACKWARD and 0.15 for STEPWISE (see the link below). There are other, better modelling options, as Steve suggested.
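Backward elimination with a stay threshold (the role SLSTAY= plays in PROC REG) can be sketched as follows: refit, find the regressor with the largest p-value, and drop it while that p-value is at or above the threshold. This is plain Python with the standard library, not SAS; `backward_eliminate`, `slope_pvalues`, and the simulated data are hypothetical. Backward elimination and stepwise selection (which also applies SLSTAY=) finish with every retained regressor below the stay level, whereas forward selection only tests a variable when it enters, so a variable added early can lose significance later.

```python
import math
import random

TINY = 1e-300  # keeps the continued fraction away from division by zero


def _nz(v):
    return v if abs(v) > TINY else TINY


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def t_two_sided_p(t, df):
    # P(|T| > |t|) for Student's t with df degrees of freedom = I_x(df/2, 1/2), x = df/(df+t^2).
    a, b, x = df / 2.0, 0.5, df / (df + t * t)
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def _invert(m):
    # Gauss-Jordan inversion with partial pivoting.
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("X'X is singular (collinear regressors)")
        a[col], a[piv] = a[piv], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            f = a[r][col]
            if r != col and f != 0.0:
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def slope_pvalues(columns, y):
    # OLS with an intercept; returns the two-sided t-test p-value of each slope.
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = _invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    s2 = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - k)
    return [t_two_sided_p(beta[a] / math.sqrt(s2 * inv[a][a]), n - k) for a in range(1, k)]


def backward_eliminate(columns, names, y, slstay=0.05):
    # Repeatedly drop the regressor with the largest p-value until all are below slstay.
    keep = list(range(len(columns)))
    while keep:
        pvals = slope_pvalues([columns[j] for j in keep], y)
        worst = max(range(len(keep)), key=lambda i: pvals[i])
        if pvals[worst] < slstay:
            break
        keep.pop(worst)
    return [names[j] for j in keep]


# Hypothetical demo data: y depends on x1 and x2; x3 and x4 are pure noise.
rng = random.Random(2013)
n = 40
names = ["x1", "x2", "x3", "x4"]
cols = [[rng.random() for _ in range(n)] for _ in names]
y = [1 + 3 * a + 2 * b + rng.gauss(0, 0.1) for a, b in zip(cols[0], cols[1])]
print("retained:", backward_eliminate(cols, names, y, slstay=0.05))
```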
Please see the SAS webpage for details: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect030....
12-17-2013 01:38 PM
Even more importantly, see the paper by Flom and Cassell, "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use", at http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf
I know I referred to this earlier. It is especially important if you wish to use the model for scoring new cases or are trying to make some sort of generalizing statement about the effects of the independent variables on the dependent variable.