gsnidow
Obsidian | Level 7

Greetings all. I'm looking for a selection option in PROC REG that will include only models in which all regressors are significant. My data has 21 variables, and when I use the / selection=adjrsq option, the results do show the models with the highest adjusted R-square, but some of those models contain regressors that are not significant at the 0.05 level, even though the model might have the highest adjusted R-square. I found this (http://www.math.wpi.edu/saspdf/stat/chap55.pdf) very helpful, but I can't find anything that will limit the models to those with all significant regressors. Is this even possible? Thank you.

Greg
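A minimal sketch of the setup described in the question, assuming a data set named `work.mydata` with response `y` and the 21 candidate regressors named `x1` through `x21` (all hypothetical names). SELECTION=ADJRSQ ranks subsets by adjusted R-square; to check the individual t tests for one particular subset, a straightforward approach is to refit that subset on its own with no selection. The subset `x2 x5 x9` below is purely illustrative.

```sas
/* Hypothetical data set WORK.MYDATA: response y, candidates x1-x21 */
proc reg data=work.mydata;
   /* All-subsets search ranked by adjusted R-square; BEST=10 limits */
   /* the output to at most 10 subset models                          */
   allsubsets: model y = x1-x21 / selection=adjrsq best=10;

   /* Refit one candidate subset (hypothetical choice of regressors)  */
   /* with no selection, so the Parameter Estimates table shows the   */
   /* t test and p-value for each regressor in that subset            */
   candidate: model y = x2 x5 x9;
run;
quit;
```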

SteveDenham
Jade | Level 19

I suppose it is possible, but is it desirable? Because of collinearity, masking, and similar effects, the best fit will not necessarily involve only significant regressors. I would strongly suggest looking at PROC GLMSELECT with SELECTION=LAR or SELECTION=LASSO if you are trying to build a predictive model. If the objective is an explanatory model for the existing data only, do a Google search for "variable selection Flom Cassell" to find a paper outlining what goes wrong with model selection. In addition, try to obtain Frank Harrell's Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. It is (or should be) required reading for anyone using regression techniques.

Steve Denham
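A minimal sketch of the PROC GLMSELECT suggestion, reusing the hypothetical names `work.mydata` and `y` and listing the 21 regressors explicitly in a macro variable. Choosing the step on the LASSO path by 10-fold cross validation (CHOOSE=CV with CVMETHOD=RANDOM(10)) and the seed value are illustrative choices only; other CHOOSE= criteria such as SBC or AICC are also available, and SELECTION=LAR can be swapped in for least angle regression.

```sas
/* Hypothetical regressor names x1-x21, listed explicitly */
%let regressors = x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
                  x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21;

/* SEED= makes the random cross-validation folds reproducible */
proc glmselect data=work.mydata seed=12345;
   /* Compute the full LASSO path (STOP=NONE), then pick the step   */
   /* with the smallest 10-fold cross-validation prediction error   */
   model y = &regressors / selection=lasso(choose=cv stop=none)
                           cvmethod=random(10);
run;
```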

NareshAbburi
Calcite | Level 5

Hi,

Yes, a model selected by SAS can retain a few variables that are not significant at the 0.05 level, and one reason can be interactions. If the interaction of a variable with another variable is significant, then both variables that form the interaction would not be eliminated from the best-fitting model, even though they are not individually significant at the 0.05 level.

Thanks,

Naresh Abburi
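PROC REG's MODEL statement accepts only variables, not effects such as x1*x2, so an interaction has to be created as a new variable in a DATA step before it can take part in a selection. The sketch below assumes the hypothetical names used above plus a hypothetical product term `x1_x2`. The INCLUDE=2 option forces the first two regressors listed (the main effects x1 and x2) into every model, which is one explicit way to keep both components of the interaction regardless of their individual p-values.

```sas
/* Build the product term: PROC REG cannot parse x1*x2 in MODEL */
data work.mydata_int;
   set work.mydata;
   x1_x2 = x1 * x2;   /* hypothetical interaction of x1 and x2 */
run;

proc reg data=work.mydata_int;
   /* INCLUDE=2 keeps x1 and x2 (the first two listed) in every model; */
   /* backward elimination at the 0.05 level acts on the remaining ones */
   model y = x1 x2 x1_x2 x3-x21 / selection=backward slstay=0.05 include=2;
run;
quit;
```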

pronabesh
Fluorite | Level 6

If you choose PROC REG, you can limit the model to significant independent variables by using forward selection, backward elimination, or stepwise selection. Note that you have to specify the significance level you want (the SLENTRY= and SLSTAY= options); otherwise the default levels are used (see below). There are other, better modelling options, as Steve suggested.


Please see the SAS webpage for details: http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect030....
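A minimal sketch of this advice with the significance levels set explicitly to 0.05, again using the hypothetical names `work.mydata`, `y`, and `x1` through `x21`. For reference, PROC REG's defaults are SLENTRY=0.50 for FORWARD, SLSTAY=0.10 for BACKWARD, and 0.15 for both in STEPWISE. Only BACKWARD and STEPWISE apply the stay test to the variables left in the final model; FORWARD never removes a variable once it has entered, so a forward-selected model can still contain a regressor that is no longer significant at 0.05.

```sas
proc reg data=work.mydata;
   /* Backward elimination: drop the least significant regressor until */
   /* every remaining one is significant at the 0.05 level             */
   back: model y = x1-x21 / selection=backward slstay=0.05;

   /* Stepwise: a variable enters if significant at 0.05 and is       */
   /* removed later if it is no longer significant at 0.05            */
   step: model y = x1-x21 / selection=stepwise slentry=0.05 slstay=0.05;
run;
quit;
```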

SteveDenham
Jade | Level 19

Even more importantly, see the following paper by Flom and Cassell: Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use at http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf

I know I referred to this earlier. It is especially important if you wish to use the model for scoring new cases or are trying to make some sort of general statements about the effects of the independent variables on the dependent variable.

Steve Denham

Discussion stats
  • 4 replies
  • 1697 views
  • 0 likes
  • 4 in conversation