Why is selection=stepwise not a very good way to...

02-12-2012 08:31 AM

What is the problem with this method? Can anyone explain it in detail? Thanks.

Posted in reply to MikeTurner

02-12-2012 09:26 AM

Take a look at: http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf

For more reading along the same lines, just search the web for "stepwise" and "Cassell".

Posted in reply to MikeTurner

02-16-2012 07:34 PM

In stepwise regression, the decision about which variables to include can hinge on slight differences in their semi-partial correlations. This in turn creates a danger of over- or underfitting, and the resulting selection may conflict with the theoretical importance of a predictor.
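
To make the semi-partial correlation point concrete, here is a minimal Python sketch (standard library only; the data, variable names and helper functions are hypothetical, made up for illustration). It performs one forward-selection step: each candidate's gain in R^2 equals its squared semi-partial correlation with y given the predictors already in the model, and the step simply enters the largest. Because x1 and x2 are both noisy copies of the same underlying signal, the choice between them comes down to a small difference. A real stepwise procedure also applies entry and stay significance levels, which this sketch leaves out.

```python
import random

def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def residuals(cols, y):
    """Residuals from an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [c[i] for c in cols] for i in range(n)]
    p = len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(xtx, xty)
    return [y[i] - sum(beta[k] * X[i][k] for k in range(p)) for i in range(n)]

def r_squared(cols, y):
    """R^2 of the OLS fit of y on an intercept plus cols."""
    my = sum(y) / len(y)
    tss = sum((v - my) ** 2 for v in y)
    return 1.0 - sum(e * e for e in residuals(cols, y)) / tss

def semipartial(col, others, y):
    """Correlation of y with the part of col that the other predictors do not explain."""
    e = residuals(others, col)
    my = sum(y) / len(y)
    num = sum((yi - my) * ei for yi, ei in zip(y, e))
    den = (sum((yi - my) ** 2 for yi in y) * sum(ei * ei for ei in e)) ** 0.5
    return num / den

def forward_step(candidates, chosen, y):
    """One forward-selection step: R^2 gain of each remaining candidate, plus the largest."""
    current = [candidates[k] for k in chosen]
    base = r_squared(current, y)
    gains = {name: r_squared(current + [col], y) - base
             for name, col in candidates.items() if name not in chosen}
    return max(gains, key=gains.get), gains

rng = random.Random(1)
n = 50
z = [rng.gauss(0, 1) for _ in range(n)]          # shared underlying signal
cands = {
    "x1": [v + rng.gauss(0, 0.5) for v in z],    # noisy copy of z
    "x2": [v + rng.gauss(0, 0.5) for v in z],    # another noisy copy of z
    "x3": [rng.gauss(0, 1) for _ in range(n)],   # unrelated noise
}
y = [v + rng.gauss(0, 1) for v in z]

best, gains = forward_step(cands, [], y)
for name in sorted(gains):
    print(f"{name}: R^2 gain = {gains[name]:.4f}")
print("enters first:", best)
```

Changing the seed passed to random.Random and rerunning is an easy way to see how fragile the x1-versus-x2 choice can be.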

It sounds intuitively appealing to have a procedure that automatically chooses the predictors (regressors) for you; however, samples are not perfect. The statistical procedures assume that our sample data are perfect (no measurement error, no omitted variables, and so on), so the statistical significance they report under that assumption will be biased (wrong). We must reason through which regressors should be considered, ideally basing that decision on theory.

Also, which regressors a stepwise procedure selects depends on which regressors are in our dataset. So if we have no theory about which regressors to consider and simply let a stepwise procedure select them, people will reach different conclusions depending on the regressors they happen to have in their data.
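
A small sketch of that dependence, again in Python with hypothetical simulated data: y is driven by z, w is only a noisy proxy for z, and u is noise. The first variable a forward or stepwise search enters is the one with the largest squared correlation with y (the largest R^2 gain at the first step), so whether z was measured changes what gets "selected".

```python
import random

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    den = (sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b)) ** 0.5
    return num / den

def first_entry(candidates, y):
    """Variable a forward or stepwise search enters first: largest squared correlation with y."""
    return max(candidates, key=lambda name: pearson(candidates[name], y) ** 2)

rng = random.Random(7)
n = 100
z = [rng.gauss(0, 1) for _ in range(n)]      # the variable that actually drives y
y = [2 * v + rng.gauss(0, 0.1) for v in z]
w = [v + rng.gauss(0, 1) for v in z]         # a noisy proxy for z
u = [rng.gauss(0, 1) for _ in range(n)]      # unrelated noise

dataset_a = {"z": z, "w": w, "u": u}
dataset_b = {"w": w, "u": u}                 # same study, but z was never measured

print("dataset A enters:", first_entry(dataset_a, y))
print("dataset B enters:", first_entry(dataset_b, y))
```

With these settings the proxy w should be the variable entered from dataset B, and someone who only had dataset B could easily read it as the important predictor.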

I tend to think of stepwise selection as the best procedure for choosing regressors IF we had datasets containing every possible variable in the world (billions and billions of them) and computing power that could go through them at multiples of lightning speed, which is not possible now or in our lifetime.

Posted in reply to VX_Xc

02-17-2012 08:16 AM

Regarding the last statement: Stepwise regression would still yield biased results. It is a matter of sampling from the population. You cannot get around it.

Now what you could do, given the abilities specified, is measure all possible variables on every individual in the population, and fit that by regression. And watch collinearity kill the interpretation.
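
One way to put a number on "collinearity kills the interpretation" is the variance inflation factor. For an OLS coefficient, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors; it is the factor by which the variance of that coefficient's estimate is inflated relative to an x_j that is uncorrelated with the rest. A minimal Python sketch for the two-predictor case, where R_j^2 is just the squared correlation r^2 (the r values are arbitrary illustrations):

```python
def vif(r_squared_j):
    """Variance inflation factor 1 / (1 - R_j^2), R_j^2 from regressing x_j on the other predictors."""
    return 1.0 / (1.0 - r_squared_j)

# With exactly two predictors, R_j^2 is their squared correlation r^2.
for r in (0.0, 0.5, 0.9, 0.99, 0.999):
    v = vif(r ** 2)
    print(f"r = {r}: variance x {v:.1f}, standard error x {v ** 0.5:.2f}")
```

At r = 0.99 the coefficient variance is inflated about fifty-fold, so standard errors are roughly seven times larger.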

In my opinion, and I stress that this is only an opinion, regression is just not quite the right tool for data exploration. It is a great tool for finding the degree of relationship for pre-specified variables.

In these days of big data, and in the days to come of even bigger data, I wonder if the whole branch of statistics that falls under "linear models" like regression, ANOVA, GLMMs, etc. will be considered the equivalent of steam power.

Steve Denham

Posted in reply to MikeTurner

02-21-2012 05:51 PM

Thanks for referencing my paper, Art.

VS, as Steve pointed out, stepwise is a bad method even if you have all the data and computing time in the world. The p-values are too low, the standard errors are too small, and the parameter estimates are biased away from 0... it's not good.
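
A quick simulation makes these claims easy to see. The Python sketch below (standard library only; every setting is an illustrative assumption) generates y as pure noise along with k = 10 unrelated candidate predictors, so every true slope is 0. It compares a predictor chosen in advance with the one a search would keep (the largest |t|, which is what a single forward-selection step picks). It uses |t| > 1.96 as a rough normal approximation to the nominal p < 0.05 cutoff for a t statistic with n - 2 = 48 degrees of freedom, and it is a simplified best-of-k search rather than a full stepwise run.

```python
import random

def simple_fit(x, y):
    """Slope and t statistic from an OLS fit of y on an intercept and a single x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    rss = syy - slope * sxy            # residual sum of squares
    se = (rss / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return slope, slope / se

def simulate(reps=1000, n=50, k=10, seed=2012):
    """Compare a pre-chosen predictor with the best of k when y is pure noise."""
    rng = random.Random(seed)
    hits_pre = hits_sel = 0
    abs_pre = abs_sel = 0.0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to every x: all true slopes are 0
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        fits = [simple_fit(x, y) for x in xs]
        b_pre, t_pre = fits[0]                             # chosen before looking at the data
        b_sel, t_sel = max(fits, key=lambda f: abs(f[1]))  # what a selection search keeps
        hits_pre += abs(t_pre) > 1.96
        hits_sel += abs(t_sel) > 1.96
        abs_pre += abs(b_pre)
        abs_sel += abs(b_sel)
    return {
        "pre_rate": hits_pre / reps,
        "sel_rate": hits_sel / reps,
        "pre_mean_abs_slope": abs_pre / reps,
        "sel_mean_abs_slope": abs_sel / reps,
    }

result = simulate()
for key, value in result.items():
    print(f"{key}: {value:.3f}")
```

The pre-chosen predictor should cross the cutoff in close to the nominal 5% of replicates, while the selected one crosses it far more often and has a larger average |slope| even though the true slope is 0: p-values that are too small and estimates biased away from 0.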

If you insist on an automatic method, LASSO or LAR is better; both are available in PROC GLMSELECT (SELECTION=LASSO or SELECTION=LAR).
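
For a feel of what the lasso does differently, here is a minimal coordinate-descent sketch in Python for the objective (1/(2n)) * ||y - Xb||^2 + lambda * ||b||_1 on standardized predictors and a centered y. It illustrates the lasso idea only and is not how PROC GLMSELECT computes it; the scaling convention, data and function names are assumptions. Rather than entering or dropping variables outright, the lasso shrinks all coefficients toward 0 and sets some exactly to 0 as lambda grows.

```python
import random

def standardize(col):
    """Center a column and scale it so its mean square is 1."""
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col]

def soft_threshold(z, lam):
    """Shrink z toward 0 by lam, returning exactly 0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, tol=1e-10, max_iter=10000):
    """Cyclic coordinate descent for (1/(2n))*||y - Xb||^2 + lam*||b||_1.

    X is a list of standardized columns and y is centered, so no intercept is fitted.
    """
    n, p = len(y), len(X)
    b = [0.0] * p
    r = list(y)  # current residual y - Xb (b starts at 0)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            xj = X[j]
            # (1/n) * x_j' times the partial residual that leaves x_j out
            rho = sum(xj[i] * r[i] for i in range(n)) / n + b[j]
            new = soft_threshold(rho, lam)
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= delta * xj[i]
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

rng = random.Random(42)
n, p = 100, 6
X = [standardize([rng.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
y = [3 * X[0][i] - 2 * X[1][i] + rng.gauss(0, 1) for i in range(n)]  # only x0 and x1 matter
ybar = sum(y) / n
yc = [v - ybar for v in y]

lambda_max = max(abs(sum(xj[i] * yc[i] for i in range(n))) / n for xj in X)
for lam in (lambda_max, 1.0, 0.3, 0.05):
    coefs = lasso_cd(X, yc, lam)
    print(f"lambda = {lam:.3f}:", [round(c, 2) for c in coefs])
```

At lambda_max (the largest |x_j'y|/n) every coefficient is exactly 0; as lambda decreases, x0 and x1 should come in first, shrunken toward 0, while the noise columns stay at or near 0. In practice lambda is chosen by something like cross-validation rather than by eye.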

Posted in reply to plf515

02-21-2012 06:04 PM

Hi Peter! Nice to see you here! There have been a number of interesting questions raised on the Discussion Forums over the past year that could have benefitted from your expertise. I'm sure there will be many more to come.