Hi! I am using stepwise (and forward) selection just to narrow down the variables that I use in my formal regression analysis. However, a frequent problem I have been running into is that the final model from stepwise selection includes variables whose coefficients have an unexpected sign, which leads to statistically insignificant results. Is there a way to code for this, so that if a parameter estimate has the unexpected sign, that variable is taken out of the model and replaced with another?
I am using the "proc reg" procedure for this.
Thank you!
This is a known drawback of stepwise selection (and of non-stepwise model fitting as well). It is caused by multi-collinearity among your X variables. Many procedures, such as PROC REG, PROC GLM, PROC LOGISTIC and so on, can be very sensitive to multi-collinearity and can produce coefficients with the wrong sign.
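If you want to check whether multi-collinearity is really the issue in your data, something like the following might help. It is only a rough sketch: the data set name work.mydata, the response y and the predictors x1-x10 are placeholders I made up, so substitute your own.

```sas
/* Hypothetical names: work.mydata, y, x1-x10. Replace with your own. */
proc reg data=work.mydata;
   /* VIF and TOL print variance inflation factors and tolerances;  */
   /* COLLIN prints condition indices and variance proportions.     */
   model y = x1-x10 / vif tol collin;
run;
quit;

/* Pairwise correlations among the candidate predictors */
proc corr data=work.mydata;
   var x1-x10;
run;
```

Large variance inflation factors or large condition indices would suggest that the predictors overlap enough to make individual coefficient signs unstable.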
An alternative is PROC PLS, which is surprisingly robust to multi-collinearity (and doesn't generally produce coefficients with the wrong sign). The lead programmer of PROC PLS wrote this paper, in which he fits a model with 1000 highly correlated x variables, does not bother with the step of variable selection, and still gets a useful model from PLS. (Note: the syntax in the paper is very old and doesn't work in the current PROC PLS.)
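As a starting point, a basic PROC PLS call could look roughly like this. Again, work.mydata, y and x1-x10 are placeholder names of mine and not from your data, and the cross-validation choice is just one reasonable option, not a recommendation from the paper.

```sas
/* Hypothetical names: work.mydata, y, x1-x10. Replace with your own. */
proc pls data=work.mydata method=pls cv=split;
   /* All candidate predictors go in; no variable selection step.   */
   /* SOLUTION prints the coefficients of the final fitted model.   */
   model y = x1-x10 / solution;
run;
```

With CV=SPLIT the procedure uses split-sample cross validation to choose the number of factors, and the SOLUTION output lets you inspect the coefficient signs directly.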