bmm0628
Calcite | Level 5

Hi! I am using stepwise (and forward) selection only to narrow down the variables for my formal regression analysis. A frequent problem is that the final model chosen by stepwise selection includes variables whose parameter estimates have an unexpected sign, which produces statistically insignificant results. Is there a way to code for this, so that a variable whose parameter estimate has the unexpected sign is taken out of the model and replaced with another?

 

I am using PROC REG for this.

 

Thank you!

1 ACCEPTED SOLUTION

PaigeMiller
Diamond | Level 26

This is a known drawback of stepwise selection (and of non-stepwise model fitting as well). It is caused by multi-collinearity among your X variables. Many procedures, such as PROC REG, PROC GLM, PROC LOGISTIC and so on, can be very sensitive to multi-collinearity and can produce coefficients with the wrong sign.
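
One way to check whether multi-collinearity is at work is to request collinearity diagnostics from PROC REG itself. The following is only a minimal sketch: the data set name work.mydata, the response y, and the predictors x1-x10 are hypothetical placeholders, not names from the original question.

```sas
/* Hypothetical names: work.mydata, response y, predictors x1-x10 */
proc reg data=work.mydata;
   /* VIF and TOL print variance inflation factors and tolerances;  */
   /* COLLIN prints condition indices and variance proportions      */
   model y = x1-x10 / vif tol collin;
run;
quit;
```

Large variance inflation factors or large condition indices would suggest that the predictors are strongly correlated, which is the situation in which sign flips tend to appear.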

 

An alternative is PROC PLS, which is surprisingly robust to multi-collinearity (and doesn't generally produce coefficients with the wrong sign). The lead programmer of PROC PLS wrote this paper, in which he fits a model with 1000 highly correlated x variables, does not bother with the step of variable selection, and still gets a useful model from PLS (Note: the syntax in the paper is very old and doesn't work in the current PROC PLS).
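
As a rough illustration of what a PROC PLS call might look like in current syntax, here is a minimal sketch. The data set work.mydata, the response y, and the predictors x1-x10 are hypothetical placeholders, and the option choices (leave-one-out cross validation to pick the number of factors) are just one reasonable starting point, not a recommendation from the paper.

```sas
/* Hypothetical names: work.mydata, response y, predictors x1-x10 */
/* METHOD=PLS requests partial least squares;                     */
/* CV=ONE uses leave-one-out cross validation to choose the       */
/* number of extracted factors                                    */
proc pls data=work.mydata method=pls cv=one;
   /* SOLUTION prints the coefficients of the final predictive model */
   model y = x1-x10 / solution;
run;
```

All candidate predictors go into the MODEL statement at once, with no prior selection step, and the signs of the resulting coefficients can then be compared against expectations.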

--
Paige Miller
