lcmichael_unc
Fluorite | Level 6

I am attempting to use stepwise selection to build a parsimonious model from 30 covariates, a dichotomous outcome, and 177 observations. I set SLENTRY=SLSTAY=0.1, and the initial univariate score Chi-square tests show 10 variables meeting the entry criterion. However, the two predictors with the largest score Chi-square values each terminate the stepwise process: once entered, each fails the retention criterion (P>0.6), and the output states "Model building terminates because the last effect entered is removed by the Wald statistic criterion". If I exclude these two predictors from the stepwise selection, model building proceeds as expected until no additional predictors meet the entry criterion. I have two questions: 1) Why does a predictor with a very large score Chi-square, and p=0.0007, fail to be retained in the stepwise model? 2) Is it statistically defensible to exclude predictors with large score Chi-square values from the stepwise process and proceed as I have described above? All advice and citations accepted with gratitude.
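One point that may bear on question 1: the entry decision here rests on a score Chi-square, while the removal decision rests on a Wald Chi-square, and the two tests need not agree. The sketch below (Python, standard library only) illustrates one way they can diverge: a binary predictor with an empty cell, where the coefficient estimate keeps growing and the Wald statistic collapses even though the score statistic is large. The 2x2 counts are invented for illustration, the Newton-Raphson loop is hand-rolled and is not PROC LOGISTIC's algorithm, and nothing in this thread confirms that this is what happened with the poster's data.

```python
import math

def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

def score_chi2(a, b, c, d):
    """Score chi-square for adding a binary x to an intercept-only logistic model.
    For a 2x2 table this equals the uncorrected Pearson chi-square.
    a, b = events, non-events when x=1; c, d = events, non-events when x=0."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def group_weight(eta, total):
    """Binomial information total*p*(1-p) at logit eta, computed stably."""
    p = 1.0 / (1.0 + math.exp(-eta))
    q = 1.0 / (1.0 + math.exp(eta))
    return total * p * q

def fit_group_logit(events, total, max_iter=25, tol=1e-8):
    """Newton-Raphson for one group's logit; stops at max_iter if it never settles."""
    eta = 0.0
    for _ in range(max_iter):
        p = 1.0 / (1.0 + math.exp(-eta))
        step = (events - total * p) / group_weight(eta, total)
        eta += step
        if abs(step) < tol:
            break
    return eta

def wald_chi2(a, b, c, d):
    """Slope estimate and Wald chi-square for logit P(y=1) = b0 + b1*x.
    With one binary x the model is saturated, so each group is fitted separately."""
    eta1 = fit_group_logit(a, a + b)   # logit of event probability when x=1
    eta0 = fit_group_logit(c, c + d)   # logit of event probability when x=0
    beta = eta1 - eta0
    var = 1.0 / group_weight(eta1, a + b) + 1.0 / group_weight(eta0, c + d)
    return beta, beta ** 2 / var

# Hypothetical counts (NOT the poster's data): every x=1 subject has the event.
a, b, c, d = 12, 0, 80, 85
score = score_chi2(a, b, c, d)
beta, wald = wald_chi2(a, b, c, d)
print(f"score chi-square = {score:.2f}, p = {chi2_1df_pvalue(score):.4f}")
print(f"slope after 25 iterations = {beta:.1f}")
print(f"Wald chi-square = {wald:.2e}, p = {chi2_1df_pvalue(wald):.4f}")
```

With these made-up counts the score Chi-square is about 11.9 (p below 0.001), while the Wald Chi-square is essentially zero (p close to 1), so the same predictor would pass an entry test at 0.1 and then fail a stay test at 0.1. A cross-tabulation of each problem predictor against the outcome would show whether sparse or empty cells are present.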

Reeza
Super User
A common rule of thumb is that a reliable model needs about 25 observations per covariate. By that rule you would need at least 25*30 = 750 observations to run this model, assuming none of your covariates are categorical. You don't have enough data to run what you want. I would consider doing a PLS regression instead.
PaigeMiller
Diamond | Level 26

Stepwise regression is what I call a counter-intuitive method. It adds variables into the model because they meet some significance criterion, and then it can remove that same variable in the next step (or later step) because it no longer meets the significance criterion. How can that be? How does that make sense? Why would you want to use such a procedure? How would you explain it to someone?

 

If you want to hear what people say about it, go to your favorite internet search engine and type in "problems with stepwise regression" and read what people say.

 

What is happening is that when you have correlated predictor variables (as your 30 variables are), the presence of, for example, X7 in the model changes the coefficients of X1-X6. When the coefficients change, the p-values change, so a variable that was significant without X7 in the model can become non-significant when X7 is in the model.
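A small numeric sketch of this effect, using ordinary least squares rather than logistic regression so that everything can be computed by hand (Python, standard library only). The eight data points are invented, and the names x1, x2 and y are hypothetical; x1 and x2 are built to have a correlation of about 0.92. Treat it as an illustration of the principle, not as a reproduction of the poster's analysis.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def t_alone(x, y):
    """t statistic for the slope in the model y ~ 1 + x."""
    x, y = center(x), center(y)
    n = len(y)
    sxx, sxy = dot(x, x), dot(x, y)
    slope = sxy / sxx
    rss = dot(y, y) - slope * sxy            # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

def t_with_partner(x1, x2, y):
    """t statistic for the x1 coefficient in the model y ~ 1 + x1 + x2."""
    x1, x2, y = center(x1), center(x2), center(y)
    n = len(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    rss = dot(y, y) - b1 * s1y - b2 * s2y
    se1 = math.sqrt(rss / (n - 3) * s22 / det)
    return b1 / se1

# Invented data: x1 and x2 share a common signal and differ only slightly.
x1 = [1.2, 0.8, 1.2, 0.8, -0.8, -1.2, -0.8, -1.2]
x2 = [0.8, 1.2, 0.8, 1.2, -1.2, -0.8, -1.2, -0.8]
y = [2.0, 2.0, 0.0, 0.0, 0.0, 0.0, -2.0, -2.0]

corr = dot(center(x1), center(x2)) / math.sqrt(
    dot(center(x1), center(x1)) * dot(center(x2), center(x2)))
t1 = t_alone(x1, y)
t2 = t_with_partner(x1, x2, y)
print(f"correlation(x1, x2) = {corr:.3f}")
print(f"t for x1 alone      = {t1:.2f}")
print(f"t for x1 with x2    = {t2:.2f}")
```

Here the t statistic for x1 is about 2.36 on its own, which exceeds the two-sided 0.10 critical value of 1.943 for 6 degrees of freedom, but it drops to about 0.44 once x2 is in the model, well short of the 2.015 critical value for 5 degrees of freedom. The data did not change; only the company x1 keeps in the model did.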

 

So, what should a conscientious data analyst do? My OPINION is that you should not use any form of stepwise regression (not stepwise, not forward, not backward). Instead, I use Partial Least Squares regression (PROC PLS in SAS) when I have many correlated X variables. In PLS, a variable that is a good predictor remains a good predictor even when other variables are entered into (or removed from) the model. But wait: PROC PLS only works on continuous Y variables; it doesn't handle the logistic case. There is nothing in SAS that will perform Logistic PLS. There is a paper which explains the Logistic PLS algorithm, and I have written a SAS macro that performs Logistic PLS based on this paper. I like the way it works in these situations, but I don't think my employer would want me to share the macro.

 

So what should you do? Well, I don't know. There is R code that performs Logistic PLS, if that's something that would help.

--
Paige Miller
PaigeMiller
Diamond | Level 26

I did suggest that SAS produce a PROC that performs Logistic PLS, but no one has voted for it 😞

https://communities.sas.com/t5/SASware-Ballot-Ideas/Logistic-version-of-PROC-PLS/idi-p/485503

--
Paige Miller
lcmichael_unc
Fluorite | Level 6

That’s a splendid response. Thank you. Now I have to convince a client.

lcmichael_unc
Fluorite | Level 6

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates. 

PaigeMiller
Diamond | Level 26

@lcmichael_unc wrote:

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates. 


 

Can I explain everything that STEPWISE does? No, I can't.

--
Paige Miller
lcmichael_unc
Fluorite | Level 6

Humor is an excellent explanation. Thanks.

ballardw
Super User

@lcmichael_unc wrote:

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates. 


What does the log say?

 

I would not be surprised to have something that relates to @Reeza's comment about sample size.

lcmichael_unc
Fluorite | Level 6

The log is silent...

 

NOTE: PROC LOGISTIC is modeling the probability that SVR12=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 0.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 2.
NOTE: LACKFIT is ignored since there is no explanatory variable in the model.
NOTE: The data set WORK.RSQUARE has 1 observations and 7 variables.
NOTE: The data set WORK.PARAMEST has 6 observations and 9 variables.
NOTE: The data set WORK.MODELINFO has 5 observations and 3 variables.
NOTE: The data set WORK.GOF has 2 observations and 5 variables.
NOTE: The data set WORK.ODDSRAT has 2 observations and 5 variables.
NOTE: The data set WORK.NOBS has 2 observations and 6 variables.
NOTE: There were 174 observations read from the data set FR190301.MITT_GT_VF.
NOTE: PROCEDURE LOGISTIC used (Total process time):
      real time           0.18 seconds
      cpu time            0.14 seconds
 
...and I wish it were not so.
PaigeMiller
Diamond | Level 26

@ballardw wrote:

@lcmichael_unc wrote:

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates. 


What does the log say?

 

I would not be surprised to have something that relates to @Reeza's comment about sample size.


In my opinion, this is a deficiency of the method of stepwise regression, and has nothing to do with sample size.

--
Paige Miller
lcmichael_unc
Fluorite | Level 6
Opinion gratefully noted.
