Fluorite | Level 6

## Stepwise logistic regression

I am attempting to use the stepwise selection method to formulate a parsimonious model from 30 covariates, a dichotomous outcome, and 177 observations. I set SLENTRY=SLSTAY=0.1, and the initial univariate Chi-square scores show 10 variables meeting the entry criterion. However, each of the two predictors with the largest Chi-square scores terminates the stepwise process: once entered, it fails the retention criterion (P>0.6), and the output states "Model building terminates because the last effect entered is removed by the Wald statistic criterion". If I exclude these two predictors from the stepwise selection, the model proceeds as expected until no additional predictors meet the entry criterion.

I have two questions:

1) Why does a predictor with a very large Chi-square score, and p=0.0007, fail to be retained in the stepwise model?
2) Is it statistically defensible to exclude predictors with large Chi-square scores from the stepwise process and proceed as I have described above?

All advice and citations accepted with gratitude.

Super User

## Re: Stepwise logistic regression

To run models that are reliable you usually need about 25 observations per covariate. With 30 covariates you would need at least 25*30 = 750 observations, assuming none of your covariates are categorical. You don't have enough data to fit the model you want. I would consider doing a PLS regression instead.
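The arithmetic behind that rule of thumb can be written out as a small Python sketch. The figure of 25 observations per covariate is the heuristic quoted in this reply, not a SAS setting, and the variable names are made up for illustration.

```python
# Rule-of-thumb check; the threshold of 25 is the heuristic from this reply.
obs_per_covariate = 25
n_covariates = 30
n_observations = 177

# Observations the rule would ask for with all 30 covariates.
required = obs_per_covariate * n_covariates
# Covariates the available data would support under the same rule.
supported = n_observations // obs_per_covariate
# Observations actually available per covariate.
available_ratio = round(n_observations / n_covariates, 1)

print(required)         # 750
print(supported)        # 7
print(available_ratio)  # 5.9
```

Under this heuristic, 177 observations would support roughly 7 continuous covariates rather than 30.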
Diamond | Level 26

## Re: Stepwise logistic regression

Stepwise regression is what I call a counter-intuitive method. It adds variables into the model because they meet some significance criterion, and then it can remove that same variable in the next step (or later step) because it no longer meets the significance criterion. How can that be? How does that make sense? Why would you want to use such a procedure? How would you explain it to someone?

If you want to hear what people say about it, go to your favorite internet search engine and type in "problems with stepwise regression" and read what people say.

What is happening is that when you have correlated predictor variables (as your 30 variables are), the presence of (for example) X7 in the model changes the coefficients of X1-X6. When the coefficients change, the p-values change too, so a variable that was significant without X7 in the model can become non-significant when X7 is in the model.
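To make this concrete, here is a minimal Python sketch (not SAS, and not a reproduction of PROC LOGISTIC output). It simulates a predictor `x1`, a second predictor `x2` that is nearly a copy of `x1`, and a binary outcome that depends only on `x1`. It then fits a logistic model by Newton-Raphson with and without `x2`, and compares the Wald standard error and p-value of `x1`. The data, the sample size of 177 (chosen only to echo the original question), and all function names are assumptions made for illustration.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def score_and_info(rows, y, beta):
    """Gradient of the log-likelihood and Fisher information at beta."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for x, yi in zip(rows, y):
        p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, x))))
        for i in range(k):
            grad[i] += (yi - p) * x[i]
            for j in range(k):
                info[i][j] += p * (1.0 - p) * x[i] * x[j]
    return grad, info

def fit_logistic(rows, y, iters=25):
    """Newton-Raphson fit; returns (coefficients, standard errors).
    Each row must start with 1.0 for the intercept."""
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad, info = score_and_info(rows, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors: square roots of the diagonal of the inverse information.
    _, info = score_and_info(rows, y, beta)
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se

def wald_p(coef, se):
    """Two-sided p-value of the Wald z statistic coef / se."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))

# Simulated data (an assumption for illustration, not the poster's data).
random.seed(1)
n = 177
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [a + 0.3 * random.gauss(0.0, 1.0) for a in x1]  # strongly correlated with x1
y = [1 if random.random() < 1.0 / (1.0 + math.exp(-a)) else 0 for a in x1]

beta_alone, se_alone = fit_logistic([[1.0, a] for a in x1], y)
beta_joint, se_joint = fit_logistic([[1.0, a, b] for a, b in zip(x1, x2)], y)

p_alone = wald_p(beta_alone[1], se_alone[1])
p_joint = wald_p(beta_joint[1], se_joint[1])
print("x1 alone:   SE = %.3f, Wald p = %.4f" % (se_alone[1], p_alone))
print("x1 with x2: SE = %.3f, Wald p = %.4f" % (se_joint[1], p_joint))
```

On data like this, the standard error of `x1` should grow noticeably once `x2` enters the model, even though the outcome was generated from `x1` alone. That inflation is what lets a stepwise procedure add a variable at one step and remove it at a later one.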

So, what should a conscientious data analyst do? My OPINION is that you should not use any form of stepwise regression (not stepwise, not forward, not backward). Instead, I use Partial Least Squares regression (PROC PLS in SAS) when I have many correlated X variables. In PLS, a variable that is a good predictor remains a good predictor even when other variables are entered into (or removed from) the model. But wait: PROC PLS only works on continuous Y variables; it doesn't handle the logistic case. There is nothing in SAS that will perform Logistic PLS. There is a paper which explains the Logistic PLS algorithm, and I have written a SAS macro that performs Logistic PLS based upon this paper. I like the way it works in these situations, but I don't think my employer would want me to share the macro.

So what should you do? Well, I don't know. There is R code that performs Logistic PLS, if that's something that would help.

--
Paige Miller
Diamond | Level 26

## Re: Stepwise logistic regression

I did suggest that SAS produce a PROC that performs Logistic PLS, but no one has voted for it 😞

https://communities.sas.com/t5/SASware-Ballot-Ideas/Logistic-version-of-PROC-PLS/idi-p/485503

--
Paige Miller
Fluorite | Level 6

## Re: Stepwise logistic regression

That’s a splendid response. Thank you. Now I have to convince a client.

Fluorite | Level 6

## Re: Stepwise logistic regression

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates.

Diamond | Level 26

## Re: Stepwise logistic regression

@lcmichael_unc wrote:

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates.

Can I explain everything that STEPWISE does? No, I can't.

--
Paige Miller
Fluorite | Level 6

## Re: Stepwise logistic regression

Humor is an excellent explanation. Thanks.

Super User

## Re: Stepwise logistic regression

@lcmichael_unc wrote:

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates.

What does the log say?

I would not be surprised to have something that relates to @Reeza's comment about sample size.

Fluorite | Level 6

## Re: Stepwise logistic regression

The log is silent...

NOTE: PROC LOGISTIC is modeling the probability that SVR12=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 0.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied in Step 2.
NOTE: LACKFIT is ignored since there is no explanatory variable in the model.
NOTE: The data set WORK.RSQUARE has 1 observations and 7 variables.
NOTE: The data set WORK.PARAMEST has 6 observations and 9 variables.
NOTE: The data set WORK.MODELINFO has 5 observations and 3 variables.
NOTE: The data set WORK.GOF has 2 observations and 5 variables.
NOTE: The data set WORK.ODDSRAT has 2 observations and 5 variables.
NOTE: The data set WORK.NOBS has 2 observations and 6 variables.
NOTE: There were 174 observations read from the data set FR190301.MITT_GT_VF.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time           0.18 seconds
cpu time            0.14 seconds

...and I wish it were not so.
Diamond | Level 26

## Re: Stepwise logistic regression

@ballardw wrote:

@lcmichael_unc wrote:

If I may pursue this just one more step (poor word choice), only the intercept is in the model when the first predictor is entered, which is immediately removed and the model development terminates.

What does the log say?

I would not be surprised to have something that relates to @Reeza's comment about sample size.

In my opinion, this is a deficiency of the method of stepwise regression, and has nothing to do with sample size.

--
Paige Miller
Fluorite | Level 6

## Re: Stepwise logistic regression

Opinion gratefully noted.