Re: Logistic regression variable selection

Vaithi · Posted 03-08-2019 08:11 AM

I fit a nominal logistic regression that has an R-square of 0.6073. In the effect likelihood ratio tests, certain variables have a ProbChiSq value greater than 0.05, although the overall model is shown as significant. Can I keep all the variables for modelling, or should I limit the model to variables with a ProbChiSq of < 0.05? What other information can I use to decide on variable selection?

A few of the variables change significance level every time I add a new variable; what was significant earlier ends up being non-significant. I am not able to decide which variables to take for model building. Any help in this regard will be appreciated.

The following are the results of the tests:

Whole Model Test
  ProbChiSq   <0.0001*
  R Square    0.6073

Effect Likelihood Ratio Tests
  Variable   ChiSquare   ProbChiSq
  1          159         <0.0001*
  2          18.75       0.0021*
  3          8.53        0.3830
  4          7.67        0.0215*
  5          55.10       0.3203

PaigeMiller · Posted 03-08-2019 08:20 AM

@Vaithi wrote:

A few of the variables change significance level every time I add a new variable; what was significant earlier ends up being non-significant. I am not able to decide which variables to take for model building. Any help in this regard will be appreciated.

Unfortunately, this is a "feature" (although I would say "drawback") of variable selection in Logistic Regression and in Ordinary Least Squares Regression. Not only can the significance of a variable change; the estimate (slope) can change greatly as well, sometimes so much that it takes a different sign, when a variable is added to or removed from a regression. It is extremely frustrating when this happens, and this is why I avoid variable selection methods.
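A minimal sketch of the sign change described above. It uses ordinary least squares rather than logistic regression to stay short, and everything in it is made up for illustration: the data, the names x1, x2 and y, and the helper functions solve and ols. None of it comes from the original poster's model.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(predictors, y):
    """Least-squares fit via the normal equations; returns [intercept, slope_1, slope_2, ...]."""
    rows = [[1.0] + list(vals) for vals in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data: x2 is x1 plus a small wiggle, so the two inputs are highly correlated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.5, 1.5, 3.5, 3.5, 5.5, 5.5]
y = [3 * b - 2 * a for a, b in zip(x1, x2)]  # y depends negatively on x1 once x2 is known

slope_x1_alone = ols([x1], y)[1]         # x1 is the only predictor
slope_x1_with_x2 = ols([x1, x2], y)[1]   # x1 after x2 is added

print("slope of x1 alone:  ", round(slope_x1_alone, 4))
print("slope of x1 with x2:", round(slope_x1_with_x2, 4))
```

With these made-up numbers the slope on x1 is about 0.74 when x1 is the only predictor and -2 once x2 enters the model, because x2 carries most of the same information.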

So, I recommend using Partial Least Squares regression (PROC PLS), which does not have this feature/drawback. No variable selection is performed: all variables of interest go into the model, and variables which have low or no predictive ability get weights close to zero. Unfortunately, there is no Logistic Partial Least Squares regression available in SAS, although a paper has been written describing the algorithm (and the algorithm also exists in R).

--
Paige Miller

BrettWujek · Posted 03-08-2019 10:34 AM

To me, this is a possible indication that you have some highly correlated inputs such that the reported effects are shared among them. Have you done any correlation analysis on the inputs?
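One rough way to run such a check is to compute pairwise Pearson correlations between the inputs and flag the large ones. This is a sketch only. The input names, the data values, and the 0.8 cutoff are all assumptions chosen for illustration, and pairwise correlation will not catch every form of collinearity (for example, one input being close to a combination of several others).

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


# Hypothetical inputs; replace with the real model inputs.
inputs = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.5, 1.5, 3.5, 3.5, 5.5, 5.5],
    "x3": [2.0, 7.0, 1.0, 8.0, 2.0, 8.0],
}

THRESHOLD = 0.8  # arbitrary cutoff for "highly correlated" (an assumption)

flagged = []
for name_a, name_b in combinations(inputs, 2):
    r = pearson(inputs[name_a], inputs[name_b])
    if abs(r) >= THRESHOLD:
        flagged.append((name_a, name_b, r))

for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated: r = {r:.3f}")
```

In this made-up data only x1 and x2 are flagged. Inputs flagged this way are candidates for the shared-effect behaviour described above.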
