Vaithi
Calcite | Level 5

My nominal logistic regression fit has an R Square of 0.6073. In the Effect Likelihood Ratio Tests, certain variables have a ProbChiSq value greater than 0.05, although the overall model is shown as significant. Can I keep all the variables in the model, or should I limit it to variables with a chi-square probability below 0.05? What other information can I use to decide on variable selection?

 

 

A few of the variables change significance level every time I add a new variable; what was significant earlier ends up non-significant. I am not able to decide which variables to keep for model building. Any help in this regard will be appreciated.

 

The results of the test are as follows:

 

Whole Model Test

ProbChiSq <0.0001*

R Square 0.6073

 

 

Variable    Chisquare    ProbChiSq
1           159          <0.0001*
2           18.75        0.0021*
3           8.53         0.3830
4           7.67         0.0215*
5           55.10        0.3203

 

 

                        

PaigeMiller
Diamond | Level 26

@Vaithi wrote:

A few of the variables change significance level every time I add a new variable; what was significant earlier ends up non-significant. I am not able to decide which variables to keep for model building. Any help in this regard will be appreciated.

                        


Unfortunately, this is a "feature" (although I would say "drawback") of variable selection in logistic regression and in ordinary least squares regression. Not only can the significance of a variable change; the estimate (slope) can change greatly as well, sometimes so much that its sign flips, when a variable is added to or removed from a regression. It is extremely frustrating when this happens, and this is why I avoid variable selection methods.
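A minimal sketch of the sign change, in plain Python rather than SAS. It uses ordinary least squares instead of logistic regression because the arithmetic is easier to follow. The data are made up for illustration: x2 is nearly a copy of x1, and y is built as -x1 + 2*x2. The helper names (center, dot, simple_slope, two_predictor_slopes) are hypothetical and do not come from any SAS or JMP output.

```python
# Illustration only: made-up data, ordinary least squares via the normal equations.

def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def simple_slope(x, y):
    """Slope of y regressed on x alone (with intercept)."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

def two_predictor_slopes(x1, x2, y):
    """Slopes of y regressed on x1 and x2 together (with intercept)."""
    a, b, c = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    s1y, s2y = dot(a, c), dot(b, c)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]           # almost identical to x1
y = [-u + 2.0 * v for u, v in zip(x1, x2)]    # true slopes: -1 for x1, +2 for x2

slope_alone = simple_slope(x1, y)
slope_x1, slope_x2 = two_predictor_slopes(x1, x2, y)

print("x1 alone:      slope =", round(slope_alone, 3))   # positive
print("x1 with x2:    slope =", round(slope_x1, 3))      # negative
print("x2 (with x1):  slope =", round(slope_x2, 3))
```

On its own, x1 gets a positive slope because it stands in for x2; once x2 enters the model, the slope for x1 turns negative. The same mechanism can move p-values across the 0.05 line.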

 

So I recommend Partial Least Squares regression (PROC PLS), which does not have this feature/drawback. No variable selection is performed: all variables of interest go into the model, and variables with low or no predictive ability get weights close to zero. Unfortunately, there is no logistic Partial Least Squares regression available in SAS, although a paper has been written describing the algorithm (and the algorithm also exists in R).

--
Paige Miller
BrettWujek
SAS Employee

To me, this is a possible indication that you have some highly correlated inputs, so that the reported effects are shared among them. Have you done any correlation analysis on the inputs?
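As a rough sketch of what such a check could look like, here is a plain-Python version that computes pairwise Pearson correlations and flags strongly correlated pairs. It assumes the inputs are numeric; the column names, the data, the 0.8 cutoff, and the function names are all hypothetical, chosen only for illustration. Categorical inputs would need a different measure of association.

```python
# Illustration only: hypothetical numeric inputs and an arbitrary 0.8 cutoff.
from itertools import combinations
from math import sqrt

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den

def correlated_pairs(inputs, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(inputs, 2):
        r = pearson(inputs[a], inputs[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged

inputs = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [1.1, 1.9, 3.1, 3.9, 5.1, 5.9],   # nearly a copy of x1
    "x3": [2.0, -1.0, 0.0, 1.0, -2.0, 0.0],
}

flagged = correlated_pairs(inputs)
for a, b, r in flagged:
    print(f"{a} and {b} are highly correlated (r = {r:.3f})")
```

Pairs flagged this way are candidates for the effect sharing described above: dropping or combining one member of a pair may stabilize the significance of the other.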

