My nominal logistic regression fit has an R Square of 0.6073. In the Effect Likelihood Ratio Tests, some variables have a ProbChiSq value greater than 0.05, although the overall model is shown as significant. Can I keep all the variables for modelling, or should I limit the model to variables with a chi-square probability below 0.05? What other information can I use to decide on variable selection?
A few of the variables change significance level every time I add a new variable; what was significant earlier ends up being non-significant. I am not able to decide which variables to use for model building. Any help in this regard will be appreciated.
The test results are as follows:
Whole Model Test
ProbChiSq   <0.0001*
R Square    0.6073

Effect Likelihood Ratio Tests
Variable   ChiSquare   ProbChiSq
1          159         <0.0001*
2          18.75       0.0021*
3          8.53        0.3830
4          7.67        0.0215*
5          55.10       0.3203
@Vaithi wrote:
A few of the variables change significance level every time I add a new variable; what was significant earlier ends up being non-significant. I am not able to decide which variables to use for model building. Any help in this regard will be appreciated.
Unfortunately, this is a "feature" (although I would say "drawback") of variable selection in logistic regression and in ordinary least squares regression. Not only can the significance of a variable change; the estimate (slope) can change greatly as well, sometimes even enough to flip its sign, when a variable is added to or removed from a regression. It is extremely frustrating when this happens, and this is why I avoid variable selection methods.
So I recommend Partial Least Squares regression (PROC PLS), which does not have this drawback. No variable selection is performed: all variables of interest go into the model, and variables with little or no predictive ability get weights close to zero. Unfortunately, there is no logistic Partial Least Squares regression available in SAS, although a paper has been written describing the algorithm (and the algorithm also exists in R).
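The sign change described above is easy to reproduce with ordinary least squares. The following is a minimal Python sketch, not SAS output: the data are made up for illustration, and the helper names (`simple_slope`, `two_predictor_slopes`) are hypothetical. Here `x2` is nearly a copy of `x1`, and `y` is constructed as `-x1 + 2*x2`.

```python
# Illustrative only: made-up data and hypothetical helper names.
# Shows how the OLS slope for x1 changes sign once a correlated x2 is added.

def mean(values):
    return sum(values) / len(values)

def cross(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def simple_slope(x, y):
    """Slope of y on x alone (model with intercept)."""
    return cross(x, y) / cross(x, x)

def two_predictor_slopes(x1, x2, y):
    """Slopes of y on x1 and x2 together (model with intercept),
    obtained by solving the 2x2 normal equations on centered data."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("x1 and x2 are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]          # x1 plus small noise
y = [-a + 2 * b for a, b in zip(x1, x2)]     # true effect of x1 is negative

slope_alone = simple_slope(x1, y)
b1, b2 = two_predictor_slopes(x1, x2, y)

print("slope of x1 alone:  ", round(slope_alone, 3))
print("slope of x1 with x2:", round(b1, 3))
print("slope of x2 with x1:", round(b2, 3))
```

Fitted alone, `x1` gets a positive slope because it stands in for `x2`; once `x2` enters the model, the slope for `x1` becomes negative. The same mechanism can move p-values across the 0.05 line in a logistic model.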
To me, this suggests that some of your inputs are highly correlated, so the reported effects are shared among them. Have you done any correlation analysis on the inputs?
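One rough way to run such a check is to compute pairwise Pearson correlations for the numeric inputs and flag the large ones. A minimal Python sketch follows; the column names, the data, the helper names, and the 0.8 cutoff are all assumptions chosen for illustration, not values from the original post.

```python
# Illustrative only: hypothetical column names, made-up data, arbitrary cutoff.
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)

def flag_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for name_a, name_b in combinations(columns, 2):
        r = pearson(columns[name_a], columns[name_b])
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

inputs = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [1.1, 1.9, 3.2, 3.8, 5.1, 5.9],   # nearly a copy of x1
    "x3": [2, 5, 1, 6, 3, 4],               # only weakly related to x1 and x2
}

flagged = flag_correlated_pairs(inputs)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

Pairs that show up here are candidates for combining, dropping one of, or handling with a method that tolerates correlated inputs. Pairwise correlation will not catch a variable that is a combination of several others, so it is a first check rather than a complete one.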