I am developing a logistic regression model for predicting complaints in a customer service setup.
Consider a credit card company scenario: the objective is to identify the customers who are most likely to complain in the next week or so, based on events that happened in the past couple of months. The attributes can be things like a change in payment behavior, whether the customer filed for bankruptcy (y/n), etc.
I had almost 200 independent variables in my data. The target variable is binary.
I ran proc logistic for every individual variable against the target variable and calculated the c-scores.
I then chose the variables with c-scores over 0.525. I ran proc logistic again with just these variables (all together now) and chose my final predicting variables. I used forward stepwise selection this time.
My question is: what is wrong with this approach? I am worried about losing information by shortlisting the variables based on c-scores. I am following this methodology because that is what I was told to do. Any advice would be greatly appreciated.
What is wrong with this approach?
Go to your favorite search engine and type in "problems with stepwise regression".
A better approach, in my opinion, is to use a logistic version of Partial Least Squares Regression. Again, your favorite search engine will find examples. PROC PLS in SAS does the calculations.
Your approach ignores the multivariate nature of your predictor variables: it ignores the correlations between the predictors, and it ignores interactions between predictors. If you are going to fit a model with multiple predictors, I am skeptical of the approach of looking at predictor variables one at a time.
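To make the point about interactions concrete, here is a small sketch in plain Python (not SAS, and not anyone's actual data). It assumes two made-up binary flags, `x1` and `x2`, where a complaint happens only when exactly one of them is set. The helper `c_statistic` is a hypothetical name; it computes the usual concordance measure (the share of event/non-event pairs in which the event gets the higher score, with ties counted as one half). The joint score is hand-picked to show what a model containing the `x1*x2` interaction term could rank, rather than fitted.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance: fraction of (event, non-event) pairs where the event
    has the higher score; tied pairs count as 0.5."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                total += 1.0
            elif e == n:
                total += 0.5
    return total / (len(events) * len(non_events))


# Hypothetical toy data: every combination of two binary flags, repeated 25 times.
rows = list(product([0, 1], repeat=2)) * 25
x1 = [a for a, b in rows]
x2 = [b for a, b in rows]
# Assumption for illustration: a complaint occurs only when exactly one flag is set.
y = [a ^ b for a, b in rows]

# One-at-a-time screening: each variable on its own.
c_x1 = c_statistic(x1, y)
c_x2 = c_statistic(x2, y)

# Hand-picked score that is linear in x1, x2 and x1*x2, i.e. a ranking that a
# logistic model with the interaction term is able to produce.
joint_score = [a + b - 2 * a * b for a, b in rows]
c_joint = c_statistic(joint_score, y)

screen_cutoff = 0.525  # the cutoff mentioned in the question
kept = [name for name, c in [("x1", c_x1), ("x2", c_x2)] if c > screen_cutoff]

print(c_x1, c_x2, c_joint, kept)
```

In this constructed case both variables fail the 0.525 screen individually, even though together (with their interaction) they separate the outcomes completely. Real data will rarely be this extreme, but the same mechanism can cause a one-at-a-time screen to discard useful predictors.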
You might want to look at decision tree models from PROC HPSPLIT for this kind of problem. It will give you a better view of your data and might even provide you with a predictive model.