I am developing a logistic regression model to predict complaints in a customer-service setting.
Consider a credit card company: the objective is to identify the customers who are most likely to complain in the next week or so, based on events from the past couple of months. The attributes include things like changes in payment behavior, whether the customer has filed for bankruptcy (Y/N), etc.
I have almost 200 independent variables in my data; the target variable is binary.
I ran PROC LOGISTIC for each variable individually against the target variable and recorded the c-scores.
I then kept the variables with c-scores above 0.525, ran PROC LOGISTIC again with just these variables (all together this time), and used forward stepwise selection to choose my final predictors.
My question is: what is wrong with this approach? I am worried about losing information by shortlisting variables based on c-scores. I am following this methodology because that is what I was told to do. Any advice would be greatly appreciated.
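To make the screening step concrete, here is a minimal Python sketch (not SAS; the names `c_statistic` and `screen` are hypothetical) of the c statistic in the sense PROC LOGISTIC reports it: concordant event/non-event pairs plus half the tied pairs, divided by all such pairs, which is the area under the ROC curve. It then applies the 0.525 cutoff to 200 simulated pure-noise variables as a rough check of how much that cutoff filters out on its own. The `max(c, 1 - c)` step is an approximation that assumes the one-variable model's slope takes whichever sign ranks the data better.

```python
import random
from itertools import product


def c_statistic(scores, y):
    """Concordance (c) statistic: share of (event, non-event) pairs where the
    event has the higher score, counting ties as 1/2. Equals the ROC AUC."""
    events = [s for s, t in zip(scores, y) if t == 1]
    nonevents = [s for s, t in zip(scores, y) if t == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for ev, ne in product(events, nonevents):
        if ev > ne:
            total += 1.0
        elif ev == ne:
            total += 0.5
    return total / (len(events) * len(nonevents))


def screen(columns, y, threshold=0.525):
    """Univariate screen: keep variables whose one-variable c exceeds threshold.

    A one-predictor logistic model can have a negative slope, which reverses
    the ranking, so max(c, 1 - c) approximates the c of that fitted model.
    """
    kept = {}
    for name, values in columns.items():
        c = c_statistic(values, y)
        c = max(c, 1.0 - c)
        if c > threshold:
            kept[name] = c
    return kept


if __name__ == "__main__":
    rng = random.Random(2024)            # fixed seed so the run is repeatable
    n = 200
    y = [0] * (n // 2) + [1] * (n // 2)  # balanced binary target
    # 200 candidate predictors that are unrelated to y by construction.
    noise = {f"x{j}": [rng.gauss(0.0, 1.0) for _ in range(n)] for j in range(200)}
    kept = screen(noise, y)
    print(f"{len(kept)} of {len(noise)} pure-noise variables pass c > 0.525")
```

Under no association, c fluctuates around 0.5 with a spread that shrinks as the sample grows (roughly 0.04 for 100 events and 100 non-events), so whether 0.525 is a meaningful bar depends heavily on how many complaints are in the real data.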
What is wrong with this approach?
Go to your favorite search engine and type in "problems with stepwise regression".
A better approach, in my opinion, is to use a logistic version of partial least squares (PLS) regression. Again, your favorite search engine will find examples. PROC PLS in SAS does the calculations.
It ignores the multivariate nature of your predictor variables: it ignores the correlations between the predictors, and it ignores interactions between predictors. If you are going to fit a model with multiple predictors, I am skeptical of screening the predictor variables one at a time.
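A toy illustration of the interaction point, as a Python sketch on hypothetical data: two binary flags where the event occurs when exactly one flag is set. Each flag alone has a c of exactly 0.5, so a 0.525 screen would drop both, yet a linear predictor using both flags and their product ranks every pair correctly. The data is deliberately extreme; an actual logistic fit with the interaction on it would hit complete separation, so the joint score here is written out by hand rather than fitted.

```python
from itertools import product


def c_statistic(scores, y):
    """Concordance (c) statistic / ROC AUC, counting ties as 1/2."""
    events = [s for s, t in zip(scores, y) if t == 1]
    nonevents = [s for s, t in zip(scores, y) if t == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    total = sum(1.0 if ev > ne else 0.5 if ev == ne else 0.0
                for ev, ne in product(events, nonevents))
    return total / (len(events) * len(nonevents))


# Hypothetical toy data: the event happens when exactly one of two binary
# flags is set (an exclusive-or pattern), repeated 25 times.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 25
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]
y = [r[2] for r in rows]

# Each flag on its own carries no ranking information.
c_x1 = c_statistic(x1, y)
c_x2 = c_statistic(x2, y)

# A linear predictor with both flags plus their interaction,
# x1 + x2 - 2*x1*x2, reproduces y exactly.
joint = [a + b - 2 * a * b for a, b in zip(x1, x2)]
c_joint = c_statistic(joint, y)

print(f"c(x1) = {c_x1:.3f}, c(x2) = {c_x2:.3f}, c(joint) = {c_joint:.3f}")
# prints "c(x1) = 0.500, c(x2) = 0.500, c(joint) = 1.000"
```

Correlated predictors can cause the opposite kind of trouble (a variable that looks weak alone but helps once another is in the model), which is another reason to judge candidates in a multivariable model rather than one at a time.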
You might want to look at decision tree models from PROC HPSPLIT for this kind of problem. A tree can give you a better view of your data and might even provide you with a usable predictive model.