SC_1991
Fluorite | Level 6

I am developing a logistic regression model for predicting complaints in a customer service setting.

Consider a credit card company: the objective is to identify the customers who are most likely to complain in the next week or so, based on events that happened in the past couple of months. The attributes can be things like a change in payment behavior, whether the customer filed for bankruptcy (y/n), etc.

I had almost 200 independent variables in my data. The target variable is binary.

I ran proc logistic for every individual variable against the target variable and calculated the c-scores.

I then chose the variables with c-scores over 0.525. I ran proc logistic again with just these variables (all together now) and chose my final predicting variables. I used forward stepwise selection this time.
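
The screening step described above can be sketched outside SAS as follows. This is only an illustration, not the PROC LOGISTIC computation itself: it assumes that for a single numeric predictor the fitted model ranks customers in the same order as the raw variable (or the reverse order when the slope is negative), so the c-score is approximated by the concordance of the variable itself. The function names and the toy data are hypothetical.

```python
def c_statistic(scores, labels):
    """Concordance (c) statistic: share of (event, non-event) pairs in which
    the event has the higher score; tied pairs count as half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def screen_by_c(columns, labels, threshold=0.525):
    """Keep the variables whose one-at-a-time c-statistic exceeds threshold."""
    kept = {}
    for name, values in columns.items():
        c = c_statistic(values, labels)
        c = max(c, 1.0 - c)  # direction-free: a negative slope reverses the ranking
        if c > threshold:
            kept[name] = c
    return kept


# Hypothetical toy data: 1 = complained, 0 = did not complain.
complained = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "late_payments": [3, 2, 0, 1, 0, 0, 1, 0],
    "account_age_years": [5, 2, 9, 2, 9, 5, 5, 5],
}
shortlist = screen_by_c(columns, complained)
print(shortlist)
```

In this toy data only late_payments passes the 0.525 cutoff. Each variable is judged entirely on its own, so nothing in this step looks at how the variables behave together.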

 

My question is: what is wrong with this approach? I am worried about losing information by shortlisting the variables based on c-scores. I am following this methodology because that is what I was told to do. Any advice would be greatly appreciated.

4 REPLIES
PaigeMiller
Diamond | Level 26

What is wrong with this approach?

 

Go to your favorite search engine and type in "problems with stepwise regression". 

 

A better approach, in my opinion, is to use a logistic version of Partial Least Squares Regression. Again, your favorite search engine will find examples. PROC PLS in SAS does the calculations.

--
Paige Miller
SC_1991
Fluorite | Level 6
What about the way I shortlisted the variables based on univariate c-scores and then used only the shortlisted variables in my logistic regression?
PaigeMiller
Diamond | Level 26

It ignores the multivariate nature of your predictor variables: it ignores the correlations between the predictors, and it ignores interactions between predictors. If you are going to fit a model with multiple predictors, I am skeptical of the approach of looking at predictor variables one at a time.
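
A deliberately extreme toy case shows how one-at-a-time screening can miss an interaction. The sketch below is in Python rather than SAS, and the data are hypothetical: two yes/no flags, with a complaint occurring only when exactly one of them is set. Each flag alone has a c-statistic of 0.5 and would be dropped by a 0.525 cutoff, yet the two together separate the outcomes perfectly.

```python
def c_statistic(scores, labels):
    """Concordance (c) statistic: share of (event, non-event) pairs in which
    the event has the higher score; tied pairs count as half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical data: four equally common customer profiles. A complaint
# happens only when exactly one of the two flags is set (a pure interaction).
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 25
x1 = [a for a, _ in rows]
x2 = [b for _, b in rows]
y = [a ^ b for a, b in rows]

c_x1 = c_statistic(x1, y)  # flag 1 judged on its own
c_x2 = c_statistic(x2, y)  # flag 2 judged on its own
c_joint = c_statistic([a ^ b for a, b in rows], y)  # score built from both flags

print(c_x1, c_x2, c_joint)
```

Real data are rarely this clean, but the same effect appears in weaker form whenever a predictor matters mainly in combination with another one.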

--
Paige Miller
PGStats
Opal | Level 21

You might want to look at decision tree models from PROC HPSPLIT for this kind of problem. It will give you a better view of your data and might even provide you with a predictive model.

PG

