SC_1991
Fluorite | Level 6

I am developing a logistic regression model to predict complaints in a customer service setting.

Consider a credit card company. The objective is to identify the customers who are most likely to complain in the next week or so, based on events that happened in the past couple of months. The attributes could be things like a change in payment behavior, whether the customer filed for bankruptcy (y/n), etc.

I have almost 200 independent variables in my data. The target variable is binary.

I ran PROC LOGISTIC for each individual variable against the target variable and calculated the c-scores.

I then chose the variables with c-scores over 0.525. I ran PROC LOGISTIC again with just these variables (all together this time), using forward stepwise selection, and chose my final predictor variables.
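To make the screening step concrete, here is a minimal sketch of it in Python rather than SAS. It assumes the c-score is the usual concordance statistic (the share of event/non-event pairs ranked correctly, with ties counted as half) and that a single-predictor logistic fit ranks customers the same way as the predictor itself, or in the reverse order. The variable names and toy data are made up for illustration, and PROC LOGISTIC may report slightly different values because of how it handles tied predicted probabilities.

```python
from itertools import product


def c_statistic(scores, targets):
    """Share of (event, non-event) pairs where the event scores higher; ties count half."""
    events = [s for s, y in zip(scores, targets) if y == 1]
    non_events = [s for s, y in zip(scores, targets) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def univariate_c(values, targets):
    """c for a one-predictor model: the sign of the fitted slope can flip the ranking."""
    c = c_statistic(values, targets)
    return max(c, 1.0 - c)


def screen(columns, targets, threshold=0.525):
    """Keep the names of predictors whose univariate c exceeds the threshold."""
    return [name for name, values in columns.items()
            if univariate_c(values, targets) > threshold]


# Hypothetical toy data: 1 = customer complained, 0 = did not.
targets = [0, 0, 0, 1, 1, 1]
columns = {
    "payment_change": [1, 2, 3, 2, 4, 5],  # carries some signal
    "noise": [1, 2, 3, 1, 2, 3],           # same distribution in both groups
}
shortlist = screen(columns, targets)
print(shortlist)
```

On this toy data only `payment_change` clears the 0.525 cutoff. The screen judges each variable in isolation, which is the property the question is asking about.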

 

My question is: what is wrong with this approach? I am worried about losing information by shortlisting the variables based on c-scores. I am following this methodology because that is what I have been told to do. Any advice would be greatly appreciated.

4 REPLIES
PaigeMiller
Diamond | Level 26

What is wrong with this approach?

Go to your favorite search engine and type in "problems with stepwise regression".

A better approach, in my opinion, is to use a logistic version of Partial Least Squares Regression. Again, your favorite search engine will find examples. PROC PLS in SAS does the calculations.

--
Paige Miller
SC_1991
Fluorite | Level 6
What about the way I shortlisted the variables based on univariate c-scores and then used only the shortlisted variables in my logistic regression?
PaigeMiller
Diamond | Level 26

It ignores the multivariate nature of your predictor variables: it ignores the correlations between the predictors, and it ignores interactions between predictors. If you are going to fit a model with multiple predictors, I am skeptical of any approach that looks at predictor variables one at a time.
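A small illustration of the interaction point, as a hedged sketch in Python rather than SAS. The data are invented: two hypothetical binary flags, `a` and `b`, and a target that equals 1 exactly when the flags differ. The c-statistic here is the plain pairwise concordance with ties counted as half, which is an assumption about how the c-scores in this thread were computed.

```python
from itertools import product


def c_statistic(scores, targets):
    """Share of (event, non-event) pairs where the event scores higher; ties count half."""
    events = [s for s, y in zip(scores, targets) if y == 1]
    non_events = [s for s, y in zip(scores, targets) if y == 0]
    total = 0.0
    for e, n in product(events, non_events):
        if e > n:
            total += 1.0
        elif e == n:
            total += 0.5
    return total / (len(events) * len(non_events))


# Invented data: the target is 1 only when exactly one of the two flags is set.
a = [0, 0, 1, 1] * 25
b = [0, 1, 0, 1] * 25
y = [int(ai != bi) for ai, bi in zip(a, b)]

# One at a time, each flag looks useless (c = 0.5), so a cutoff of 0.525 drops both.
c_a = c_statistic(a, y)
c_b = c_statistic(b, y)

# A score built from both flags together separates the classes completely.
interaction = [int(ai != bi) for ai, bi in zip(a, b)]
c_both = c_statistic(interaction, y)

print(c_a, c_b, c_both)
```

This is a deliberately extreme case. Real data will rarely be this clean, but it shows how a predictor can fail a one-at-a-time screen and still matter once other predictors are in the model.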

--
Paige Miller
PGStats
Opal | Level 21

You might want to look at decision tree models from PROC HPSPLIT for this kind of problem. It will give you a better view of your data and might even provide you with a predictive model.

PG
