SC_1991
Fluorite | Level 6

I am developing a logistic regression model for predicting complaints in a customer service setting.

Consider a credit card company: the objective is to identify the customers who are most likely to complain in the next week or so, based on events that happened in the past couple of months. The attributes can be things like a change in payment behavior, whether the customer filed for bankruptcy (y/n), etc.

I had almost 200 independent variables in my data. The target variable is binary.

I ran proc logistic for each variable individually against the target variable and calculated the c-scores.

I then chose the variables with c-scores over 0.525. I ran proc logistic again with just these variables (all together now) and chose my final predicting variables. I used forward stepwise selection this time.
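
For readers unfamiliar with the c-score, here is a minimal sketch of what the univariate screen measures, written in plain Python rather than SAS. The c statistic is the share of (event, non-event) pairs in which the event gets the higher score, with ties counted as half. With a single predictor, the fitted probability is a monotone function of that predictor, so ranking by the raw value gives the same c up to direction. The data and the variable names `payment_change` and `complained` are hypothetical; only the 0.525 cutoff comes from the post.

```python
from itertools import product


def c_statistic(scores, labels):
    """Concordance (c) statistic: fraction of (event, non-event) pairs in
    which the event has the higher score; tied pairs count as half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0
        elif e == n:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Hypothetical toy data: one candidate predictor and a binary complaint flag.
payment_change = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
complained = [0, 0, 1, 1, 1, 0]

c = c_statistic(payment_change, complained)

# The screen described above keeps a variable when its c-score exceeds 0.525.
# Taking max(c, 1 - c) treats a negative relationship the same as a positive one.
keep = max(c, 1.0 - c) > 0.525
print(round(c, 3), keep)
```

The screen looks at one variable at a time, so a variable's score says nothing about what it contributes once other variables are in the model.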

My question is: what is wrong with this approach? I am worried about losing information by shortlisting the variables based on c-scores. I am following this methodology because that is what I have been told to do. Any advice would be greatly appreciated.

PaigeMiller
Diamond | Level 26

What is wrong with this approach?

Go to your favorite search engine and type in "problems with stepwise regression". 

A better approach, in my opinion, is to use a logistic version of Partial Least Squares Regression. Again, your favorite search engine will find examples. PROC PLS in SAS does the calculations.

--
Paige Miller
SC_1991
Fluorite | Level 6
What about the way I shortlisted the variables based on univariate c-scores and then used only the shortlisted variables in my logistic regression?
PaigeMiller
Diamond | Level 26

It ignores the multivariate nature of your predictor variables: it ignores the correlations between the predictors, and it ignores interactions between predictors. If you are going to fit a model with multiple predictors, I am skeptical of the approach of looking at predictor variables one at a time.
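
A small illustration of the interaction point, as a sketch in plain Python on made-up data (the predictors `x1`, `x2` and the exclusive-or setup are assumptions for the example, not anything from the original data). The event occurs only when exactly one of two binary predictors is 1. Each predictor alone has a c statistic of exactly 0.5, so both would fail a 0.525 screen, yet a score that uses the two together separates events from non-events perfectly.

```python
def c_statistic(scores, labels):
    """Fraction of (event, non-event) pairs ranked correctly; ties count half."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for e in events:
        for n in non_events:
            total += 1.0 if e > n else 0.5 if e == n else 0.0
    return total / (len(events) * len(non_events))


# Hypothetical data: (x1, x2, y) where y = 1 only if exactly one of x1, x2 is 1.
rows = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)] * 25
x1 = [r[0] for r in rows]
x2 = [r[1] for r in rows]
y = [r[2] for r in rows]

# One-at-a-time screening: neither predictor looks useful on its own.
c_x1 = c_statistic(x1, y)
c_x2 = c_statistic(x2, y)

# A score built from both predictors together (standing in for a model
# that includes their interaction) ranks every event above every non-event.
joint_score = [a ^ b for a, b in zip(x1, x2)]
c_joint = c_statistic(joint_score, y)

print(c_x1, c_x2, c_joint)
```

This is an extreme case chosen to make the effect visible; real data will rarely be this clean, but the same mechanism can cause a one-at-a-time screen to discard variables that matter jointly.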

--
Paige Miller
PGStats
Opal | Level 21

You might want to look at decision tree models from PROC HPSPLIT for this kind of problem. It will give you a better view of your data and might even provide you with a predictive model.

PG
