NicolasC
Fluorite | Level 6

Hi there. I am facing a problem with a marketing application. Basically, we send customers a product, and they either buy x units of it or they don't. I am trying to build a predictive scoring model with the sales generated as the target variable. Because the response rate of such a campaign is roughly 2%, I am in effect trying to fit a regression in which 98% of the target values are 0 and the rest are positive sales. I have separately developed a classification model that works well at determining whether a customer will buy or not, but with the model I am trying to build now, I'd like to score customers from 1 to 10 according to the decile of predicted sales they fall into. I feel I am not approaching the problem the right way with this regression: in the same way that a classifier is biased by the predominance of 0s, my regressor (even with resampling) will be biased by all the zero sales. Any ideas? Many thanks in advance. Nicolas

WendyCzika
SAS Employee

One thing you could try is the TwoStage model node. You need to define two target variables for it: a binary target (bought or not) and a second target holding the number of units for those who bought (missing for those who didn't). Then you can fit a Sequential model using Filter=Non-Events. For that, you need to define the event for the binary target as those who did not buy.
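To make the two target definitions concrete, here is a minimal Python sketch of the data preparation only. It is an illustration, not the TwoStage node itself: the function name `make_two_stage_targets` and the sample data are hypothetical, and it assumes the raw data holds the units sold per customer, with 0 for non-buyers.

```python
def make_two_stage_targets(units_sold):
    """Build the two targets described above from raw units sold.

    Returns a list of (no_buy_event, units_target) pairs:
      - no_buy_event is 1 when the customer did NOT buy (the event), else 0
      - units_target is the units bought, or None (missing) for non-buyers
    """
    rows = []
    for units in units_sold:
        no_buy_event = 1 if units == 0 else 0
        units_target = units if units > 0 else None
        rows.append((no_buy_event, units_target))
    return rows


# Hypothetical campaign data: most customers buy nothing.
units_sold = [0, 0, 3, 0, 1, 0]
targets = make_two_stage_targets(units_sold)

# Rows usable for a value model: only those with a non-missing units target.
value_model_rows = [units for _, units in targets if units is not None]
```

With these definitions, the binary target covers every customer, while the units target is only populated for buyers.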

 

NicolasC
Fluorite | Level 6

Hi Wendy. Thanks for your reply. I am not familiar with the two-stage model node. In the current process, I fit the regression on the unbalanced data and test it, and it seems fine: in terms of scoring there is a hierarchy, in that the top 10% (em_segment = 1) have a higher sum of sales than those in em_segment 2, and so on. Yet I am convinced I can do better than that.
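For readers who want to reproduce this kind of hierarchy check outside Enterprise Miner, a rough Python sketch follows. The helper names `decile_segments` and `sales_by_segment` and the toy data are assumptions made for illustration; only the idea of summing actual sales per segment (1 = top 10% by predicted sales) comes from the description above.

```python
def decile_segments(predicted):
    """Assign segment 1 (highest predicted sales) to 10 (lowest) by rank."""
    n = len(predicted)
    order = sorted(range(n), key=lambda i: predicted[i], reverse=True)
    segments = [0] * n
    for rank, i in enumerate(order):
        segments[i] = rank * 10 // n + 1
    return segments


def sales_by_segment(segments, actual):
    """Sum the actual sales observed in each segment."""
    totals = {s: 0.0 for s in range(1, 11)}
    for seg, sales in zip(segments, actual):
        totals[seg] += sales
    return totals


# Toy data: 20 customers, customer 0 has the highest predicted sales.
predicted = [20 - i for i in range(20)]
actual = [30, 0, 12, 0, 0, 8] + [0] * 14

segments = decile_segments(predicted)
totals = sales_by_segment(segments, actual)
```

A model ranks well when `totals` decreases, or at least does not increase much, from segment 1 to segment 10 on a held-out campaign.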

I also tried building a response model (on responders and non-responders) and a sales model (a regression on the buyers only), and combining the two by applying the regression to the buyers predicted by the response model. I tested it (as before, on a more recent campaign) and in this case it does not work.
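One possible reason such a combination ranks poorly, offered here as an assumption rather than a diagnosis of this case: with a response rate near 2%, few or no customers may cross the cutoff for being a predicted buyer, so most scores collapse to zero. A commonly used alternative is to multiply the predicted purchase probability by the predicted sales given a purchase. The sketch below contrasts the two; the function names, the 0.5 threshold, and the numbers are all hypothetical.

```python
def hard_filter_sales(p_buy, sales_if_buy, threshold=0.5):
    """Apply the sales prediction only to customers classified as buyers."""
    return [s if p >= threshold else 0.0 for p, s in zip(p_buy, sales_if_buy)]


def expected_sales(p_buy, sales_if_buy):
    """Score = P(buy) * predicted sales given a purchase."""
    return [p * s for p, s in zip(p_buy, sales_if_buy)]


# Hypothetical model outputs for four customers.
p_buy = [0.02, 0.10, 0.30, 0.01]          # from the response model
sales_if_buy = [50.0, 40.0, 20.0, 200.0]  # from the buyers-only regression

filtered = hard_filter_sales(p_buy, sales_if_buy)
expected = expected_sales(p_buy, sales_if_buy)

# Rank customers by expected sales, best first.
ranking = sorted(range(len(expected)), key=lambda i: expected[i], reverse=True)
```

In this toy case the hard filter scores every customer as zero and cannot rank them, while the product still orders them by expected value.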

 

Does the two-stage model work in a similar way to what I just described? When the score code from this two-stage model is applied, does it predict the regression target (sales) or the classification target (response)? Since I have unbalanced data, I assume the overall process before using this node is the same as before (sampling, treatment of missing values, data partition)? Thanks for your help.
