Hi there. I am facing a problem with a marketing application. Basically, we send customers a product, and they either buy x units of it or they don't. I am trying to build a predictive scoring model with the sales generated as the target variable. Because the response rate of such a campaign is roughly 2%, I am effectively building a regressor on data that is 98% zeros, with the rest being positive sales. I have separately developed a classification model that works well at determining whether a customer will buy or not, but with the model I am building now, I'd like to score customers from 1 to 10 according to which decile of predicted sales they fall into. I feel I am not approaching the problem the right way with this regression: in the same way that a classifier is biased by the predominance of 0s, my regressor (even with resampling) will be biased by all the zero sales. Any ideas? Many thanks in advance. Nicolas
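For reference, the 1-10 decile scoring described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming you already have one predicted-sales value per customer; the function name `decile_scores` is hypothetical. Score 1 is the top 10% of predicted sales, matching the `em_segment = 1` convention mentioned later in the thread, and tied predictions are split by original order.

```python
def decile_scores(predicted_sales):
    """Assign each customer a score from 1 (top 10% predicted sales) to 10 (bottom 10%).

    Ties keep their original order, because Python's sort is stable even with reverse=True.
    """
    n = len(predicted_sales)
    # Customer indices ordered from highest to lowest predicted sales
    order = sorted(range(n), key=lambda i: predicted_sales[i], reverse=True)
    scores = [0] * n
    for rank, idx in enumerate(order):
        # rank 0..n-1 maps onto deciles 1..10
        scores[idx] = rank * 10 // n + 1
    return scores


preds = [0.0, 5.2, 0.1, 12.0, 0.0, 0.3, 7.7, 0.0, 1.1, 0.05]
print(decile_scores(preds))  # → [8, 3, 6, 1, 9, 5, 2, 10, 4, 7]
```

With 98% true zeros, many customers may receive near-identical predictions, so the lower deciles can end up split among tied scores almost arbitrarily; the ordering matters mostly at the top.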
One thing you could try is the TwoStage model node. You need to define two target variables for it: one binary (bought or not), and one holding the number of units for those who bought (missing for those who didn't). Then you can set up a Sequential model using Filter=Non-Events (you need to define the event for the binary target as the customers who did not buy).
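The data preparation for the two targets can be sketched as follows. This only illustrates deriving the binary target and the "units, missing for non-buyers" target from a single units column; it is not the SAS node itself, and the function name `two_stage_targets` is hypothetical. `None` stands in for a missing value.

```python
def two_stage_targets(units_sold):
    """Derive the two targets for a two-stage setup from units sold per customer.

    bought: 1 if the customer bought at least one unit, else 0
    units:  number of units for buyers, None (missing) for non-buyers
    """
    bought = [1 if u > 0 else 0 for u in units_sold]
    units = [u if u > 0 else None for u in units_sold]
    return bought, units


units_sold = [0, 0, 3, 0, 1, 0]
bought, units = two_stage_targets(units_sold)
print(bought)  # → [0, 0, 1, 0, 1, 0]
print(units)   # → [None, None, 3, None, 1, None]
```

Leaving the second target missing (rather than 0) for non-buyers is what keeps the sales stage from being dominated by zeros.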
Hi Wendy, thanks for your reply. I am not familiar with the two-stage model node. In the current process, I use the 'unbalanced' regression and test it, and it seems fine: in terms of scoring there is a hierarchy (the top 10%, em_segment = 1, have higher total sales than those in em_segment = 2, and so on). Still, I am convinced I can do better than that.
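The "hierarchy" check described above (total sales in segment 1 above segment 2, and so on) can be made explicit on a validation or later campaign. A minimal sketch, assuming you have each customer's score segment and actual sales; the function names are hypothetical.

```python
from collections import defaultdict


def sales_by_segment(segments, actual_sales):
    """Total actual sales per score segment (segment 1 = top decile)."""
    totals = defaultdict(float)
    for seg, sales in zip(segments, actual_sales):
        totals[seg] += sales
    return dict(sorted(totals.items()))


def is_hierarchical(totals):
    """True if total sales never increase from segment 1 down to the last segment."""
    values = [totals[s] for s in sorted(totals)]
    return all(a >= b for a, b in zip(values, values[1:]))


totals = sales_by_segment([1, 1, 2, 2, 3, 3], [50, 0, 20, 10, 0, 5])
print(totals)                   # → {1: 50.0, 2: 30.0, 3: 5.0}
print(is_hierarchical(totals))  # → True
```

A strictly decreasing profile is a minimum bar; comparing how much of total sales the top one or two segments capture is a sharper way to tell two scoring approaches apart.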
I also tried combining a response model (built on responders and non-responders) with a sales model (a regression on buyers only), applying the regression to the buyers predicted by the response model. I tested it as before, on a more recent campaign, and in this case it does not work.
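One possible reason the combination above underperforms is the hard cut: anyone below the response threshold gets zero predicted sales, however large their sales would be if they bought. A common alternative (a hurdle-style expected value, swapped in here for illustration and not necessarily what the SAS node computes) scores every customer as P(buy) × E[sales | buy]. The sketch below assumes both model outputs are already available per customer; the 0.5 threshold and the function names are hypothetical.

```python
def hard_filter_scores(p_buy, sales_if_buy, threshold=0.5):
    """Regression prediction only for predicted buyers; 0 for everyone else."""
    return [s if p >= threshold else 0.0 for p, s in zip(p_buy, sales_if_buy)]


def expected_sales_scores(p_buy, sales_if_buy):
    """Hurdle-style score for every customer: P(buy) * E[sales | buy]."""
    return [p * s for p, s in zip(p_buy, sales_if_buy)]


p_buy = [0.02, 0.40, 0.05, 0.60]           # response-model probabilities
sales_if_buy = [500.0, 20.0, 300.0, 10.0]  # sales-model predictions for buyers
print(hard_filter_scores(p_buy, sales_if_buy))     # → [0.0, 0.0, 0.0, 10.0]
print(expected_sales_scores(p_buy, sales_if_buy))
```

Here the hard filter ranks only the last customer, while the expected value puts the third customer (low response probability, high potential sales) on top. That ranking is what the decile scores would be built from.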
Does the two-stage model work in a way similar to what I just described? When applying the score created by this two-stage model, does it predict the target from the regression (sales) or from the classification (response)? Since I have unbalanced data, I assume the overall process before this node stays the same as before (sampling, treatment of missing values, data partition)? Thanks for your help.
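On the sampling question: if responders were oversampled for training (say to 50%), the response model's probabilities are inflated relative to the real ~2% rate. Any score that multiplies P(buy) by predicted sales needs them rescaled first. The sketch below applies the standard prior-correction formula; the function name and the 50% sample rate are assumptions for illustration. Whether your tool applies this adjustment automatically is worth checking.

```python
def adjust_for_oversampling(p_sample, true_rate, sample_rate):
    """Rescale a probability learned on an oversampled training set to the population rate.

    true_rate:   event rate in the real population (e.g. 0.02)
    sample_rate: event rate in the training sample (e.g. 0.5 after oversampling)
    """
    w1 = true_rate / sample_rate                # weight for the event class
    w0 = (1 - true_rate) / (1 - sample_rate)    # weight for the non-event class
    num = p_sample * w1
    return num / (num + (1 - p_sample) * w0)


# A customer scored 0.5 by a model trained on a 50/50 sample
print(adjust_for_oversampling(0.5, true_rate=0.02, sample_rate=0.5))  # ≈ 0.02
```

The correction is monotone, so it leaves the ranking from the response model alone unchanged. It only matters once the probability is combined with a sales prediction.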