topic regression in SAS Data Science

regression

NicolasC — Fri, 25 May 2018 08:34:11 GMT

Hi there. I am facing a problem in a marketing application. Basically, we send customers a product, and they either buy x units of it or they don't. I am trying to build a predictive scoring model with the sales generated as the target variable. Because the response rate of such a campaign is roughly 2%, I am trying to fit a regression in which 98% of the target values are 0 and the rest are positive sales. I have separately developed a classification model that works well at determining whether a customer will buy or not, but with the model I am trying to build now, I'd like to score customers from 1 to 10 according to the decile of predicted sales they fall into. I feel I am not approaching the problem the right way with this regression: in the same way that a classifier is biased by the predominance of 0s, my regressor (even with resampling) will be biased by all the zero sales. Any ideas? Many thanks in advance. Nicolas

Re: regression

WendyCzika — Fri, 25 May 2018 17:19:03 GMT

One thing you could try is the TwoStage model node. You need to define two target variables for that: one binary (bought or not), and one representing the number of units for those who bought (missing for those who didn't). Then you can fit a Sequential model using Filter=Non-Events (you need to define the event for the binary target as those that did not buy).
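To make the two-target setup concrete, here is a minimal Python sketch of the data preparation only. It is not how the TwoStage node is implemented, and it does not model the Filter option; the records, the column names `bought` and `units`, and the helper `make_two_targets` are all hypothetical.

```python
# Hypothetical campaign records: (customer_id, units_bought); 0 means no purchase.
campaign = [("c1", 0), ("c2", 3), ("c3", 0), ("c4", 1), ("c5", 0)]


def make_two_targets(records):
    """Build a binary target and a units target that is missing (None) for non-buyers."""
    rows = []
    for customer_id, units in records:
        bought = 1 if units > 0 else 0
        rows.append({
            "customer_id": customer_id,
            "bought": bought,                      # binary target: bought or not
            "units": units if bought else None,    # units target: missing for non-buyers
        })
    return rows


rows = make_two_targets(campaign)

# The classification stage would see every customer.
stage_one_rows = rows

# The value stage would see only customers with a non-missing units target.
stage_two_rows = [r for r in rows if r["units"] is not None]
```

In this toy data, the value stage trains on far fewer rows than the classification stage, which is the point of keeping the two targets separate.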

Re: regression

NicolasC — Sun, 27 May 2018 20:28:09 GMT

Hi Wendy. Thanks for your reply. I am not familiar with the TwoStage model node. In my current process, I use the 'unbalanced' regression and test it, and it seems fine: in terms of scoring there is a hierarchy, i.e. the top 10% (em_segment = 1) have higher total sales than those in em_segment 2, and so on. Yet I am convinced I can do better than that.
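The hierarchy check described above can be sketched in Python roughly as follows. This is only an illustration, not Enterprise Miner output: `assign_segments` and `sales_by_segment` are hypothetical helper names, and the numbering (segment 1 = highest predicted sales) is assumed to mirror em_segment.

```python
def assign_segments(scores, n_segments=10):
    """Rank customers by predicted sales (highest first) and return a segment 1..n_segments for each."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    segments = [0] * len(scores)
    for rank, i in enumerate(order):
        # Equal-sized buckets by rank: the top 10% of ranks fall in segment 1, and so on.
        segments[i] = rank * n_segments // len(scores) + 1
    return segments


def sales_by_segment(segments, actual_sales, n_segments=10):
    """Sum the actual sales observed in each segment (index 0 holds segment 1)."""
    totals = [0.0] * n_segments
    for seg, sale in zip(segments, actual_sales):
        totals[seg - 1] += sale
    return totals


def is_decreasing(totals):
    """True when each segment's total sales is at least that of the next segment."""
    return all(a >= b for a, b in zip(totals, totals[1:]))
```

Calling `is_decreasing(sales_by_segment(assign_segments(predicted), actual))` on a held-out campaign gives a quick yes/no on whether the segments are ordered as intended; `predicted` and `actual` here stand for lists of predicted and realized sales per customer.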

I also tried building a response model (fitted on responders and non-responders) and a sales model (a regression fitted only on the buyers), and combining the two by applying the regression to the buyers predicted by the response model. I tested this (as before, on a more recent campaign), and in this case it does not work.
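For illustration, the combination described above can be sketched in Python next to one common alternative, which multiplies the predicted purchase probability by the predicted sales given a purchase. Both functions are hypothetical helpers operating on already-computed predictions; the 0.5 cutoff is an assumption, and neither function is claimed to match what the TwoStage node does.

```python
def score_hard_filter(p_buy, sales_if_buy, threshold=0.5):
    """Keep the sales prediction only for customers the response model classifies as buyers."""
    return [s if p >= threshold else 0.0 for p, s in zip(p_buy, sales_if_buy)]


def score_expected_value(p_buy, sales_if_buy):
    """Expected sales per customer: P(buy) * E[sales | buy]."""
    return [p * s for p, s in zip(p_buy, sales_if_buy)]


# Hypothetical predictions for three customers.
p_buy = [0.02, 0.6, 0.3]
sales_if_buy = [100.0, 10.0, 50.0]

hard = score_hard_filter(p_buy, sales_if_buy)
expected = score_expected_value(p_buy, sales_if_buy)
```

With a response rate around 2%, few customers may cross a fixed probability cutoff, so the hard filter can leave most scores tied at zero, which makes decile ranking difficult. The expected-value score keeps a distinct value for every customer.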

Does the two-stage model work in a way similar to what I just described? When I apply the score created for this two-stage model, does it predict the target of the regression (sales) or of the classification (response)? Since I have unbalanced data, I assume the overall process before this node is the same as before (sampling + treatment of missing values + data partition)? Thanks for your help.