Hi all,

I am using predictive modelling (e.g. decision trees, logistic regression) to decide which customers should receive email newsletters.

For some newsletters, every customer (one ID per customer) received the email, so I can simply draw a random sample from those recipients. For others, the newsletter was sent only to a subset of customers chosen by a predefined affinity logic. How would you deal with such a dataset? Is it even possible to use it for predictive modelling, given that it is not a random representation of the people who are interested in the newsletter and the people who are not?

Second, how do I keep validating a model once it is in use? The data I collect from then on will be non-random, because who receives the newsletter is determined by the model itself. To check whether the model is still stable or needs to be changed, I guess I need to send the newsletter to a random selection of the customers who would otherwise be predicted not to receive it?

Thanks a lot in advance for your help,
Maja
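
P.S. To make the second question concrete, here is a rough Python sketch of the kind of random selection I have in mind. Everything in it is an assumption for illustration only: the function name `select_recipients`, the `threshold` and `explore_rate` parameters, and the made-up scores are hypothetical, not part of any existing setup. The idea is that everyone the model scores above the threshold gets the newsletter, and a small random share of the remaining customers gets it too, with the sending probability recorded so the responses could be reweighted later.

```python
import random


def select_recipients(scores, threshold=0.5, explore_rate=0.1, seed=0):
    """Hypothetical sketch: send to everyone the model selects, plus a
    random share of the customers the model would otherwise skip.

    scores: dict mapping customer ID -> predicted response probability.
    Returns a dict mapping customer ID -> send decision, group label and
    the probability with which that customer was sent the newsletter.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    plan = {}
    for customer_id, score in scores.items():
        if score >= threshold:
            # Selected by the model: always receives the newsletter.
            plan[customer_id] = {"send": True, "group": "model", "send_prob": 1.0}
        else:
            # Not selected by the model: receives it with a small probability.
            send = rng.random() < explore_rate
            plan[customer_id] = {
                "send": send,
                "group": "explore" if send else "skipped",
                "send_prob": explore_rate,
            }
    return plan


# Made-up scores for 100 customers, purely for illustration.
scores = {f"c{i}": i / 100 for i in range(100)}
plan = select_recipients(scores, threshold=0.5, explore_rate=0.1, seed=0)

explore_ids = [cid for cid, row in plan.items() if row["group"] == "explore"]
print(len(explore_ids), "customers below the threshold receive the newsletter anyway")
```

Would responses from the "explore" group, weighted by 1 / `send_prob`, be a reasonable basis for checking the model on the customers it normally excludes?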