Rare events are par for the course in business applications such as acquisition, attrition, cross-sell, credit risk, and fraud. All of them have low incidence rates, often below 1%. Becoming comfortable with these situations and developing a strategy to deal with them is quite important.

The reference suggested by Doc is a good starting point. The usual approaches fall into three groups:

1. Stratified sampling, unweighted regression, plus an adjustment for the stratified sampling. For logistic regression, the adjustment affects only the intercept term, not the other coefficients.
2. Stratified sampling, weighted regression, no adjustment.
3. Build the model as is, with the unbalanced class distribution.

In practice, some people favor one method over another; for business purposes, though, that argument is moot. What is worth money to a business is, above all else, a consistent ability to provide a high degree of differentiation. Often it is also necessary to modify the performance criteria. For example, there may only be capacity to investigate a small number of cases. What matters then is not a global performance metric such as AUROC or KS; instead, attention is paid only to performance at the very tip of the model, that is, among the highest-scored cases.

At the end of the day, you use statistics to get what you want, rather than letting statistics control what you do. Real business decisions often extend beyond statistics.
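
As a rough illustration of the intercept adjustment mentioned above for logistic regression on a stratified sample, here is a minimal sketch. It assumes the sample was stratified on the outcome (for example, keeping all events and only a fraction of non-events) and that the true population event rate is known. The function names and the example rates are made up for illustration; the formula is the usual prior correction, which shifts the log-odds by the difference between the sample and population logits.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def corrected_intercept(b0_sample, pop_rate, sample_rate):
    """Shift an intercept fitted on an outcome-stratified sample back to the
    population scale. Only the intercept moves; slopes are left alone."""
    offset = logit(sample_rate) - logit(pop_rate)
    return b0_sample - offset


def corrected_probability(p_sample, pop_rate, sample_rate):
    """Apply the same shift directly to a predicted probability."""
    offset = logit(sample_rate) - logit(pop_rate)
    z = logit(p_sample) - offset
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical numbers: 1% event rate in the population,
# 50% event rate in the stratified modeling sample.
b0 = corrected_intercept(b0_sample=-0.3, pop_rate=0.01, sample_rate=0.5)
p = corrected_probability(p_sample=0.5, pop_rate=0.01, sample_rate=0.5)
print(b0, p)
```

With these assumed rates, a score of 0.5 on the balanced sample corresponds to a probability of 0.01 in the population. Because the shift is the same for every case, the rank order of the scores does not change; only the calibration of the predicted probabilities does.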