Solved: How to prepare training data to predict churn when contacts have varying months of pre-inactivation?

dcortell

Problem statement
I need to model contact inactivation, defined as a contact having 12 consecutive months with no touchpoints. At any given scoring date, contacts in the base can have different amounts of accumulated inactivity (for example, 1 month, 5 months, 8 months, etc.), i.e. they are partway toward the 12-month churn threshold.
Objective
I want the model to score contacts at any time and estimate the probability they will reach 12 months of inactivity within the next 12 months (or, equivalently, to churn within the next 12 months).
Proposed approach and question
I’m considering creating a dataset of snapshots (one per contact per prediction date), with a continuous feature “months_inactive_so_far” (N) and other historical features computed up to that snapshot. The label would be whether the contact reaches 12 months of inactivity within the subsequent 12 months. My question: Is this a reasonable way to prepare the training set, or are there better/principled alternatives (e.g., survival analysis or different labeling strategies)? Are there pitfalls I should watch for (censoring, leakage, splitting training/test by contact or time)? Any references or practical experience would be appreciated.
Additional details that may help answerers (add if relevant)
- Are snapshots monthly, weekly, or event-driven?
- Do you have full 12 months of follow-up for all snapshots, or are recent snapshots right-censored?
- Do you need a single probability (will churn in next 12 months) or a time-to-event estimate (when will churn occur)?

SASKiwi

In my experience you need to have customers in your data you can identify as having actually churned (become inactive). That means you will probably need at least two years of customer history. I would start by creating a churn flag with the first year of data by going through each month and flagging those customers who churned in the following 12 months.

By identifying actual churners you can use this indicator, which becomes the target variable, to train your model.
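As a rough illustration of the month-by-month flagging described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the input is a hypothetical dict mapping each contact ID to the set of integer month indexes in which it had a touchpoint, `build_snapshots` and `months_inactive` are made-up names, and churned contacts are assumed to leave the base. Because the prediction horizon equals the 12-month churn threshold, a contact with N inactive months churns within the horizon exactly when its next 12 - N months are all inactive. Snapshots whose outcome cannot yet be known are labelled `None` (right-censored) rather than 0.

```python
from typing import Dict, List, Set

CHURN_MONTHS = 12  # churn definition from the thread; horizon is assumed equal


def months_inactive(active: Set[int], t: int) -> int:
    """Consecutive months without a touchpoint, counting back from month t."""
    n = 0
    while (t - n) not in active and n < CHURN_MONTHS:
        n += 1
    return n


def build_snapshots(activity: Dict[str, Set[int]], last_observed: int) -> List[dict]:
    """One row per contact per month, with months_inactive_so_far and a label."""
    rows = []
    for contact_id, active in activity.items():
        for t in range(min(active), last_observed + 1):
            n = months_inactive(active, t)
            if n >= CHURN_MONTHS:
                break  # assumption: churned contacts are removed from the base
            need = CHURN_MONTHS - n  # further inactive months needed to churn
            if any(m in active for m in range(t + 1, t + need + 1)):
                label = 0      # a touchpoint interrupts the run: no churn in horizon
            elif t + need <= last_observed:
                label = 1      # the full run of inactivity was observed
            else:
                label = None   # right-censored: outcome not observable yet
            rows.append({"contact_id": contact_id, "snapshot_month": t,
                         "months_inactive_so_far": n, "label": label})
    return rows


if __name__ == "__main__":
    demo = build_snapshots({"a": {0, 1, 2}, "c": {20}}, last_observed=24)
    for row in demo[:4]:
        print(row)
```

Rows with `label=None` would normally be dropped from training (or handled with a survival model), and all snapshots of one contact should land on the same side of any train/test split to limit leakage.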

dcortell

Sorry, the "Accept as Solution" button was activated for some reason, but the reply did not actually answer the question. My question is whether it is robust practice to build a churn model on a training set in which contacts have different numbers of inactive months at scoring time, and to predict whether each contact will become inactive within the following 12 months (keeping the definition that a contact becomes inactive once it accumulates 12 rolling months of inactivity). This means that a contact with 11 cumulative months of inactivity will almost certainly become inactive in the following 12 months (it needs just one additional month of inactivity), unlike a contact that had only 2 months of inactivity when modelled. So the question is whether using different inactivity cutoffs within the same model (or, equivalently, using months of inactivity as a predictive variable in that model) is a robust approach.
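One way to probe this concern before fitting anything is to tabulate the observed churn rate for each value of months of inactivity. The sketch below is only an illustration: it assumes snapshot rows are plain dicts with hypothetical keys `months_inactive_so_far` and `label` (1, 0, or `None` when the outcome is not yet observable), and `churn_rate_by_inactivity` is a made-up helper name. If the rate climbs toward 1 near 11 months, that feature alone carries most of the signal there, and the model is probably better judged within each inactivity stratum than on one pooled metric.

```python
from collections import defaultdict
from typing import Dict, Iterable


def churn_rate_by_inactivity(rows: Iterable[dict]) -> Dict[int, float]:
    """Observed churn rate per months_inactive_so_far, skipping censored rows."""
    counts = defaultdict(lambda: [0, 0])  # months inactive -> [churners, total]
    for row in rows:
        if row["label"] is None:
            continue  # outcome unknown, so it cannot inform the base rate
        bucket = counts[row["months_inactive_so_far"]]
        bucket[0] += row["label"]
        bucket[1] += 1
    return {n: churners / total for n, (churners, total) in sorted(counts.items())}


if __name__ == "__main__":
    sample = [
        {"months_inactive_so_far": 2, "label": 0},
        {"months_inactive_so_far": 2, "label": 1},
        {"months_inactive_so_far": 11, "label": 1},
        {"months_inactive_so_far": 11, "label": None},
    ]
    print(churn_rate_by_inactivity(sample))
```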

SASKiwi

I suspect the real question is: how many months of inactivity result in permanent inactivity for the majority of customers? You want the training data to accurately reflect churn reality, that is, permanent inactivity. You will need to run some analytics on your data to find the spread of inactivity periods for most churners.
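The kind of analysis suggested here could look roughly like the following Python sketch. It assumes a hypothetical dict mapping each contact ID to the set of integer month indexes with a touchpoint, and `return_rate_after` is a made-up name. For each k, it counts how many inactivity spells reached k months and what share of those were later followed by a touchpoint. Spells still open at the end of the data are right-censored, so the shares are lower bounds on the true return rate.

```python
from typing import Dict, Set


def return_rate_after(activity: Dict[str, Set[int]], last_observed: int,
                      max_k: int = 12) -> Dict[int, float]:
    """Share of inactivity spells reaching k months that later saw a touchpoint."""
    reached = [0] * (max_k + 1)
    returned = [0] * (max_k + 1)
    for active in activity.values():
        months = sorted(active)
        # Completed gaps: inactive months between two consecutive touchpoints.
        for prev, nxt in zip(months, months[1:]):
            gap = nxt - prev - 1
            for k in range(1, min(gap, max_k) + 1):
                reached[k] += 1
                returned[k] += 1
        # Open gap after the last touchpoint: no return observed so far.
        tail = last_observed - months[-1]
        for k in range(1, min(tail, max_k) + 1):
            reached[k] += 1
    return {k: returned[k] / reached[k] for k in range(1, max_k + 1) if reached[k]}


if __name__ == "__main__":
    rates = return_rate_after({"a": {0, 4}, "b": {0}}, last_observed=10)
    for k, rate in rates.items():
        print(k, round(rate, 2))
```

A sharp drop in the return share at some k would suggest where inactivity becomes effectively permanent in the data at hand.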

dcortell

I’m not sure I agree on the need to identify permanent inactivity, for two reasons. First, a contact that has been inactive for 12 months is, about 95% of the time, a contact that never becomes active again and is removed from the databases. Second, if a contact is heading toward 12 months of inactivity with good probability, and marketing keeps bombarding it in a way that increases that probability, I want to identify this as soon as possible so that I can stop that policy and start some kind of healing process.

SASKiwi

It is entirely up to you what your definition of churn is and how many months of inactivity you want to base your churn indicator on. I don't have your data so I can't really provide any guidance on this. What you are suggesting makes sense.
