Fae
Obsidian | Level 7

I am building a churn model for a grocery chain and got a bit lost.

 

I have only done churn modelling where churn is clearly defined, as it usually is in a contractual setting, so some of the questions I ask may be embarrassingly simple.

 

Let's say my target population is customers who shopped at our stores during 2016. Should I classify churn as:

1) Customers who did not shop with the company at all in 2017.

2) Customers who did not shop with the company in the 1 year following their last purchase.
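To make the difference between the two candidate definitions concrete, here is a minimal Python sketch. The customer IDs, purchase dates and function names are all hypothetical, and "1 year" is assumed to mean 365 days; it only illustrates that the same customer can be labelled differently under each definition.

```python
from datetime import date, timedelta

# Hypothetical purchase dates per customer (illustrative data only).
purchases = {
    "A": [date(2016, 3, 1), date(2017, 2, 10)],
    "B": [date(2016, 1, 10), date(2017, 6, 1)],
    "C": [date(2016, 6, 15)],
}

def churn_calendar(dates, year=2017):
    """Definition 1: no purchase at all in the following calendar year."""
    return not any(d.year == year for d in dates)

def churn_rolling(dates, base_year=2016, window_days=365):
    """Definition 2: no purchase within window_days after the last
    purchase made in base_year (assumes at least one such purchase)."""
    last = max(d for d in dates if d.year == base_year)
    cutoff = last + timedelta(days=window_days)
    return not any(last < d <= cutoff for d in dates)

# Map each customer to (definition 1 label, definition 2 label).
labels = {c: (churn_calendar(d), churn_rolling(d)) for c, d in purchases.items()}
```

In this toy data, customer B shopped in 2017 (not churned under definition 1) but only after more than 365 days of inactivity (churned under definition 2), so the choice of definition changes the target variable.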

 

Also, how should I prepare the transactions with regard to refunds? The data I have access to provides no easy way to match a refund back to the original purchase. The only way I can do it is to match by product_ID and price, but this will be time-consuming and messy because of coupons, discounts and sales clerk errors.

 

The easiest way would be to just sum up all the transactions during the defined time frame and subtract the refunded amount, but there will be a mismatch because the original purchase and the refund can belong to different time frames.

 

The other way would be to include only refunds that can be matched to a purchase during the time frame, which should give me much better data but.

 

Or should I just put purchases and refunds into separate predictor variables?

 

Also, if I want to test the model on new data, will it create any kind of bias if I define the target population as customers who shopped between Jan 1, 2017 and Apr 30, 2017, and define churn as customers who haven't shopped between May 1, 2017 and Apr 30, 2018?

 

 

Thanks.

 

 

3 REPLIES
Reeza
Super User

This is complex and important, so if you have a client you're doing this for, you should definitely discuss it with them.

 

Some things to look into first:

 

  • What percentage of sales are returned? Is it material enough that you need to be concerned about it?
  • For feature creation, I would consider a flag or count of how many returns a user has, or the total refunded amount. Your ultimate goal is the prediction, so you don't necessarily need the exact values; just be consistent and clear about how you're dealing with the issue. Total sales can be a separate metric.
  • Rather than specific time frames for follow-up, consider a survival approach, which looks at time to next purchase or time to churn as the metric, where churn is defined as, for example:
    • 6 months with no purchase
    • 12 months with no purchase
  • Is it really fair to break your periods up into specific intervals, given that someone who purchased on April 28th is then treated the same as someone who purchased on January 1st? It's usual to do this as a cohort analysis, i.e. group customers into cohorts that line up with business time periods, such as first or second quarter, and the time-to-churn definition can vary. I would actually try one or two definitions and compare the metrics, because the definition is the most important part of your analysis. Yes, this is contrary to the first sentence, but it's more something you should think about.
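As a rough illustration of the inactivity-based definitions above, the following Python sketch turns a customer's purchase dates into a (duration, churned) pair of the kind a survival method could take as input. It is not a survival model itself. The function name, the observation end date, and the 182-day and 365-day approximations of 6 and 12 months are all assumptions made for the example.

```python
from datetime import date

OBSERVATION_END = date(2018, 4, 30)  # assumed last day of available data

def churn_event(dates, threshold_days, obs_end=OBSERVATION_END):
    """Return (duration_days, churned) for one customer.

    A customer counts as churned at the start of the first gap between
    purchases (or between the last purchase and obs_end) that exceeds
    threshold_days. Duration is measured from the first purchase.
    If no such gap is observed, the customer is censored at obs_end.
    """
    dates = sorted(dates)
    checkpoints = dates[1:] + [obs_end]
    for prev, nxt in zip(dates, checkpoints):
        if (nxt - prev).days > threshold_days:
            return (prev - dates[0]).days, True
    return (obs_end - dates[0]).days, False

# Hypothetical customer with a long quiet spell between March and December.
customer = [date(2017, 1, 5), date(2017, 3, 1), date(2017, 12, 20)]

six_month = churn_event(customer, threshold_days=182)
twelve_month = churn_event(customer, threshold_days=365)
```

Here the same customer is a churner under the 6-month rule and still active (censored) under the 12-month rule, which is why it is worth comparing more than one definition before settling on a target.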

 

I don't think this answers all your questions, but I hope it helps. 

 

 

 

Fae
Obsidian | Level 7

Thanks very much for your advice.

 

>What percentage of sales are returned - is it materially an issue that you need to be concerned about. 

Probably not material at an aggregated level, but at an individual customer level, I do worry that it will affect the model accuracy.

 

>Rather than specific time frames for follow up, consider a survival approach, which looks at time to next purchase as a metric or time to churn, where churn is defined as:

For survival analysis, what methodology would you recommend?

 

>Is it really fair to break your periods up into specific intervals because someone who purchased a product on April 28th is considered the same as someone who did on January 1st?

It's not fair, and I already discussed this with the client before the project started, but this is what was requested because it's what the executives want and how the business has been run. They are not going to change their entire operation because of my work.

Reeza
Super User

@Fae wrote:

Thanks very much for your advice.

 

>What percentage of sales are returned - is it materially an issue that you need to be concerned about. 

Probably not material at an aggregated level, but at an individual customer level, I do worry that it will affect the model accuracy.

 

 


Then test that! You can simulate a few different scenarios to understand how the parameters change. You can build several features and see which works better: total sales, total refunds, or total sales minus total refunds in the period. You could also try to match the records using a FIFO approach, or use the business rules if you understand them, e.g. a 90-day return policy means that anything older than 90 days wouldn't be returned. The policy may not always be followed strictly, but it's a start.
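A minimal Python sketch of those options, under stated assumptions: the transaction layout (date, product ID, signed amount with refunds negative), the function names and the 90-day window are all hypothetical, and the matching ignores price because coupons and discounts can make refund amounts differ from the purchase price.

```python
from datetime import date

# Hypothetical transactions for one customer: (date, product_id, amount).
# Negative amounts are refunds. All values are illustrative.
transactions = [
    (date(2016, 1, 5), "P1", 20.0),
    (date(2016, 2, 1), "P1", 20.0),
    (date(2016, 2, 10), "P1", -18.0),  # refund; amount differs due to a coupon
    (date(2016, 3, 1), "P2", 50.0),
    (date(2016, 9, 1), "P2", -50.0),   # outside the assumed 90-day window
]

def candidate_features(txns):
    """Build the alternative predictors to compare against each other."""
    sales = sum(a for _, _, a in txns if a > 0)
    refunds = -sum(a for _, _, a in txns if a < 0)
    return {
        "total_sales": sales,
        "total_refunds": refunds,
        "net_sales": sales - refunds,
        "refund_count": sum(1 for _, _, a in txns if a < 0),
    }

def fifo_match(txns, return_window_days=90):
    """Match each refund to the oldest unmatched purchase of the same
    product made within return_window_days before the refund."""
    open_purchases = []
    matched, unmatched = [], []
    for txn in sorted(txns, key=lambda t: t[0]):
        d, pid, amt = txn
        if amt >= 0:
            open_purchases.append(txn)
            continue
        for p in open_purchases:
            if p[1] == pid and 0 <= (d - p[0]).days <= return_window_days:
                matched.append((p, txn))
                open_purchases.remove(p)  # safe: we break right after
                break
        else:
            unmatched.append(txn)
    return matched, unmatched

features = candidate_features(transactions)
matched, unmatched = fifo_match(transactions)
```

Refunds that end up in `unmatched` could be kept as a separate predictor or dropped, and each variant can then be fed to the model to see which one predicts better.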

 



 

 

>Is it really fair to break your periods up into specific intervals because someone who purchased a product on April 28th is considered the same as someone who did on January 1st?

It's not fair, and I already discussed this with the client before the project started, but this is what was requested because it's what the executives want and how the business has been run. They are not going to change their entire operation because of my work.


That's why I said it was something to think about, not necessarily change. 

 

 
