BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AvinashRdy
Calcite | Level 5

Hi All,

I've a customer data set with approximately 5 million records and data is collected based on customers from past 10 to 15 years. My target is divide the customers into RFM bins. All the recency, frequency and monitory value variables are extremely skewed.

For example, if I consider recency there are about 60-70% of the customers with recency 1 & 5-10% with recency between 2 to 5 & 2% between 5 to 10 so on..also about 0.1% above 100.

Similar case with monetory value. The monetary varies from 0 to 10,000,000 there are

about 30% of the customers who spent < $5 &

about 20% of the customers who spent between 5 to 10 &

30% b/w 10 to 100 & 20% between 100 to 1000 &

with 10% b/w 100 and 10,000 and

8% b/w 10,000 to 100,000 and so on...about 0.01% > 1,000,000


Similar scenario with Recency variable.


I need to decide how the split should be done. I've access to SAS EG. Any idea or solution is much appreciated.


Thank you so much in advance for your time.


- Avinash


1 ACCEPTED SOLUTION

Accepted Solutions
rayIII
SAS Employee

Yes, you can get started with Proc Rank. Replace the dataset and  'var' list with your own dataset and variables. If you want to use more or fewer than 5 bins, then adjust the 'groups' value.

proc rank data=yourData out=ranks groups=5;

  var mostRecent NumberofPurchases totalamount;

  ranks recency frequency monetary;

run;

data ranks;

  set ranks;

  recency  + 1;

    frequency + 1;

  monetary + 1;

run;

proc univariate data=ranks;

    var recency frequency monetary;

run;

You might also want to look at the TIES= option for Proc Rank.

This example assumes your data rows represent customers. If they are transactional, then you would need to aggregate the rows before doing this.

As I said, this is just a start. The RFM task in EG 6.1 gives you the option to use transactional- or customer-level data, as well as several binning options and plots.

I hope this helps.

Ray

View solution in original post

5 REPLIES 5
Ksharp
Super User

Maybe you should consider computing their  percentiles .

rayIII
SAS Employee

Hi, Avinash.

If you are concerned that your RFM scores will be skewed, don't worry. The scores are computed based off percentiles (ranks), so even with skewed data (which are the norm with variables like total purchase amount) you will get a pretty uniform distribution of R, F, and M scores.

And just in case you hadn't seen it, EG has an RFM task that will make getting RFM scores very easy. Look under Tasks > Data Mining > Recency, Frequency, Monetary.

I hope this helps.

Ray

AvinashRdy
Calcite | Level 5

Hi Xia Keshan / Ray Wright,

Thank you for your response. Unfortunately, I've SAS EG 5.1 which doesn't have the option to perform RFM scores directly. Can you suggest any other way to do this?

rayIII
SAS Employee

Yes, you can get started with Proc Rank. Replace the dataset and  'var' list with your own dataset and variables. If you want to use more or fewer than 5 bins, then adjust the 'groups' value.

proc rank data=yourData out=ranks groups=5;

  var mostRecent NumberofPurchases totalamount;

  ranks recency frequency monetary;

run;

data ranks;

  set ranks;

  recency  + 1;

    frequency + 1;

  monetary + 1;

run;

proc univariate data=ranks;

    var recency frequency monetary;

run;

You might also want to look at the TIES= option for Proc Rank.

This example assumes your data rows represent customers. If they are transactional, then you would need to aggregate the rows before doing this.

As I said, this is just a start. The RFM task in EG 6.1 gives you the option to use transactional- or customer-level data, as well as several binning options and plots.

I hope this helps.

Ray

AvinashRdy
Calcite | Level 5

Thank you Ray for your clear explanation. I believe this is a very good starting point. I'll start experimenting with the number of bins.

-Avinash

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to choose a machine learning algorithm

Use this tutorial as a handy guide to weigh the pros and cons of these commonly used machine learning algorithms.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2095 views
  • 5 likes
  • 3 in conversation