Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Hi All,

I've a customer data set with approximately 5 million records and data is collected based on customers from past 10 to 15 years. My target is divide the customers into RFM bins. All the recency, frequency and monitory value variables are extremely skewed.

For example, if I consider recency there are about 60-70% of the customers with recency 1 & 5-10% with recency between 2 to 5 & 2% between 5 to 10 so on..also about 0.1% above 100.

Similar case with monetory value. The monetary varies from 0 to 10,000,000 there are

about 30% of the customers who spent < $5 &

about 20% of the customers who spent between 5 to 10 &

30% b/w 10 to 100 & 20% between 100 to 1000 &

with 10% b/w 100 and 10,000 and

8% b/w 10,000 to 100,000 and so on...about 0.01% > 1,000,000


Similar scenario with Recency variable.


I need to decide how the split should be done. I've access to SAS EG. Any idea or solution is much appreciated.


Thank you so much in advance for your time.


- Avinash



Accepted Solutions
Solution
‎03-24-2015 12:42 PM
SAS Employee
Posts: 106

Re: Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Yes, you can get started with Proc Rank. Replace the dataset and  'var' list with your own dataset and variables. If you want to use more or fewer than 5 bins, then adjust the 'groups' value.

proc rank data=yourData out=ranks groups=5;

  var mostRecent NumberofPurchases totalamount;

  ranks recency frequency monetary;

run;

data ranks;

  set ranks;

  recency  + 1;

    frequency + 1;

  monetary + 1;

run;

proc univariate data=ranks;

    var recency frequency monetary;

run;

You might also want to look at the TIES= option for Proc Rank.

This example assumes your data rows represent customers. If they are transactional, then you would need to aggregate the rows before doing this.

As I said, this is just a start. The RFM task in EG 6.1 gives you the option to use transactional- or customer-level data, as well as several binning options and plots.

I hope this helps.

Ray

View solution in original post


All Replies
Super User
Posts: 9,676

Re: Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Maybe you should consider computing their  percentiles .

SAS Employee
Posts: 106

Re: Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Hi, Avinash.

If you are concerned that your RFM scores will be skewed, don't worry. The scores are computed based off percentiles (ranks), so even with skewed data (which are the norm with variables like total purchase amount) you will get a pretty uniform distribution of R, F, and M scores.

And just in case you hadn't seen it, EG has an RFM task that will make getting RFM scores very easy. Look under Tasks > Data Mining > Recency, Frequency, Monetary.

I hope this helps.

Ray

Occasional Contributor
Posts: 5

Re: Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Hi Xia Keshan / Ray Wright,

Thank you for your response. Unfortunately, I've SAS EG 5.1 which doesn't have the option to perform RFM scores directly. Can you suggest any other way to do this?

Solution
‎03-24-2015 12:42 PM
SAS Employee
Posts: 106

Re: Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Yes, you can get started with Proc Rank. Replace the dataset and  'var' list with your own dataset and variables. If you want to use more or fewer than 5 bins, then adjust the 'groups' value.

proc rank data=yourData out=ranks groups=5;

  var mostRecent NumberofPurchases totalamount;

  ranks recency frequency monetary;

run;

data ranks;

  set ranks;

  recency  + 1;

    frequency + 1;

  monetary + 1;

run;

proc univariate data=ranks;

    var recency frequency monetary;

run;

You might also want to look at the TIES= option for Proc Rank.

This example assumes your data rows represent customers. If they are transactional, then you would need to aggregate the rows before doing this.

As I said, this is just a start. The RFM task in EG 6.1 gives you the option to use transactional- or customer-level data, as well as several binning options and plots.

I hope this helps.

Ray

Occasional Contributor
Posts: 5

Re: Please provide suggestion for RFM using SAS EG for extremely skewed distribution.

Thank you Ray for your clear explanation. I believe this is a very good starting point. I'll start experimenting with the number of bins.

-Avinash

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 657 views
  • 5 likes
  • 3 in conversation