turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Data Mining
- /
- Please provide suggestion for RFM using SAS EG for...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-23-2015 11:29 PM

Hi All,

I've a customer data set with approximately 5 million records and data is collected based on customers from past 10 to 15 years. My target is divide the customers into RFM bins. All the recency, frequency and monitory value variables are extremely skewed.

For example, if I consider recency there are about 60-70% of the customers with recency 1 & 5-10% with recency between 2 to 5 & 2% between 5 to 10 so on..also about 0.1% above 100.

Similar case with monetory value. The monetary varies from 0 to 10,000,000 there are

about 30% of the customers who spent < $5 &

about 20% of the customers who spent between 5 to 10 &

30% b/w 10 to 100 & 20% between 100 to 1000 &

with 10% b/w 100 and 10,000 and

8% b/w 10,000 to 100,000 and so on...about 0.01% > 1,000,000

Similar scenario with Recency variable.

I need to decide how the split should be done. I've access to SAS EG. Any idea or solution is much appreciated.

Thank you so much in advance for your time.

- Avinash

Accepted Solutions

Solution

03-24-2015
12:42 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2015 12:42 PM

Yes, you can get started with Proc Rank. Replace the dataset and 'var' list with your own dataset and variables. If you want to use more or fewer than 5 bins, then adjust the 'groups' value.

**proc rank data=yourData out=ranks groups=5; **

** var mostRecent NumberofPurchases totalamount; **

** ranks recency frequency monetary; **

**run;**

**data ranks;**

** set ranks;**

** recency + 1;**

** frequency + 1;**

** monetary + 1;**

**run; **

**proc univariate data=ranks;**

** var recency frequency monetary;**

**run; **

You might also want to look at the TIES= option for Proc Rank.

This example assumes your data rows represent customers. If they are transactional, then you would need to aggregate the rows before doing this.

As I said, this is just a start. The RFM task in EG 6.1 gives you the option to use transactional- or customer-level data, as well as several binning options and plots.

I hope this helps.

Ray

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2015 09:31 AM

Maybe you should consider computing their percentiles .

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2015 09:40 AM

Hi, Avinash.

If you are concerned that your RFM scores will be skewed, don't worry. The scores are computed based off percentiles (ranks), so even with skewed data (which are the norm with variables like total purchase amount) you will get a pretty uniform distribution of R, F, and M scores.

And just in case you hadn't seen it, EG has an RFM task that will make getting RFM scores very easy. Look under Tasks > Data Mining > Recency, Frequency, Monetary.

I hope this helps.

Ray

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2015 12:16 PM

Hi Xia Keshan / Ray Wright,

Thank you for your response. Unfortunately, I've SAS EG 5.1 which doesn't have the option to perform RFM scores directly. Can you suggest any other way to do this?

Solution

03-24-2015
12:42 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2015 12:42 PM

Yes, you can get started with Proc Rank. Replace the dataset and 'var' list with your own dataset and variables. If you want to use more or fewer than 5 bins, then adjust the 'groups' value.

**proc rank data=yourData out=ranks groups=5; **

** var mostRecent NumberofPurchases totalamount; **

** ranks recency frequency monetary; **

**run;**

**data ranks;**

** set ranks;**

** recency + 1;**

** frequency + 1;**

** monetary + 1;**

**run; **

**proc univariate data=ranks;**

** var recency frequency monetary;**

**run; **

You might also want to look at the TIES= option for Proc Rank.

This example assumes your data rows represent customers. If they are transactional, then you would need to aggregate the rows before doing this.

As I said, this is just a start. The RFM task in EG 6.1 gives you the option to use transactional- or customer-level data, as well as several binning options and plots.

I hope this helps.

Ray

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

03-24-2015 12:48 PM

Thank you Ray for your clear explanation. I believe this is a very good starting point. I'll start experimenting with the number of bins.

-Avinash