turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Proc Cluster

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-16-2013 09:08 AM

Hi all,

I have a table with millions of product level transactions and wanted to know if there are clusters of products in my table which are often purchased together.

I’m using proc distance (method=Euclid), proc cluster (method=Ward) and proc tree but not entirely sure if this is the best way of analysing the data.

I’d appreciate to hear from anyone who has done something similar to this or has any ideas on how to develop this further.

Thanks

Zubair

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to bob2012

05-17-2013 04:29 AM

Hope you haven't seen my first version of the answer; I overlooked that you were already using PROC DISTANCE.

So I assume you have taken the Jaccard distance like in the divorce example for PROC DISTANCE which gives basically fractions of purchases of both prduchts within purchases of at least 1 of these products (1-fractions indeed). I would do the same. Maybe an alternative is to divide the number of combined purchases only by the number of alone purchases (resp. the reciprocal for distance), just to get another distance measure, and see what PROC CLUSTER makes different. Maybe you then get an idea to prefer one result to the other.

Hans

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to HWSteinberg

05-17-2013 07:44 PM

Hi Hans,

thanks for both answers! I'm a newbie to this type of analysis and need to understand the functions are the right ones to use. I have the table in a format where each row is a customer and columns represent products purchased, summing each column by quantity.

I'm sure the answer you provided makes sense but to be honest I don't understand it.

where can I find the divorce example please?

thanks

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to bob2012

05-22-2013 03:11 AM

Hi Bob

It's the SAS Online Doc, here http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_distance_exa.... But I just see, applying this to your problem, you should have a rather unusual data structure: Observations are products, and columns are purchases, 1 indicating purchase, 0 indicating no purchase. Of course instead of having purchases as columns, you could have buyers watched over a longer time, i.e. counting all purchases of one person in one column.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to HWSteinberg

06-27-2013 06:34 PM

Hi Hans,

THe observations are customers and columns are products. Each column shows how many of that product was bought. I then want to use those column in a cluster to show how much variance is accounted by that column. So, using cereals as an example, 50% of the variation is accounted for by Corn flakes, Frosties, and wheetabix cereals or 60% is explained by the same three cereals plus Cheerios etc..

THis them allows me to determine key products to then decide which I can remove or even suggest increasing variation.

thanks

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to bob2012

05-17-2013 11:01 AM

One way is to do a standard association analysis, e.g. proc assoc followed by proc rulegen. You could also do a sparse Singular Value Decomposition with this data, using Proc SPSVD, if you have Text Miner licensed.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to JamesCoxPhD

05-17-2013 07:22 PM

Hi James,

Thanks for sharing your ideas! I tried using associations (co-occurrences) which gave me useful insight, e.g. 10% of customers who bought single cream also bought double cream. At the same time I generated confidence levels, just as you would with proc assoc. I did this in base SAS and not miner. I do not have Text Miner.

Really pleased you mentioned proc assoc and rulegen as I used a variation of that but that didn't tell me which groups of products were being bought together, hence my preference to do a cluster.

Just to to clear, I changed my data set so I have one record per customer and all the different products as columns, measuring total quantity purchased over a period. Can I use that to cluster the products?

thanks!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to bob2012

05-17-2013 09:07 PM

google: market basket analysis and SAS.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

05-18-2013 07:26 AM

thanks Reeza! I think proc varclus could give me groupings of products purchased together? i have 1.6m customers and 450 products (columns), can proc varclus handle data that large, or do I use proc fastclus first then varclus please?

thanks

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to bob2012

05-18-2013 03:44 PM

I don't think it would based on how I understand cluster analysis. It will give you results, but I'm not sure your interpretation is correct.

What a proc can handle depends on your computer, if you're on a server it may be fine, if you're on a desktop you may not be.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

05-19-2013 03:05 AM

Hi

perhaps i should take a step back, what cluster method would I use then? What would you suggest Reeza?

thanks