04-19-2016 05:08 PM
I have a data structure as follows: each customer bought some items.
How can I write a SAS procedure (PROC CLUSTER? PROC FASTCLUS?) to cluster customers into distinct groups, say Group1 and Group2, based on the ITEM_IDs they bought?
I am using Base SAS 9.4 or SAS EG 6.1.
CSV data is attached.
04-19-2016 05:38 PM
Based on your problem description, I think this may be Market Basket Analysis (MBA) rather than cluster analysis.
MBA is implemented in SAS EM but not in Base SAS. If you only have Base, there is a macro that will perform it. You can search for it on lexjansen.com.
If you are doing cluster analysis, make sure to treat the variables as categorical, since item 18 and item 17 are not "a distance of 1 apart"; the numeric difference between two item IDs has no meaning.
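To illustrate the categorical point, here is a minimal sketch in Python (not SAS, and not the poster's code). The customer IDs, item IDs, and the helper name `indicator_rows` are all made up for the example. It turns each purchased item into its own 0/1 indicator column, so items 17 and 18 become two separate columns rather than two nearby numbers.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, item_id) pairs.
purchases = [
    ("C1", 17), ("C1", 18),
    ("C2", 17), ("C2", 18),
    ("C3", 3), ("C3", 42),
]

def indicator_rows(records):
    """Return (sorted item list, {customer: 0/1 vector over those items})."""
    baskets = defaultdict(set)
    for customer, item in records:
        baskets[customer].add(item)
    items = sorted({item for _, item in records})
    rows = {c: [1 if i in b else 0 for i in items] for c, b in baskets.items()}
    return items, rows

items, rows = indicator_rows(purchases)
for customer in sorted(rows):
    print(customer, rows[customer])
```

With this encoding, C1 and C2 get identical rows because they bought the same items, and nothing in the result depends on how close the ID numbers happen to be.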
04-19-2016 06:10 PM
Thanks for responding.
I have SAS EM too and tried to run MBA on my data.
I got some output but am not sure how to use it.
In my data, I have about 100 customers with multiple purchases (identified by ITEM_IDs).
I am trying to group these 100 customers into 4-5 clusters based on the ITEM_IDs they purchased.
Should I create a distance matrix of customers based on their purchased ITEM_IDs?
Thanks in advance for any further insights.
04-19-2016 09:08 PM
1) If there are only character variables, you can first use proc distance to get the distance matrix and then feed it into proc cluster. Search for "character variable cluster" at support.sas.com and you will find the code.
2) If there is a mix of character and numeric variables, two ways I can think of are a decision tree (proc hpsplit) or a logistic regression (proc logistic, or another proc that can run logistic regression).
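The idea in point 1, building a customer-by-customer distance matrix and then clustering on it, can be sketched as follows. This is Python rather than SAS, and it is only an illustration under assumptions: the baskets are invented, Jaccard distance is one reasonable choice for sets of purchased items (the thread does not name a measure), and the plain average-linkage loop is a simplified stand-in that is not claimed to match what proc cluster computes.

```python
from itertools import combinations

# Hypothetical baskets: customer -> set of purchased ITEM_IDs.
baskets = {
    "C1": {17, 18, 20},
    "C2": {17, 18},
    "C3": {3, 42},
    "C4": {3, 42, 7},
    "C5": {17, 20},
}

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0 means identical baskets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def distance_matrix(baskets):
    """Symmetric lookup of pairwise distances keyed by (customer, customer)."""
    d = {}
    for x, y in combinations(sorted(baskets), 2):
        d[(x, y)] = d[(y, x)] = jaccard_distance(baskets[x], baskets[y])
    return d

def average_linkage(baskets, n_clusters):
    """Merge the two closest groups until n_clusters groups remain."""
    dist = distance_matrix(baskets)
    clusters = [[c] for c in sorted(baskets)]

    def between(g, h):
        # Mean distance over all cross-group customer pairs.
        return sum(dist[(x, y)] for x in g for y in h) / (len(g) * len(h))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: between(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe: j > i for every pair from combinations
    return [sorted(g) for g in clusters]

groups = average_linkage(baskets, n_clusters=2)
print(groups)
```

For roughly 100 customers and 4-5 groups, the same steps apply with `n_clusters` set accordingly; whether the resulting groups are meaningful depends on how much the baskets actually overlap.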