Cluster Analysis

New Contributor
Posts: 4

I have data structured as follows, where each row is one item bought by a customer:

Customer_ID, ITEM_ID

Adams, 18
Adams, 29
Adams, 30
Allen, 9
Allen, 27
Anderson, 24
Anderson, 26
Bailey, 7
Bailey, 30
Baker, 7
Baker, 10
Baker, 19
Baker, 31
Barnes, 10
Barnes, 21
Barnes, 22
Barnes, 31
...
etc

How can I use a SAS procedure (PROC CLUSTER? PROC FASTCLUS?) to cluster customers into distinct groups (say Group1, Group2) based on the ITEM_IDs they bought?

I am using Base SAS 9.4 or SAS EG 6.1.

CSV data is attached.

Thanks.

Super User
Posts: 17,837

Re: Cluster Analysis

Based on your problem description, I think this may be Market Basket Analysis (MBA) rather than cluster analysis.

MBA is implemented in SAS Enterprise Miner (EM) but not in Base SAS. If you only have Base SAS, there is a macro that performs it. You can search for it on lexjansen.com.
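To make the idea concrete, here is a minimal sketch of the core counting step behind market basket analysis: how often each pair of items appears in the same customer's basket. It is written in Python rather than SAS, purely as an illustration. It is not the macro mentioned above, and it uses only the six customers shown in the question. The names `rows`, `baskets`, and `support` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows copied from the question (Customer_ID, ITEM_ID).
rows = [
    ("Adams", 18), ("Adams", 29), ("Adams", 30),
    ("Allen", 9), ("Allen", 27),
    ("Anderson", 24), ("Anderson", 26),
    ("Bailey", 7), ("Bailey", 30),
    ("Baker", 7), ("Baker", 10), ("Baker", 19), ("Baker", 31),
    ("Barnes", 10), ("Barnes", 21), ("Barnes", 22), ("Barnes", 31),
]

# Group the items into one basket (set) per customer.
baskets = defaultdict(set)
for customer, item in rows:
    baskets[customer].add(item)

# Count how many baskets contain each pair of items.
pair_counts = defaultdict(int)
for items in baskets.values():
    for pair in combinations(sorted(items), 2):
        pair_counts[pair] += 1

# Support of a pair = share of customers whose basket contains both items.
n_customers = len(baskets)
support = {pair: count / n_customers for pair, count in pair_counts.items()}

top_pair = max(support, key=support.get)
print(top_pair, support[top_pair])
```

On this sample, items 10 and 31 are the only pair bought together by more than one customer (Baker and Barnes). Note that this describes which items go together, not which customers belong together, which is the distinction between MBA and clustering.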

 

If you are doing cluster analysis, make sure to treat the variables as categorical, since ITEM_ID is only a label: item 17 and item 18 are not "a distance of 1 apart", and the numeric difference between two IDs has no meaning.
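One common way to treat item IDs as categorical is to turn them into 0/1 "bought / not bought" indicator columns, one per item, with one row per customer. The sketch below shows that reshaping in Python (not SAS), on the sample rows from the question. The variable names are hypothetical, and this is only one possible encoding.

```python
# Sample rows copied from the question (Customer_ID, ITEM_ID).
rows = [
    ("Adams", 18), ("Adams", 29), ("Adams", 30),
    ("Allen", 9), ("Allen", 27),
    ("Anderson", 24), ("Anderson", 26),
    ("Bailey", 7), ("Bailey", 30),
    ("Baker", 7), ("Baker", 10), ("Baker", 19), ("Baker", 31),
    ("Barnes", 10), ("Barnes", 21), ("Barnes", 22), ("Barnes", 31),
]

items = sorted({item for _, item in rows})          # one column per distinct item
customers = sorted({customer for customer, _ in rows})
bought = set(rows)                                  # fast membership lookup

# One row per customer: 1 if the customer bought the item, else 0.
indicator = {
    customer: [1 if (customer, item) in bought else 0 for item in items]
    for customer in customers
}

print(items)
for customer in customers:
    print(customer, indicator[customer])
```

In this layout each item is its own column, so item 29 is no "closer" to item 30 than to item 7. Customers are compared only by which items they share.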

New Contributor
Posts: 4

Re: Cluster Analysis

Thanks for responding.

I have SAS EM too and tried running MBA on my data.

I got some output, but I am not sure how to use it.

 

In my data, I have about 100 customers with multiple purchases (identified by ITEM_IDs).

I am trying to group these 100 customers into 4-5 clusters based on their purchased ITEM_IDs.

 

Should I create a DISTANCE matrix of Customers based on their purchased ITEM_IDs?

 

Thanks for any further insights and input.

Super User
Posts: 9,681

Re: Cluster Analysis

1) If there are only character variables, you can first use PROC DISTANCE to get the distance matrix and then feed it into PROC CLUSTER. Search for "character variable cluster" at support.sas.com and you will find the code.

2) If there is a mix of character and numeric variables, two approaches I can think of are a decision tree (PROC HPSPLIT) or a general logistic regression (PROC LOGISTIC, or another procedure that can run logistic regression).
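The two-step idea in option 1 (build a customer-by-customer distance matrix, then cluster on it) can be sketched as follows. This is Python, not SAS, and it is not the code from support.sas.com. The Jaccard distance and average linkage are assumptions chosen for illustration, since the thread does not name a distance measure or a linkage method. The helper names are hypothetical.

```python
from itertools import combinations

# Sample rows copied from the question (Customer_ID, ITEM_ID).
rows = [
    ("Adams", 18), ("Adams", 29), ("Adams", 30),
    ("Allen", 9), ("Allen", 27),
    ("Anderson", 24), ("Anderson", 26),
    ("Bailey", 7), ("Bailey", 30),
    ("Baker", 7), ("Baker", 10), ("Baker", 19), ("Baker", 31),
    ("Barnes", 10), ("Barnes", 21), ("Barnes", 22), ("Barnes", 31),
]

baskets = {}
for customer, item in rows:
    baskets.setdefault(customer, set()).add(item)

def jaccard_distance(a, b):
    """1 - |shared items| / |items bought by either customer|."""
    return 1 - len(a & b) / len(a | b)

# Step 1: distance between every pair of customers.
names = sorted(baskets)
dist = {
    frozenset((p, q)): jaccard_distance(baskets[p], baskets[q])
    for p, q in combinations(names, 2)
}

def average_linkage(c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(dist[frozenset((p, q))] for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def cluster(names, k):
    """Step 2: merge the two closest clusters until k clusters remain."""
    clusters = [(name,) for name in names]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
    return clusters

groups = cluster(names, 4)
print(groups)
```

With only the six sample customers, asking for 4 groups merges Baker with Barnes (they share items 10 and 31) and Adams with Bailey (they share item 30), while Allen and Anderson stay on their own. On the full data of about 100 customers, the number of groups and the distance measure would need to be chosen and checked against the results.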
