Cluster Analysis with SAS, When the data are mixed (Statistical Procedures)

deleted_user — Mon, 22 Jun 2009 03:08:54 GMT

Hi,

I have never done cluster analysis with SAS before. I have read the documentation on the websites, but the details are lengthy, so I am still confused about the general steps for performing a cluster analysis in SAS. In some other software, I could just load the raw data and get results. Could anyone outline the process so that I have a rough idea of which topics I should focus on?

I have a data set of about 200,000 observations with about 30-35 attributes, all of it raw transaction data. Some attributes are categorical (with many possible categories), some are numeric, and some are binary (0 or 1). I am looking for anomalous or suspicious transactions (outliers). What general steps should I follow in performing the cluster analysis?

Thank you so much in advance.

Best,
Panda
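One common recipe for the goal described in this question is: convert the variables to numeric form on comparable scales, run k-means clustering, then review transactions that lie far from every cluster centre or that fall into very small clusters. The plain-Python sketch below (not SAS code) covers the last two steps on rows that are already numeric; the function names, k = 2 and the toy data are illustrative assumptions. In SAS, PROC FASTCLUS is the k-means-style procedure intended for large data sets, and its OUT= data set records each observation's cluster and its distance to the cluster seed, which plays the same role as `outlier_scores` here.

```python
import math
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means with random initial seeds; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each row joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def outlier_scores(points, centroids):
    """Distance from each row to its nearest centroid; larger means more unusual."""
    return [math.sqrt(min(sq_dist(p, c) for c in centroids)) for p in points]


# Toy numeric rows (assumed already prepared): two tight groups plus one odd row at index 6.
data = [[0.0, 0.1], [0.1, 0.0], [-0.1, 0.0],
        [5.0, 5.1], [5.1, 4.9], [4.9, 5.0],
        [2.5, -4.0]]
centroids, labels = kmeans(data, k=2)
scores = outlier_scores(data, centroids)
sizes = Counter(labels)  # very small clusters are also worth reviewing
ranked = sorted(range(len(data)), key=lambda i: scores[i], reverse=True)
print("most unusual rows:", ranked[:3])
print("cluster sizes:", dict(sizes))
```

An extreme point can capture a centroid of its own and end up with a score of 0, which is why the cluster sizes are printed as well. The pure-Python loops only show the logic; at 200,000 rows a compiled implementation such as PROC FASTCLUS is the practical route.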

Re: Cluster Analysis with SAS, When the data are mixed

sfleming — Mon, 22 Jun 2009 18:30:21 GMT

You might start by reading the Introduction to Clustering Procedures:

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/introclus_index.htm

Re: Cluster Analysis with SAS, When the data are mixed

kdp — Mon, 29 Jun 2009 19:44:13 GMT

If you have Enterprise Guide 4.1, it's easy to get started with clustering: go to Analyze --> Multivariate --> Cluster analysis.

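I have just started playing around with the cluster procedure; here are some things to keep in mind:
- The procedure works only with numeric variables, so you will probably need to convert each categorical variable into numeric 0/1 indicator (dummy) variables.
- You will probably need to standardize the data so that variables measured on large scales (for example, raw daily transaction counts) do not dominate the distances. Converting counts to percentages is one option; PROC STANDARD or PROC STDIZE can rescale each variable to mean 0 and standard deviation 1.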

Hope this helps!
kdp
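The two preparation steps suggested in the list above can be sketched as follows. This is plain Python rather than SAS, and the helper names (`dummy_encode`, `standardize`) and the toy transaction columns are hypothetical: each categorical column becomes one 0/1 indicator column per level, and each numeric column is rescaled to mean 0 and sample standard deviation 1, so that, for example, dollar amounts do not swamp 0/1 flags in the distance calculation.

```python
import statistics


def dummy_encode(values):
    """Turn one categorical column into 0/1 indicator columns, one per observed level."""
    levels = sorted(set(values))
    rows = [[1.0 if v == level else 0.0 for level in levels] for v in values]
    return levels, rows


def standardize(column):
    """Rescale a numeric column to mean 0 and sample standard deviation 1 (z-scores)."""
    mean = statistics.fmean(column)
    sd = statistics.stdev(column)
    if sd == 0:
        return [0.0] * len(column)  # a constant column carries no clustering information
    return [(x - mean) / sd for x in column]


# Hypothetical transaction columns: a dollar amount, a channel code, and a 0/1 flag.
amounts = [10.0, 250.0, 40.0, 5000.0]
channels = ["web", "atm", "web", "branch"]
flags = [0, 1, 0, 0]

levels, channel_dummies = dummy_encode(channels)
z_amounts = standardize(amounts)

# One all-numeric row per transaction, ready for a clustering routine.
prepared = [[z] + d + [float(f)] for z, d, f in zip(z_amounts, channel_dummies, flags)]
print(levels)            # ['atm', 'branch', 'web']
print(prepared[1][1:])   # [1.0, 0.0, 0.0, 1.0]
```

With many levels, as in the original question, dummy coding produces many sparse columns; grouping rare levels into a single "other" level before encoding is a common way to keep the column count manageable.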

GAP analysis?

khamil — Thu, 04 Mar 2010 16:33:27 GMT

We are using SAS for cluster analysis and wonder whether anyone has a protocol for a gap-statistic (GAP) analysis to determine the optimal number of clusters.

khamil
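PROC CLUSTER's CCC and PSEUDO options print the cubic clustering criterion and the pseudo F and pseudo t-squared statistics as aids for choosing the number of clusters; the gap statistic of Tibshirani, Walther and Hastie (2001) is a separate method. As a rough, non-SAS illustration, the Python sketch below computes Gap(k) = mean over b of log W*_kb minus log W_k, where W_k is the within-cluster sum of squares from k-means and the W*_kb come from B reference data sets drawn uniformly over the data's bounding box (the simpler of the paper's two reference choices). It then picks the smallest k with Gap(k) >= Gap(k+1) - s_(k+1), where s_k = sd_k * sqrt(1 + 1/B). The helper names, k-means++ seeding, five restarts and B = 10 are assumptions made for the sketch, not a published SAS protocol.

```python
import math
import random
import statistics


def _sq(a, b):
    """Squared Euclidean distance between two equal-length rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p, cents):
    """Index of the centroid closest to row p."""
    return min(range(len(cents)), key=lambda c: _sq(p, cents[c]))


def _seed_pp(points, k, rng):
    """k-means++ seeding: each new seed is drawn with probability proportional to D^2."""
    cents = [list(rng.choice(points))]
    while len(cents) < k:
        d2 = [min(_sq(p, c) for c in cents) for p in points]
        pick = rng.choices(points, weights=d2)[0] if sum(d2) > 0 else rng.choice(points)
        cents.append(list(pick))
    return cents


def within_ss(points, k, rng, restarts=5, iters=50):
    """W_k: lowest pooled within-cluster sum of squares over several k-means runs."""
    best = math.inf
    for _restart in range(restarts):
        cents = _seed_pp(points, k, rng)
        for _step in range(iters):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[_nearest(p, cents)].append(p)
            new = [[sum(col) / len(g) for col in zip(*g)] if g else cents[i]
                   for i, g in enumerate(groups)]
            if new == cents:
                break
            cents = new
        best = min(best, sum(_sq(p, cents[_nearest(p, cents)]) for p in points))
    return best


def gap_statistic(points, k_max, n_refs=10, seed=0):
    """Return (k_hat, gaps, s) using the rule: smallest k with Gap(k) >= Gap(k+1) - s(k+1)."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, s = [], []
    for k in range(1, k_max + 1):
        # max(..., 1e-12) guards against log(0) when every row sits on a centroid.
        log_w = math.log(max(within_ss(points, k, rng), 1e-12))
        ref_logs = []
        for _ in range(n_refs):
            # Reference set: uniform draws over the bounding box of the observed data.
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(math.log(max(within_ss(ref, k, rng), 1e-12)))
        gaps.append(statistics.fmean(ref_logs) - log_w)
        s.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_refs))
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:
            return k, gaps, s
    return k_max, gaps, s


# Toy data: three well-separated blobs of 20 points each.
rng = random.Random(42)
centres = [(0.0, 0.0), (5.0, 0.0), (0.0, 5.0)]
data = [[cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1)] for cx, cy in centres for _ in range(20)]
k_hat, gaps, s = gap_statistic(data, k_max=4)
print("suggested number of clusters:", k_hat)
```

Clustering every reference set is the expensive part, so B, the number of restarts and k_max trade accuracy for run time; on a data set of 200,000 rows, running the procedure on a random sample may be a pragmatic first step.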