I have never done cluster analysis with SAS before. I have read the websites and other material, but the details are lengthy, and I am still confused about the general steps for performing cluster analysis with SAS. In some software, I can just load the raw data and get the results. Can anyone give me a rough outline of how to do it, so that I know which topics I should focus on?
I have a data set of about 200,000 observations with about 30-35 attributes. All of it is raw transaction data. Some attributes are categorical (with many possible categories), some are numeric, and some are 0/1 flags. I am looking for anomalous or suspicious transactions (outliers). Can anyone tell me the general steps I should follow in performing cluster analysis?
If you have Enterprise Guide 4.1, it's really easy to get started on clustering. Go to Analyze --> Multivariate --> Cluster analysis.
I have just started playing around with the cluster procedure. Here are some things to keep in mind:
- I am not sure if the procedure handles character variables (you might want to convert the categorical values into numeric ones, for example 0/1 indicator variables)
- You might have to standardize the data so that attributes on large scales do not dominate the others (for example, if you have the raw number of transactions every day, try converting them to percentages of some sort)
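To make the two points above concrete, here is a rough sketch of what those preparation steps do to the data. It is written in plain Python rather than SAS, purely to illustrate the idea. The function names `one_hot` and `standardize` and the toy columns `channel` and `amount` are made up for this example, and z-scores are just one common way to standardize (an assumption on my part, not the only option).

```python
from statistics import mean, pstdev


def one_hot(values):
    """Turn one categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


def standardize(values):
    """Rescale a numeric column to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information for clustering.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy transactions, not real data.
channel = ["web", "store", "web", "phone"]
amount = [10.0, 20.0, 30.0, 40.0]

channel_flags = one_hot(channel)   # one 0/1 list per category
amount_z = standardize(amount)     # same column, now on a comparable scale

print(channel_flags)
print(amount_z)
```

Keep in mind that an attribute with many possible categories turns into many indicator columns this way, so you may want to group rare categories together first.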