deleted_user
Not applicable
Hi,

I have never done cluster analysis with SAS before. I have read the documentation websites and other material, but the details are lengthy, so I am still confused about the general steps for performing cluster analysis in SAS. In some software, I can just load the raw data and get the results. Can anyone give me a rough idea of how to do it, so that I have a general idea of which topics I should focus on?

I have a data set of about 200,000 observations with about 30-35 attributes. All of it is raw transaction data. Some attributes are categorical (with many possible categories), some are numeric, and some are binary (0 or 1). I am looking for anomalous or suspicious transactions (outliers). Can anyone tell me the general steps I should follow in performing the cluster analysis?
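As a rough picture of the general steps (convert every attribute to numbers, standardize, cluster, then look at the transactions that fit no cluster well), here is a minimal Python sketch rather than SAS code. It assumes the data have already been turned into an all-numeric, standardized matrix, and it uses one common clustering-based outlier score: the distance from each transaction to its nearest k-means centre. The function names, the value of k and the `top_fraction` cutoff are hypothetical choices for illustration. In SAS, k-means-style clustering of large data sets is usually done with PROC FASTCLUS.

```python
import math
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on a list of numeric rows; returns k centres."""
    rng = random.Random(seed)
    # Start from k randomly chosen rows as the initial centres.
    centres = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in rows:
            nearest = min(range(k), key=lambda c: squared_distance(r, centres[c]))
            groups[nearest].append(r)
        for c, members in enumerate(groups):
            if members:  # keep the previous centre if a cluster empties out
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres


def outlier_scores(rows, centres):
    """Distance from each row to its nearest centre; larger means more unusual."""
    return [math.sqrt(min(squared_distance(r, c) for c in centres)) for r in rows]


def flag_outliers(rows, k=2, top_fraction=0.01):
    """Hypothetical helper: indices of the rows farthest from every centre."""
    centres = kmeans(rows, k)
    scores = outlier_scores(rows, centres)
    n_flag = max(1, int(len(rows) * top_fraction))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Hypothetical toy data: two tight groups plus one far-away transaction.
rows = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],
        [5.0, 5.1], [5.1, 5.0], [5.0, 5.0],
        [20.0, -20.0]]
print(flag_outliers(rows, k=2, top_fraction=0.1))
```

One caveat with this design: a very extreme transaction can end up as its own tiny cluster with a distance of zero, so clusters with very few members are worth inspecting as possible outliers too. Pure Python is also slow on 200,000 rows; the sketch only shows the logic.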

Thank you so much in advance.

Best,
Panda
3 REPLIES
sfleming
Calcite | Level 5
You might start by reading the Introduction to Clustering Procedures:

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/introclus_index.htm
kdp
Calcite | Level 5
If you have Enterprise Guide 4.1, it's really easy to get started with clustering: go to Analyze --> Multivariate --> Cluster analysis.

I have just started playing around with the cluster procedure. Here are some things to keep in mind:
- I am not sure whether the procedure handles character variables (you might want to convert the categorical values into numeric codes, such as 0/1 indicator variables)
- You might have to standardize the data (for example, if you have the raw number of transactions each day, try converting them to percentages of some sort)
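The two preprocessing tips above can be sketched in Python as follows (an illustration, not what any SAS procedure does internally): each categorical attribute is expanded into 0/1 indicator columns, one per observed category, and each numeric attribute is rescaled to mean 0 and standard deviation 1 so that no single attribute dominates the distances. The record fields (`amount`, `channel`) and the helper names are hypothetical. In SAS itself, PROC STANDARD or PROC STDIZE can handle the standardization step.

```python
import statistics


def one_hot(records, column):
    """Expand one categorical column into sorted 0/1 indicator columns."""
    categories = sorted({r[column] for r in records})
    names = [f"{column}={c}" for c in categories]
    rows = [[1 if r[column] == c else 0 for c in categories] for r in records]
    return names, rows


def standardize(values):
    """Rescale a numeric column to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:  # a constant column carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def prepare(records, categorical, numeric):
    """Build an all-numeric matrix: z-scored numeric columns, then indicators."""
    names = list(numeric)
    columns = [standardize([float(r[c]) for r in records]) for c in numeric]
    matrix = [list(row) for row in zip(*columns)] if columns else [[] for _ in records]
    for col in categorical:
        ind_names, ind_rows = one_hot(records, col)
        names += ind_names
        for row, ind in zip(matrix, ind_rows):
            row.extend(ind)
    return names, matrix


# Hypothetical transaction records.
records = [
    {"amount": 10.0, "channel": "web"},
    {"amount": 20.0, "channel": "store"},
    {"amount": 30.0, "channel": "web"},
]
names, matrix = prepare(records, categorical=["channel"], numeric=["amount"])
# names == ["amount", "channel=store", "channel=web"]
```

With many categories, one-hot encoding adds many columns, so rare categories may need to be grouped first; how heavily the indicator columns are weighted relative to the z-scored ones is also a modelling choice, not a fixed rule.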

Hope this helps!
kdp
khamil
Calcite | Level 5
We are using SAS for cluster analysis. Does anyone have a protocol for gap-statistic (GAP) analysis to determine the optimal number of clusters?

khamil
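For the gap statistic (Tibshirani, Walther and Hastie, 2001), the idea is to compare log W_k, the log of the within-cluster sum of squares for k clusters, with its average over B reference data sets drawn uniformly from the range of each variable: Gap(k) = (mean of log W*_k over the reference sets) - log W_k. The suggested rule picks the smallest k with Gap(k) >= Gap(k+1) - s_(k+1), where s_k is the standard deviation of the reference log W*_k values multiplied by sqrt(1 + 1/B). As far as I know this is not a built-in option of PROC CLUSTER or PROC FASTCLUS, which instead report criteria such as the cubic clustering criterion and the pseudo F statistic, so the following is a minimal Python sketch of the procedure, not a SAS protocol. The function names, `k_max`, `n_refs` and the toy data are assumptions.

```python
import math
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(rows, k, iterations=25, seed=0):
    """Run Lloyd's k-means and return the within-cluster sum of squares W_k."""
    rng = random.Random(seed)
    centres = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[min(range(k), key=lambda c: sq_dist(r, centres[c]))].append(r)
        # Move each centre to its group mean; keep it if the group is empty.
        centres = [[sum(col) / len(g) for col in zip(*g)] if g else centres[i]
                   for i, g in enumerate(groups)]
    return sum(min(sq_dist(r, c) for c in centres) for r in rows)


def gap_statistic(rows, k_max=5, n_refs=10, seed=0):
    """Return Gap(k) and s_k for k = 1..k_max using uniform reference boxes."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(max(kmeans_wss(rows, k), 1e-12))  # guard against log(0)
        ref_logs = []
        for b in range(n_refs):
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in rows]
            ref_logs.append(math.log(max(kmeans_wss(ref, k, seed=b), 1e-12)))
        gaps.append(statistics.fmean(ref_logs) - log_w)
        sks.append(statistics.pstdev(ref_logs) * math.sqrt(1 + 1 / n_refs))
    return gaps, sks


def choose_k(gaps, sks):
    """Smallest k with Gap(k) >= Gap(k+1) - s_(k+1); else the largest k tried."""
    for i in range(len(gaps) - 1):
        if gaps[i] >= gaps[i + 1] - sks[i + 1]:
            return i + 1
    return len(gaps)


# Hypothetical toy data: two tight groups of five points, far apart.
rows = [[0.1 * i, 0.1 * (i % 2)] for i in range(5)]
rows += [[10 + 0.1 * i, 0.1 * (i % 2)] for i in range(5)]
gaps, sks = gap_statistic(rows, k_max=3)
print(choose_k(gaps, sks))
```

Every reference set is re-clustered for each k, so this gets expensive quickly; on 200,000 transactions one would typically run it on a random subsample. The paper also describes a reference box aligned with the principal components, which this sketch does not implement.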
