deleted_user
Hi,

I have never done cluster analysis with SAS before. I have read the websites and documentation, but the details are lengthy and I am still confused about the general steps for performing cluster analysis with SAS. In some software, I can just load the raw data and get results. Can anyone give me a rough outline of how to do it, so that I know which topics I should focus on?

I have a data set of about 200,000 observations with about 30-35 attributes. All of it is raw transaction data. Some attributes are categorical (with many possible categories), some are numeric, and some are 0/1 flags. I am looking for anomalous or suspicious transactions (outliers). Can anyone tell me the general steps I should follow in performing cluster analysis?

Thank you so much in advance.

Best,
Panda
sfleming
Calcite | Level 5
You might start by reading the Introduction to Clustering Procedures:

http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/introclus_index.htm
kdp
Calcite | Level 5
If you have Enterprise Guide 4.1, then it's really easy to get started on Clustering. Go to Analyze --> Multivariate --> Cluster analysis.

I have just started playing around with the cluster procedure; here are some things to keep in mind:
- I am not sure whether the procedure handles character variables (you might want to convert the categorical values into numeric ones)
- you might have to standardize the data (for example, if you have the raw number of transactions per day, try converting them to percentages of some sort)
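
To make the two points above concrete, here is a minimal sketch of the preparation steps, written in plain Python rather than SAS purely for illustration. The column names (`amounts`, `channels`), the toy values, and the helper functions `one_hot` and `z_scores` are all hypothetical and not from the original data. It assumes z-score standardization, which is one common choice; converting to percentages, as suggested above, is another.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Turn a list of category labels into 0/1 indicator columns, one per level."""
    levels = sorted(set(values))
    return {f"is_{lvl}": [1 if v == lvl else 0 for v in values] for lvl in levels}


def z_scores(values):
    """Rescale a numeric column to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information for clustering.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy transactions: one numeric and one categorical attribute.
amounts = [10.0, 12.0, 11.0, 500.0]
channels = ["web", "store", "web", "phone"]

amount_std = z_scores(amounts)      # numeric column on a comparable scale
channel_flags = one_hot(channels)   # categorical column as numeric 0/1 flags
```

After this kind of preparation every attribute is numeric and on a comparable scale, so no single variable dominates the distances that clustering relies on. Note that a categorical attribute with many levels produces many indicator columns, so it may be worth grouping rare levels first.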

Hope this helps!
kdp
khamil
Calcite | Level 5
We are using SAS for cluster analysis and wonder whether anyone has a protocol for GAP analysis to determine the optimal number of clusters. khamil
