Hello,
I have 3 questions:
1. How do I interpret the tree diagram?
2. How can I specify the number of clusters I want?
3. What is the code for k-means?
Thank you
1. How do I interpret the tree diagram?
The tree only displays the correlation (or distance) between nodes: the height at which two branches join is the distance at which those two clusters were merged.
2. How can I specify the number of clusters I want?
There is no easy answer. It is hard to decide, and you need to read more of the documentation.
Or use principal component analysis to help you decide how many clusters you need.
3. What is the code for k-means?
PROC CLUSTER defines several clustering methods, each with its own distance definition. K-means itself is run with PROC FASTCLUS.
If I remember correctly, k-means uses the MEAN of the members of each cluster as that cluster's center.
Ksharp
So there is no simple code that runs a cluster analysis and lets me specify the number of clusters I want?
PROC FASTCLUS and PROC MODECLUS have a MAXCLUSTERS= option that lets you, to some extent, specify the number of clusters you want. PROC VARCLUS has MINCLUSTERS= and MAXCLUSTERS= options as well. It depends on what type of cluster analysis you intend to perform.
I have a set of data and am trying to find some sort of order or pattern in it, and I thought cluster analysis would be a good option. I did attempt an exploratory factor analysis, which did not work. Could you please give me some sample code?
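As a minimal sketch of the MAXCLUSTERS= approach with PROC FASTCLUS: the example below assumes the SASHELP.IRIS sample data set and its four measurement variables purely for illustration, and the output data set name WORK.CLUS is made up. Substitute your own data set and numeric variables.

```sas
/* Sketch only: SASHELP.IRIS and WORK.CLUS are illustrative choices. */
/* MAXCLUSTERS=3 asks FASTCLUS for at most three clusters.           */
proc fastclus data=sashelp.iris maxclusters=3 out=work.clus;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* The OUT= data set keeps the input variables and adds CLUSTER     */
/* (assigned cluster number) and DISTANCE (distance to the seed).   */
proc print data=work.clus(obs=10);
run;
```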
thank you
The documentation for every procedure comes with a number of useful examples:
Cluster Procedure:
FASTCLUS does allow setting the number of clusters. However, it will force the data to create exactly that many clusters, even if one cluster consists of one record.
The online help shows an example of using a variety of standardization methods followed by a call to FASTCLUS and a print step to see how well the clusters matched known categories. I found that very helpful.
I've used proc print with likely combinations of categorical variables (the list option is your friend) to identify characteristics of the resulting clusters.
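A rough sketch of that standardize, cluster, then compare workflow might look like the following. It is not the documentation example itself: SASHELP.IRIS, METHOD=STD, the WORK data set names, and the use of PROC FREQ with the LIST option to tabulate clusters against the known Species category are all assumptions made for illustration.

```sas
/* Sketch only: data set names and METHOD=STD are illustrative.      */
/* Standardize each variable to mean 0 and standard deviation 1.     */
proc stdize data=sashelp.iris out=work.iris_std method=std;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cluster the standardized data into at most three clusters. */
proc fastclus data=work.iris_std maxclusters=3 out=work.clus_std noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cross-tabulate assigned cluster against the known category.       */
/* LIST prints one row per cluster/category combination.             */
proc freq data=work.clus_std;
   tables Cluster*Species / list;
run;
```

Trying a different METHOD= value in PROC STDIZE and rerunning the last two steps is one way to see how much the standardization choice changes the match.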
No. The code for cluster analysis is very simple. You can take the suggestion from FriedEgg. Check the documentation; it already has lots of sample code you can reference.
The number of clusters is hard to decide, but you can specify it yourself: 2, 4, 6, or anything else.
Principal component analysis can help you understand the pattern in the data, which can help you decide which number of clusters is best.
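For hierarchical clustering, one way to specify the number yourself is to build the tree first and then cut it at the number of clusters you want. The sketch below assumes SASHELP.IRIS, Ward's method, and made-up WORK data set names; none of these come from the thread.

```sas
/* Sketch only: METHOD=WARD and the data set names are illustrative. */
/* Build the hierarchy and save it for PROC TREE.                    */
proc cluster data=sashelp.iris method=ward outtree=work.tree;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Draw the tree diagram and cut it into three clusters.             */
/* The OUT= data set has one row per observation with its CLUSTER.   */
proc tree data=work.tree nclusters=3 out=work.treeclus;
run;
```

Changing NCLUSTERS= and rerunning only the PROC TREE step gives a different cut of the same tree, so you can compare 2, 4, or 6 clusters without reclustering.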
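As a sketch of that idea, assuming "component analysis" here means principal components via PROC PRINCOMP, the first two component scores can be plotted to see whether the data visibly separate into groups. SASHELP.IRIS and the WORK.PCS data set name are illustrative assumptions.

```sas
/* Sketch only: data set names are illustrative.                     */
/* N=2 keeps the first two principal components; the OUT= data set   */
/* adds their scores as the variables Prin1 and Prin2.               */
proc princomp data=sashelp.iris out=work.pcs n=2;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Scatter plot of the scores; visible clumps suggest how many       */
/* clusters may be worth requesting.                                 */
proc sgplot data=work.pcs;
   scatter x=Prin1 y=Prin2;
run;
```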