10-16-2012 12:09 AM
1. How do I interpret the tree diagram?
The tree only displays the correlation (or distance) between nodes: the height at which two branches join is the distance at which those clusters were merged.
2. How can I specify the number of clusters I want?
That is hard to decide. You need to read more of the documentation.
Or use principal component analysis to help you decide how many clusters you need.
3. What is the code for k-means?
PROC CLUSTER offers several hierarchical clustering methods, but k-means is not one of them; in SAS, k-means clustering is done with PROC FASTCLUS.
If I remember correctly, in k-means each cluster is represented by the mean of its members.
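To make the tree diagram and the choice of cluster count more concrete, here is a minimal sketch. It assumes the SASHELP.IRIS sample data set is available in your installation; the data set names WORK.TREE and WORK.CLUS3, the choice of Ward's method, and the value 3 are only illustrative.

```sas
/* Hierarchical clustering; OUTTREE= saves the join history for PROC TREE.  */
/* CCC and PSEUDO print statistics often used to judge the cluster count.   */
proc cluster data=sashelp.iris method=ward ccc pseudo print=15
             outtree=work.tree;
   var SepalLength SepalWidth PetalLength PetalWidth;
   copy Species;                /* carry the known category along */
run;

/* Draw the dendrogram and cut it into 3 clusters (assumed value). */
proc tree data=work.tree nclusters=3 out=work.clus3;
   copy Species;
run;

/* Compare the resulting clusters with the known categories. */
proc freq data=work.clus3;
   tables Species*cluster / norow nocol nopercent;
run;
```

Changing NCLUSTERS= and rerunning PROC TREE lets you try several cuts of the same tree without repeating the clustering step.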
10-16-2012 12:11 PM
PROC FASTCLUS and PROC MODECLUS have a MAXCLUSTERS= option that lets you, to some extent, specify the number of clusters you want. PROC VARCLUS has MINCLUSTERS= and MAXCLUSTERS= options as well. It depends on what type of cluster analysis you intend to perform.
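As a rough illustration of these options, the sketch below runs PROC FASTCLUS on observations and PROC VARCLUS on variables. The SASHELP.IRIS sample data, the output name WORK.KMEANS, and the cluster counts are assumptions made for the example only.

```sas
/* Cluster observations into at most 3 clusters.                     */
/* MAXITER= defaults to 1, so raise it to let the centers converge.  */
proc fastclus data=sashelp.iris maxclusters=3 maxiter=50
              out=work.kmeans;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cluster variables (not observations) into exactly 2 groups. */
proc varclus data=sashelp.iris minclusters=2 maxclusters=2;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

The OUT= data set from PROC FASTCLUS holds the original variables plus CLUSTER and DISTANCE, which makes it easy to inspect the assignments afterwards.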
10-16-2012 12:17 PM
I have a set of data and am trying to find some sort of order or pattern in it, and I thought cluster analysis would be a good option. I did attempt exploratory factor analysis, which did not work. Could you please give me some sample code?
10-16-2012 12:21 PM
The documentation for every procedure comes with a number of useful examples.
10-16-2012 03:44 PM
FASTCLUS does allow setting the number of clusters. However, it will force the data to create exactly that many clusters, even if one cluster consists of one record.
The online help shows an example that applies a variety of standardization methods, each followed by a call to FASTCLUS and a printout to see how well the clusters matched known categories. I found that very helpful.
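A stripped-down version of that idea, not the documentation example itself, might look like the following. It assumes SASHELP.IRIS as the data, uses only one standardization method (METHOD=STD), and the WORK data set names are made up for the sketch.

```sas
/* Standardize each variable to mean 0 and standard deviation 1. */
proc stdize data=sashelp.iris out=work.iris_std method=std;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cluster the standardized data. */
proc fastclus data=work.iris_std maxclusters=3 maxiter=50
              out=work.clus noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cross-tabulate clusters against the known categories. */
proc freq data=work.clus;
   tables Species*cluster / norow nocol nopercent;
run;
```

Swapping METHOD=STD for another method such as METHOD=RANGE and comparing the cross-tabulations shows how sensitive the clusters are to the scaling.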
I've used PROC FREQ with likely combinations of categorical variables (the LIST option is your friend) to identify characteristics of the resulting clusters.
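One way to do this kind of profiling is with PROC FREQ, whose TABLES statement has a LIST option that prints each combination on one row. The sketch below assumes the SASHELP.CARS sample data; the clustering variables, the cluster count of 4, and the WORK data set names are arbitrary choices for illustration.

```sas
/* Standardize so that no single variable dominates the distances. */
proc stdize data=sashelp.cars out=work.cars_std method=std;
   var Horsepower MPG_City Weight;
run;

/* Assign each car to one of at most 4 clusters. */
proc fastclus data=work.cars_std maxclusters=4 maxiter=50
              out=work.cars_clus noprint;
   var Horsepower MPG_City Weight;
run;

/* One row per cluster/category combination instead of nested tables. */
proc freq data=work.cars_clus;
   tables cluster*Origin*DriveTrain / list nocum;
run;
```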
10-17-2012 12:22 AM
No. The code for PROC CLUSTER is very simple. You can take the suggestion from FriedEgg. Check the documentation; it already has lots of sample code you can refer to.
The number of clusters is hard to decide, but you can specify it yourself: 2 or 4 or 6 or anything else.
Principal component analysis can help you understand the pattern in the data, which in turn can help you decide which number of clusters is best.
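As a small sketch of that approach, the code below projects the data onto the first two principal components and plots them, so that visible groupings can suggest a cluster count. SASHELP.IRIS and the output name WORK.PCS are assumptions for the example.

```sas
/* Keep the first two principal components; OUT= adds Prin1 and Prin2. */
proc princomp data=sashelp.iris out=work.pcs n=2;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Look for visually separated groups of points. */
proc sgplot data=work.pcs;
   scatter x=Prin1 y=Prin2;
run;
```

The number of well-separated clouds in the plot is only a starting guess; it is worth trying neighboring values as well.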