Help using Base SAS procedures

cluster analysis

Regular Contributor
Posts: 161

Hello,

I have 3 questions:

1. How do I interpret the tree diagram?

2. How can I specify the number of clusters I want?

3. What is the code for k-means?

Thank you

Super User
Posts: 9,676

Re: cluster analysis

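1. How do I interpret the tree diagram?

The tree only displays how the nodes are joined and the distance (or similarity) between them. The height at which two branches join tells you how far apart those two clusters were when they were merged.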

2. How can I specify the number of clusters I want?

Choosing the number is hard. You need to read more of the documentation.

Or use principal component analysis to help you decide how many clusters you need.

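3. What is the code for k-means?

PROC CLUSTER does hierarchical clustering and offers several methods for measuring the distance between clusters, but k-means is not one of them. For k-means, use PROC FASTCLUS.

If I remember correctly, in k-means each cluster is represented by the mean of its members.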

Ksharp

Regular Contributor
Posts: 161

Re: cluster analysis

So there is no simple code to use for cluster analysis and specify the number of clusters I want?

Trusted Advisor
Posts: 1,300

Re: cluster analysis

PROC FASTCLUS and PROC MODECLUS have a MAXCLUSTERS= option that lets you, to some extent, specify the number of clusters you want. PROC VARCLUS has MINCLUSTERS= and MAXCLUSTERS= options as well. It depends on what type of cluster analysis you intend to perform.
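
As a rough illustration of the MAXCLUSTERS= option, here is a minimal k-means sketch with PROC FASTCLUS. It assumes your installation includes the SASHELP.IRIS sample data set. The choice of three clusters, the MAXITER= value, and the output data set name clusout are placeholders for illustration, not recommendations.

```sas
/* k-means style clustering: ask for at most 3 clusters */
proc fastclus data=sashelp.iris
              maxclusters=3      /* number of clusters requested     */
              maxiter=100        /* allow the centers to be refined  */
              out=clusout;       /* adds CLUSTER and DISTANCE columns */
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Describe each cluster by the mean of its members */
proc means data=clusout n mean;
   class cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

Replace the data set and the VAR list with your own numeric variables.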

Regular Contributor
Posts: 161

Re: cluster analysis

I have a set of data and am trying to find some sort of order or pattern in it, and I thought cluster analysis would be a good option. I did attempt exploratory factor analysis, which did not work. Could you please give me some sample code?

thank you

Trusted Advisor
Posts: 1,300

Re: cluster analysis

The documentation for every procedure comes with a number of useful examples:

Cluster Procedure:

http://support.sas.com/documentation/cdl/en/statug/65328/HTML/default/viewer.htm#statug_cluster_exam...

Super User
Posts: 10,492

Re: cluster analysis

FASTCLUS does allow setting the number of clusters. However, it will force the data to create exactly that many clusters, even if one cluster consists of one record.

The online help shows an example of using a variety of standardization methods followed by a call to FASTCLUS and a print to see how well the clusters matched known categories. I found that very helpful.
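
A sketch along those lines, not the documentation example itself: it assumes the SASHELP.IRIS sample data is available, picks METHOD=RANGE as just one of the standardization methods PROC STDIZE offers, and uses a PROC FREQ crosstab rather than a print for the comparison. The data set names stdized and clus are made up for the illustration.

```sas
/* Standardize the variables so that no single one dominates the distances */
proc stdize data=sashelp.iris out=stdized method=range;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cluster the standardized data into (at most) 3 groups */
proc fastclus data=stdized maxclusters=3 maxiter=100 out=clus noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Compare the clusters with the known category */
proc freq data=clus;
   tables Species*cluster / norow nocol nopercent;
run;
```

Rerunning this with a different METHOD= value shows how much the standardization affects the match.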

I've used proc print with likely combinations of categorical variables (the list option is your friend) to identify characteristics of the resulting clusters.

Super User
Posts: 9,676

Re: cluster analysis

No. The code for PROC CLUSTER is very simple. You can take the suggestion from FriedEgg. Check the documentation; there is already lots of sample code you can refer to.

The best number of clusters is hard to decide, but you can specify it yourself: 2 or 4 or 6 or anything else.

Principal component analysis can help you understand the pattern in the data, which can help you decide which number of clusters is best.
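
For the hierarchical route, something like the following sketch should work. It assumes the SASHELP.IRIS sample data, Ward's method, and a cut at three clusters, all chosen only for illustration. The CCC and PSEUDO options print statistics that are often used as a guide to the number of clusters, and NCLUSTERS= in PROC TREE is where you set the number yourself.

```sas
/* Hierarchical clustering; OUTTREE= saves the tree for PROC TREE */
proc cluster data=sashelp.iris method=ward ccc pseudo print=15
             outtree=tree;
   var SepalLength SepalWidth PetalLength PetalWidth;
   copy Species;              /* carry the known category along */
run;

/* Cut the tree at the number of clusters you want */
proc tree data=tree nclusters=3 out=treeout noprint;
   copy Species;
run;

/* See how the clusters line up with the known category */
proc freq data=treeout;
   tables cluster*Species / norow nocol nopercent;
run;
```

Change NCLUSTERS= to 2, 4, 6, and so on, and compare the resulting tables.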
