- Exogenously imposing the number of clusters in Pro...

04-29-2012 06:23 PM

Hi everybody,

I am trying to cluster raw return data into 10 clusters. Is it possible in Proc cluster to exogenously identify the number of clusters?

Thank you in advance for your help.

Ozzy


Posted in reply to ozzy

04-29-2012 08:19 PM

Not sure what you mean by *exogenously identify the number of clusters*. Proc cluster builds a binary tree: it starts with every observation in its own cluster and, at each step, joins two clusters together until only one cluster remains. You can pick the level of clustering you want by cutting that tree at the appropriate level, which can be done with Proc tree. For example:

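/* Build the full cluster tree and save it to the data set TREE */
proc cluster data=test outtree=tree method=centroid;
    var x y z;   /* clustering variables */
    id id;       /* labels each observation in the tree */
run;

/* Cut the tree at the level that gives 10 clusters */
proc tree data=tree out=clusters nclusters=10;
run;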
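
To check the result, the assignments can be tabulated after the cut. This is a minimal sketch that reuses the placeholder names from the example above (`tree`, `x`, `y`, `z`); the output name `clusters10` is made up for illustration.

```sas
/* Cut the tree at 10 clusters and keep the clustering variables
   next to the assignment in the OUT= data set */
proc tree data=tree out=clusters10 nclusters=10 noprint;
    copy x y z;   /* copied from the OUTTREE= data set */
run;

/* Count how many observations landed in each cluster */
proc freq data=clusters10;
    tables cluster;
run;
```

The OUT= data set holds one row per original observation, with the variables CLUSTER (the cluster number) and CLUSNAME, so it can be sorted and merged back onto the input data if needed.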

PG



Posted in reply to PGStats

04-29-2012 08:23 PM

This is exactly what I needed; thank you very much.


Posted in reply to ozzy

01-27-2015 09:07 AM

Hi,

I have a question: how did you arrive at the number 10 before even running Proc cluster, i.e. how did you decide to create 10 clusters? What was the motivation; was it a business requirement? Otherwise, deciding the number of clusters in advance is impossible and scientifically incorrect.

Now I am in a situation where I have to use hierarchical cluster analysis, but I am unable to decide on the number of clusters. I see Proc ACECLUS, whose documentation says:

*"Neither cluster membership nor the number of clusters needs to be known. PROC ACECLUS is useful for preprocessing data to be subsequently clustered by the CLUSTER or FASTCLUS procedure."*

But the example in the documentation (LONE example) uses the "MAXC=3" option, which is of course a mandatory requirement of the FASTCLUS procedure and amounts to providing the number of clusters explicitly (SAS/STAT(R) 9.2 User's Guide, Second Edition). If that is so, what is the use of running ACECLUS when we give the number of clusters explicitly, and why does the sentence quoted above say that the number of clusters need not be known? I am confused.
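
For reference, the two-step pattern in question can be sketched as follows. It is only an illustration: the data set `test` and the variables `x`, `y`, `z` are placeholders borrowed from the earlier reply, the output names `ace` and `clus` are made up, and `PROPORTION=0.03` is an arbitrary illustrative value, not a recommendation. The sketch shows that PROC ACECLUS is given no cluster count; the count appears only in the later FASTCLUS step.

```sas
/* Step 1: ACECLUS estimates a within-cluster covariance matrix and
   writes canonical variables (Can1, Can2, ...) to the OUT= data set.
   No number of clusters is supplied here. */
proc aceclus data=test out=ace proportion=0.03 noprint;
    var x y z;
run;

/* Step 2: the number of clusters enters only here, through
   MAXCLUSTERS= (abbreviated MAXC=) on PROC FASTCLUS. */
proc fastclus data=ace out=clus maxclusters=3;
    var can1 can2 can3;
run;
```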

Nevertheless, the main question is: can we use the FASTCLUS or CLUSTER procedure without first running ACECLUS? (I think the answer is yes.) ACECLUS does have its own use, though, in computing canonical variables when the dataset has variables measured on different scales. And if we use ACECLUS first, how do we arrive at the desired number of clusters, given that the user is a novice and is not aware of the different algorithms, methods, business needs, and so on?

Thanks.

Harshad M.
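
On choosing the number of clusters for a hierarchical analysis: PROC CLUSTER can be run directly, without ACECLUS, and asked to print statistics that are commonly used as a guide. The sketch below is one possible setup, not a definitive recipe. The names `test`, `x`, `y`, `z` and `id` are the placeholders from the earlier reply, and the choice of `method=ward` and `print=15` is an assumption made for illustration.

```sas
/* STD standardizes the variables before clustering, which helps
   when they are measured on different scales.
   CCC requests the cubic clustering criterion; PSEUDO requests the
   pseudo F and pseudo t-squared statistics.
   PRINT=15 limits the cluster history to the last 15 merges. */
proc cluster data=test method=ward std ccc pseudo print=15 outtree=tree;
    var x y z;
    id id;
run;
```

In the printed cluster history, a candidate number of clusters is one where the CCC and pseudo F show a local peak and the pseudo t-squared is small but becomes larger at the next merge. These statistics are guides rather than proof, so the chosen number still has to make sense for the data; it can then be passed to PROC TREE through NCLUSTERS=.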