Optimal number of clusters

10-11-2011 03:44 AM

Hi,

How can we determine the optimal number of clusters in cluster analysis?

Thanks,

Nikhil

Accepted Solutions

Solution

07-07-2017
01:16 PM

10-11-2011 10:05 AM

I don't think there are strict rules for the optimal number of clusters; as with cluster analysis in general, there is a lot of room for variation and interpretation.

Maybe someone can give more specific criteria, but these are the ones I would consider:

* Use graphical analysis to check whether your clusters are well separated; some may be very close and could be merged. A tree diagram (PROC TREE) is also very useful: it shows how many groups (well-separated branches) you have.

* You most likely don’t want clusters with just one or a few observations.

* In some cases your data or task can hint at the number of clusters (e.g. you may want to separate items with high, middle and low levels of something).
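
The first two checks above could be sketched roughly as follows. This is only an illustration, not a recipe from the thread: the data set `work.mydata`, the variables `x1-x3`, the ID variable `obs_id`, Ward's method and the cut at 3 clusters are all assumptions made for the example.

```sas
/* Hypothetical input: work.mydata with numeric variables x1-x3
   and an identifier obs_id. Ward's method is an arbitrary choice. */
proc cluster data=work.mydata method=ward outtree=work.tree noprint;
   var x1-x3;
   id obs_id;
run;

/* Draw the tree and cut it into 3 clusters.
   3 is a placeholder: pick it after looking at the branches. */
proc tree data=work.tree nclusters=3 out=work.members;
   id obs_id;
run;

/* Cluster sizes: clusters with one or very few observations
   may be outliers rather than real groups. */
proc freq data=work.members;
   tables cluster;
run;
```

Rerunning the PROC TREE step with a different `nclusters=` value is a cheap way to compare candidate cuts, because the hierarchy itself does not need to be recomputed.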

All Replies

01-18-2012 07:29 AM

For hierarchical clustering, try Sarle's Cubic Clustering Criterion (CCC) in PROC CLUSTER:

Plot _CCC_ against the number of clusters and look for peaks where _CCC_ > 3. Alternatively, look for **local peaks of the pseudo-F statistic** (_PSF_) **combined with a small value of the pseudo-t^2 statistic** (_PST2_) and a **larger pseudo-t^2 for the next cluster fusion**.
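
A minimal sketch of how these plots might be produced is shown below. The data set `work.mydata`, the variables `x1-x3`, Ward's method and the cutoff of 15 clusters are assumptions for illustration; the statistics are read from the OUTTREE= data set, where `_NCL_` holds the number of clusters.

```sas
/* Hypothetical input: work.mydata with numeric variables x1-x3.
   CCC and PSEUDO request the criteria; PRINT=15 limits the
   printed cluster history to the last 15 generations. */
proc cluster data=work.mydata method=ward ccc pseudo print=15
             outtree=work.tree;
   var x1-x3;
run;

/* Keep the rows for 1 to 15 clusters, ordered by cluster count,
   so the series plots connect the points left to right. */
proc sort data=work.tree out=work.stats;
   where 1 <= _ncl_ <= 15;
   by _ncl_;
run;

/* CCC against the number of clusters, with the CCC = 3 guide line */
proc sgplot data=work.stats;
   series x=_ncl_ y=_ccc_ / markers;
   refline 3 / axis=y;
   xaxis integer;
run;

/* Pseudo-F and pseudo-t^2 on separate axes for joint inspection */
proc sgplot data=work.stats;
   series x=_ncl_ y=_psf_ / markers;
   series x=_ncl_ y=_pst2_ / markers y2axis;
   xaxis integer;
run;
```

The upper bound of 15 is arbitrary; widen it if the curves are still rising at the right edge of the plot.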

For k-means clustering, use this approach on a sample of your data to determine an upper limit for k, then assign it to the MAXC= option in PROC FASTCLUS when clustering the complete data.
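
The sample-then-cluster workflow might look roughly like this. Everything specific here is assumed for the example: the data set `work.bigdata`, the variables `x1-x3`, the 10% sampling rate, the seed, and the upper limit of 8, which stands in for whatever value the criteria suggest on your sample.

```sas
/* Draw a simple random sample; rate and seed are placeholders */
proc surveyselect data=work.bigdata out=work.sample
                  method=srs samprate=0.10 seed=12345;
run;

/* Hierarchical clustering on the sample only, requesting the
   CCC and pseudo statistics used to choose an upper limit for k */
proc cluster data=work.sample method=ward ccc pseudo print=15
             outtree=work.tree;
   var x1-x3;
run;

/* K-means on the complete data; maxc=8 is a hypothetical limit
   read off the sample results, not a recommended value */
proc fastclus data=work.bigdata maxc=8 maxiter=100 out=work.assigned;
   var x1-x3;
run;
```

Sampling keeps the hierarchical step affordable, since PROC CLUSTER scales poorly with the number of observations, while PROC FASTCLUS is designed for large data sets.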