10-11-2011 03:44 AM

Hi,

How can we determine the optimal number of clusters in a cluster analysis?

Thanks,

Nikhil

07-07-2017
01:16 PM

10-11-2011 10:05 AM

I think there are no strict rules for choosing the optimal number of clusters; as with all cluster analysis, there is a lot of room for variation and interpretation.

Maybe someone can give more specific criteria, but these are the ones I would consider:

* Use graphical analysis to check whether your clusters are well separated; some may be very close and could be joined. A tree diagram (PROC TREE) is also very useful: it shows how many groups (clearly separated branches) you have.

* You most likely don’t want clusters with only one or a few observations.

* In some cases your data or task can hint at the number of clusters (e.g., you may want to separate items with high, medium, and low levels of something).
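Outside SAS, the tree-based check and the minimum-size rule from the list above can be sketched in a few lines of Python. This is a hypothetical illustration, not PROC TREE: it assumes Euclidean data, uses a naive Ward agglomeration (one of several linkage methods; PROC CLUSTER offers others), stops when `k` clusters remain (similar in spirit to cutting the tree at `k` branches), and flags clusters with fewer than `min_size` members. The helper names and the threshold `min_size=2` are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical helpers for illustration only; not part of any SAS procedure.


def _sse(points):
    """Sum of squared Euclidean distances from each point to the group centroid."""
    centroid = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points)


def ward_cut(points, k):
    """Naive Ward agglomeration.

    Starts from singletons and repeatedly merges the pair of clusters whose
    fusion increases the within-cluster sum of squares the least, stopping
    when k clusters remain.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        _, i, j = min(
            (_sse(clusters[a] + clusters[b]) - _sse(clusters[a]) - _sse(clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def small_clusters(clusters, min_size=2):
    """Return the clusters that have fewer than min_size observations."""
    return [c for c in clusters if len(c) < min_size]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
clusters = ward_cut(points, k=3)
print(sorted(len(c) for c in clusters))  # [1, 2, 3]
print(small_clusters(clusters))          # [[(50, 50)]]
```

Whether a one-member cluster is an outlier to set aside or a genuine group is still a judgment call; the sketch only makes the cluster sizes visible.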

01-18-2012 07:29 AM

For hierarchical clustering, try Sarle's Cubic Clustering Criterion (CCC) in PROC CLUSTER:

Plot `_CCC_` against the number of clusters and look for peaks where `_CCC_` > 3, or look for **local peaks of the pseudo-F statistic** (`_PSF_`) **combined with a small value of the pseudo-t² statistic** (`_PST2_`) and a **larger pseudo-t² for the next cluster fusion**. The CCC and PSEUDO options of PROC CLUSTER request these statistics.
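As a rough illustration of what `_PSF_` and `_PST2_` measure, here is a minimal Python sketch of the two statistics as they are usually defined for hierarchical clustering (pseudo-F is also known as the Calinski-Harabasz index). It assumes Euclidean distance, clusters given as lists of coordinate tuples, at least two clusters, and nonzero within-cluster sums of squares; the function names are hypothetical, and PROC CLUSTER computes these values for you. The CCC itself is more involved and is not reproduced here.

```python
# Hypothetical helpers for illustration only; PROC CLUSTER reports these statistics itself.


def _sse(points):
    """Sum of squared Euclidean distances from each point to the group centroid."""
    centroid = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points)


def pseudo_f(clusters):
    """Pseudo-F = [(T - P_G) / (G - 1)] / [P_G / (n - G)].

    T is the total sum of squares about the grand mean and P_G is the pooled
    within-cluster sum of squares for the G clusters.
    """
    points = [p for c in clusters for p in c]
    n, g = len(points), len(clusters)
    total = _sse(points)
    within = sum(_sse(c) for c in clusters)
    return ((total - within) / (g - 1)) / (within / (n - g))


def pseudo_t2(cluster_k, cluster_l):
    """Pseudo-t^2 for fusing clusters K and L into M.

    B_KL = W_M - W_K - W_L is the increase in within-cluster sum of squares
    caused by the fusion, divided by the pooled within-cluster variance of K and L.
    """
    w_k, w_l = _sse(cluster_k), _sse(cluster_l)
    b_kl = _sse(cluster_k + cluster_l) - w_k - w_l
    return b_kl / ((w_k + w_l) / (len(cluster_k) + len(cluster_l) - 2))


left = [(0, 0), (0, 1)]
right = [(10, 0), (10, 1)]
print(pseudo_f([left, right]))  # 200.0
print(pseudo_t2(left, right))   # 200.0
```

A large pseudo-t² for a fusion means the two clusters being joined are well separated, which is why a small value followed by a jump at the next fusion points to a natural stopping place.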

For k-means clustering, apply this approach to a sample of your data to determine the upper limit for k, then pass that value to the MAXC= option of PROC FASTCLUS when you run it on the complete data.
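The sample-then-full-data workflow above can be sketched in plain Python, again only as a hypothetical illustration of the idea rather than of FASTCLUS internals. Assumptions made here: a naive Ward agglomeration on a random sample, choosing k as the level with the largest pseudo-F (a simplification of reading the CCC, pseudo-F, and pseudo-t² plots by eye), and seeding Lloyd's k-means on the full data with the sample cluster centroids (the reply only suggests passing k through MAXC=; the seeding is an extra choice of this sketch). `max_k` must be at least 2 and smaller than the sample size, and the data should not contain duplicate points.

```python
import random
from itertools import combinations

# Hypothetical helpers for illustration only; not PROC CLUSTER or PROC FASTCLUS.


def _sse(points):
    """Sum of squared Euclidean distances from each point to the group centroid."""
    centroid = [sum(col) / len(points) for col in zip(*points)]
    return sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in points)


def _centroid(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def ward_levels(points, max_k):
    """Naive Ward agglomeration; returns {k: clusters} for k = 1..max_k."""
    clusters = [[p] for p in points]
    levels = {}
    while len(clusters) > 1:
        if len(clusters) <= max_k:
            levels[len(clusters)] = [list(c) for c in clusters]
        _, i, j = min(
            (_sse(clusters[a] + clusters[b]) - _sse(clusters[a]) - _sse(clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    levels[1] = clusters
    return levels


def pseudo_f(clusters):
    """Pseudo-F = [(T - P_G) / (G - 1)] / [P_G / (n - G)]."""
    points = [p for c in clusters for p in c]
    n, g = len(points), len(clusters)
    within = sum(_sse(c) for c in clusters)
    return ((_sse(points) - within) / (g - 1)) / (within / (n - g))


def kmeans(points, centers, iters=100):
    """Lloyd's k-means starting from the given centers; returns one label per point."""
    centers = list(centers)
    labels = []
    for _ in range(iters):
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old center if a cluster loses all of its members.
            new_centers.append(_centroid(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def cluster_full_data(data, sample_size, max_k, seed=0):
    """Choose k on a random sample, then run k-means with that k on all the data."""
    if not 2 <= max_k < sample_size:
        raise ValueError("need 2 <= max_k < sample_size")
    sample = random.Random(seed).sample(data, sample_size)
    levels = ward_levels(sample, max_k)
    scores = {k: pseudo_f(levels[k]) for k in range(2, max_k + 1)}
    best_k = max(scores, key=scores.get)
    seeds = [_centroid(c) for c in levels[best_k]]
    return best_k, kmeans(data, seeds)
```

Picking the single largest pseudo-F is cruder than reading the three statistics together, so treat the chosen k as a starting point, for example as the upper limit the reply suggests for MAXC=.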