How to determine the number of clusters in K-means...

01-08-2016 04:50 PM

For k-means cluster analysis, one can use PROC FASTCLUS, for example:

`proc fastclus data=mydata out=out maxc=4 maxiter=20;`

Then change the number set by maxc=, rerun several times, and compare the pseudo F and CCC values to see which number of clusters gives a peak.
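The rerun-and-compare loop can be scripted. What follows is only a minimal sketch under some assumptions: the inputs are numeric variables called x1-x5 (placeholders), and the OUTSTAT= data set carries the overall pseudo F and CCC in rows with _TYPE_ equal to 'PSEUDO_F' and 'CCC', with the value in the OVER_ALL column, which is how I read the PROC FASTCLUS documentation. The macro name %scan_k and the work data set names are made up for illustration.

```sas
%macro scan_k(data=mydata, vars=x1-x5, kmin=2, kmax=8);
   /* Remove any summary left over from a previous call */
   proc datasets lib=work nolist nowarn;
      delete k_summary;
   quit;

   %do k = &kmin %to &kmax;
      /* One k-means run per candidate number of clusters */
      proc fastclus data=&data maxclusters=&k maxiter=20
                    outstat=_stat noprint;
         var &vars;
      run;

      /* Keep only the overall pseudo F and CCC for this k */
      data _crit;
         set _stat;
         where _type_ in ('PSEUDO_F', 'CCC');
         k = &k;
         keep k _type_ over_all;
      run;

      /* Stack the results for all values of k */
      proc append base=k_summary data=_crit;
      run;
   %end;

   proc print data=k_summary noobs;
   run;
%mend scan_k;

%scan_k(data=mydata, vars=x1-x5, kmin=2, kmax=8);
```

Peaks in either criterion across k can then be read off the printed k_summary table or plotted.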

Or one can use PROC CLUSTER:

`PROC CLUSTER data=mydata METHOD=WARD outtree=out ccc pseudo print=15;`

to find the number of clusters from the pseudo F, pseudo t² (t2) and CCC values,

and also look for a jump in the semipartial R-square.
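To compare these criteria side by side, the tree data set written by PROC CLUSTER can be listed by number of clusters. This is a sketch only, assuming placeholder variables x1-x5, more than 15 observations, and the usual OUTTREE= variable names (_NCL_, _SPRSQ_, _PSF_, _PST2_, _CCC_); the data set names tree and crit are arbitrary.

```sas
/* Ward's method; OUTTREE= stores one row per node of the cluster tree */
proc cluster data=mydata method=ward ccc pseudo outtree=tree noprint;
   var x1-x5;   /* placeholder variable list */
run;

/* Keep the rows for solutions with 2 to 15 clusters */
data crit;
   set tree;
   where 2 <= _ncl_ <= 15;
   keep _ncl_ _sprsq_ _psf_ _pst2_ _ccc_;
run;

proc sort data=crit;
   by _ncl_;
run;

/* One line per candidate number of clusters */
proc print data=crit noobs;
run;
```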

Sometimes these indicators do not agree with each other. Which indicator is more reliable? Thanks!

Posted in reply to fengyuwuzu

01-08-2016 05:42 PM

Hello,

If you are undecided between two values of k, you can use Beale's F-type statistic to determine the final number of clusters. It tells you whether the larger solution is significantly better or not (in the latter case the solution with fewer clusters is preferable).

This technique is discussed in the "Applied Clustering Techniques" course notes.
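As a rough illustration, Beale's statistic can be computed in a DATA step once the within-cluster sums of squares of the two solutions are known. The formula below is the form in which Beale's (1969) F-type statistic is commonly stated, not something taken from the course notes, and every input value is a made-up placeholder to be replaced with numbers from your own runs.

```sas
data beale;
   /* Hypothetical inputs: replace with values from your own runs */
   n  = 500;     /* number of observations                       */
   p  = 5;       /* number of clustering variables               */
   c1 = 3;       /* number of clusters in the smaller solution   */
   c2 = 4;       /* number of clusters in the larger solution    */
   w1 = 1200;    /* within-cluster sum of squares, c1 clusters   */
   w2 = 1000;    /* within-cluster sum of squares, c2 clusters   */

   /* Relative drop in within-cluster SS, scaled by the drop    */
   /* expected from adding clusters alone                        */
   f   = ((w1 - w2) / w2)
         / (((n - c1) / (n - c2)) * (c2 / c1) ** (2 / p) - 1);
   df1 = p * (c2 - c1);
   df2 = p * (n - c2);

   /* Upper-tail probability of the F distribution */
   p_value = 1 - probf(f, df1, df2);
run;

proc print data=beale noobs;
run;
```

A small p_value would suggest that the larger solution fits significantly better; otherwise the solution with fewer clusters is kept.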

You can also try something relatively new.

Tip: K-means clustering in SAS - comparing PROC FASTCLUS and PROC HPCLUS

For numeric variables, PROC HPCLUS provides the convenient NOC=ABC option to auto-select the number of clusters k based on the aligned box criterion (ABC). For each value of k from MINCLUSTERS (default: 2) to MAXCLUSTERS, ABC compares the within-cluster dispersion of the results to that of a simulated reference distribution, and selects a value of k where the within-cluster dispersions of the data results and the reference distribution differ greatly.
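A call along these lines might look as follows. This is a sketch from memory of the PROC HPCLUS syntax, so please check it against the documentation for your release: the B= (number of reference data sets) and ALIGN= suboptions, the seed value and the variable list x1-x5 are assumptions, not something stated in the tip.

```sas
/* Let the aligned box criterion choose k between 2 and 10 */
proc hpclus data=mydata maxclusters=10 maxiter=20 seed=12345
            noc=abc(b=10 minclusters=2 align=pca);
   input x1-x5 / level=interval;   /* placeholder numeric inputs */
run;
```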

See also:

Paper SAS313-2014

An Overview of Machine Learning with SAS® Enterprise Miner™

Patrick Hall, Jared Dean, Ilknur Kaynar Kabul, Jorge Silva

SAS Institute Inc.

https://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf

Search the paper for the keywords HPCLUS and ABC.

PROC HPCLUS is one of many High-Performance Procedures in SAS Enterprise Miner 13.2 and beyond.

Cheers,

Koen

Posted in reply to sbxkoenk

01-11-2016 11:39 AM

Thank you! This reply is very informative! I will look into PROC HPCLUS, which sounds very interesting.

Posted in reply to sbxkoenk

08-24-2017 03:10 PM

Hi,

Could you please advise how to determine the optimal number of clusters for k-modes clustering in HPCLUS? I know that HPCLUS can determine the best value of k for numerical variables (NOC=ABC), but is there any way to determine the best k for categorical variables using HPCLUS?