Hello,
When doing k-means clustering, one of the difficult questions is:
----> What value should you choose for k (the number of clusters)?
That question is answered in the Enterprise Miner clustering node by using an intermediate hierarchical clustering step.
For that (intermediate) hierarchical clustering step, the methods WARD and CENTROID are relevant.
[ ... Ward's linkage is thus a method for hierarchical cluster analysis (nothing to do with k-means!).
The idea has much in common with analysis of variance (ANOVA). The WARD linkage distance between two clusters is computed as the increase in the "error sum of squares" (ESS) that results from fusing the two clusters into a single cluster. ]
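To make the ESS idea concrete, here is a minimal pure-Python sketch (toy data, not SAS code): the Ward "distance" between two clusters is simply the growth in ESS caused by merging them.

```python
# Minimal sketch: Ward linkage distance = increase in the error sum of
# squares (ESS) when two clusters are fused into one.
def ess(cluster):
    """Error sum of squares: squared distances of points to the cluster mean."""
    n = len(cluster)
    dim = len(cluster[0])
    mean = [sum(p[d] for p in cluster) / n for d in range(dim)]
    return sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in cluster)

def ward_distance(a, b):
    """Increase in ESS when clusters a and b are fused."""
    return ess(a + b) - ess(a) - ess(b)

a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (5.0, 0.0)]
print(ward_distance(a, b))  # 16.0
```

Tight, well-separated clusters give a large ESS increase when merged, which is why Ward tends to join compact, nearby clusters first.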
Once the number of clusters is determined, WARD and CENTROID are no longer relevant.
Because once k is set to, say, 11, an 11-means clustering is done using the k-means algorithm (with k = 11).
WARD is generally believed to be better than CENTROID, so go for WARD!
Again, the cluster node uses k-means for the (preliminary and) final clustering; WARD and CENTROID only serve to determine the number of clusters.
Koen
Hello,
If I remember correctly, this is how the cluster node in Enterprise Miner works:
The procedures used are PROC FASTCLUS and PROC CLUSTER.
SAS® Enterprise Miner™ 15.1: Reference Help
Cluster Node
https://go.documentation.sas.com/doc/en/emref/15.1/p042mbykzcvpoln1m14cycem6m4a.htm
I think there are also High-Performance nodes in Enterprise Miner 15.1 and 15.2.
The High-Performance nodes also have clustering.
Using the High-Performance clustering node, PROC HPCLUS is used.
That is k-means clustering only.
To estimate the number of clusters (NOC), specify NOC=ABC in the PROC HPCLUS statement.
This option uses the aligned box criterion (ABC) method to find the "best" number of clusters.
BR,
Koen
In Enterprise Miner, there is a selection criterion property. What is the difference between Ward and Centroid? Do they both use the K-means algorithm? Centroid seems like K-means, because K-means is based on calculating distances between centroids and the other data points.
Hello,
That property is for the PROC CLUSTER (agglomerative hierarchical clustering) part of the algorithm!
See here:
SAS/STAT® 15.2 User's Guide
The CLUSTER Procedure
Clustering Methods
https://documentation.sas.com/doc/en/statug/15.2/statug_cluster_details01.htm
For k-means you do not have that choice (distances in k-means are always distances to the centroids).
But k-means starts with k clusters and ends with k clusters (the way the clusters are constituted is completely different from hierarchical clustering).
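A minimal pure-Python sketch (assumed toy data, not the Enterprise Miner code) of why there is no distance choice in k-means: the assignment step always measures point-to-centroid distance.

```python
# Minimal sketch: in k-means, every point is assigned to its nearest
# centroid -- the distance is always point-to-centroid, never cluster-to-cluster.
def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (squared Euclidean)."""
    def sq_dist(c):
        return sum((p - q) ** 2 for p, q in zip(point, c))
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i]))

centroids = [(0.0, 0.0), (10.0, 10.0)]
print(nearest_centroid((1.0, 2.0), centroids))  # 0
print(nearest_centroid((9.0, 8.0), centroids))  # 1
```

Linkage choices like Ward or centroid only arise in hierarchical clustering, where you need a distance between two whole clusters.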
Koen
Okay. Are you suggesting that if I drag a cluster node into the diagram, it does not matter if I choose Ward or Centroid in the property panel on the left? Because I am able to choose Ward or Centroid if I select the cluster node (I don't think it is hierarchical clustering node). Are you suggesting these two methods will give the same results?
Hello,
The Ward and centroid methods will probably not give the same end result, unless the derived number of clusters happens to be the same with both methods.
Remember my first reply:
Good luck with your analyses !
Koen
Hi, I am not familiar with SAS code, so I don't know what the difference between PROC FASTCLUS and PROC CLUSTER is.
I use SAS EM. I drag a Cluster node from the Explore tab onto the diagram and connect it to my data node. Then, if I select the cluster node, in the property panel on the left there are Ward, Centroid, and other options under the selection criterion. My question is: if I would like to use K-means, should I pick Centroid as the selection criterion? Because I don't think Ward is related to the K-means algorithm. Or do they both apply to the K-means algorithm?
Sorry, my question was moved from the New Users forum to here. I am not sure if I can get help here.
I see. This is super helpful. One more question, if I select centroid, how is optimal K selected?
Hello @ycenycute ,
How is "optimal" k selected?
Suppose you have 100 000 observations in a 20-dimensional input space.
First, k-means is run to cluster the 100 000 observations into 50 disjoint clusters.
The 50 mean vectors (multivariate means) of these 50 disjoint clusters are then clustered hierarchically, from 50 down to 1.
In each agglomerative step, the distance between clusters is calculated using the centroid method, and the two clusters that are closest together (by centroid linkage) are merged. You start with 50 single-element clusters and end up with 1.
Then, using the CCC (Cubic Clustering Criterion), the "best" k is selected: the k-cluster solution believed to be "optimal" (i.e., the most heterogeneity among the clusters and the most homogeneity within the clusters).
Suppose k is selected to be 8.
Then a new k-means clustering on the full 100 000 observations is done with k = 8 (to make 8 disjoint clusters).
In data mining, the data sets are mostly too big to do hierarchical clustering alone.
Doing hierarchical clustering on 100 000 observations may take a full day and lots of resources.
That is because you start with 100 000 single-element clusters and in each step merge two clusters (until you eventually reach one cluster containing all observations).
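The three-phase flow above can be sketched in pure Python on toy data (this is an illustration of the idea, not the actual Enterprise Miner implementation; the CCC computation is SAS-specific and is omitted here, so the final k is simply assumed to be 2):

```python
# Illustrative sketch: 1) preliminary k-means to reduce the data to a few
# cluster means, 2) centroid-linkage agglomeration on those means (50 -> 1
# in Enterprise Miner; 10 -> 1 here), 3) final k-means with the chosen k.
import random

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign to nearest centroid, recompute means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[i].append(p)
        centroids = [mean(g) if g else centroids[i] for i, g in enumerate(groups)]
    return centroids

def centroid_linkage_merge_order(means):
    """Agglomerate: repeatedly merge the two clusters whose centroids are
    closest (centroid linkage), until one cluster remains."""
    clusters = [[m] for m in means]
    sizes = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: sq_dist(mean(clusters[ij[0]]), mean(clusters[ij[1]])))
        clusters[i] = clusters[i] + clusters.pop(j)
        sizes.append(len(clusters))
    return sizes  # cluster counts after each merge: 9, 8, ..., 1

# Toy data: two well-separated blobs instead of 100 000 observations.
rng = random.Random(1)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]

prelim_means = kmeans(data, 10)                           # phase 1
merge_sizes = centroid_linkage_merge_order(prelim_means)  # phase 2
final_centroids = kmeans(data, 2)                         # phase 3, with k = 2
print(len(prelim_means), merge_sizes[-1], len(final_centroids))  # 10 1 2
```

The cheap part (k-means over all observations) runs twice, while the expensive part (hierarchical clustering) only ever sees the small set of preliminary cluster means.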
Koen