Hi
Could you please advise how SAS Miner automatically chooses k in k-means clustering?
The automatic method uses the following three-step process:
1. A large number of cluster seeds are chosen (50 by default) and placed in the input space. Cases in the training data are assigned to the closest seed, and an initial clustering of the data is completed. The means of the input variables in each of these preliminary clusters are substituted for the original training data cases in the second step of the process.
2. A hierarchical clustering algorithm (Ward’s method) is used to sequentially consolidate the clusters formed in the first step. At each step of the consolidation, a statistic called the cubic clustering criterion (CCC) is calculated. The first consolidation in which the CCC exceeds 3 provides the third step with the number of clusters to use. If no consolidation yields a CCC in excess of 3, the maximum number of clusters is selected.
3. The number of clusters determined by the second step provides the value for k in a k-means clustering of the original training data cases.
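The three steps can be sketched in plain Python (standard library only). This is a minimal illustration under stated assumptions, not SAS Miner's actual implementation. The answer does not say how the 50 seeds are placed, so the sketch simply takes the first distinct cases. Step 1 is a single nearest-seed assignment pass, and both the default of 10 for `max_clusters` and seeding the final k-means with the Ward centroids are also assumptions. Most importantly, the cubic clustering criterion is not implemented here. `merge_cost_ratio` is a hypothetical stand-in (the ratio of consecutive Ward merge costs), so the cutoff of 3 only carries its documented meaning for the real CCC. All function names are hypothetical. Following the wording above, consolidation levels are checked in the order Ward's method produces them, from most clusters to fewest.

```python
# Minimal stdlib-only sketch of the three-step "automatic k" procedure.
# ASSUMPTIONS (not from the source): seed selection, the single assignment pass
# in step 1, the max_clusters default, seeding k-means with the Ward centroids,
# and merge_cost_ratio, which is a HYPOTHETICAL stand-in for the CCC.


def sqdist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def preliminary_clusters(data, n_seeds=50):
    """Step 1: use the first n_seeds distinct cases as seeds (an assumption),
    assign every case to its nearest seed, and return (weight, mean) pairs."""
    seeds = []
    for p in data:
        if p not in seeds:
            seeds.append(p)
        if len(seeds) == n_seeds:
            break
    groups = [[] for _ in seeds]
    for p in data:
        nearest = min(range(len(seeds)), key=lambda s: sqdist(p, seeds[s]))
        groups[nearest].append(p)
    return [(len(g), mean(g)) for g in groups if g]


def ward_levels(clusters):
    """Step 2: Ward's method on the weighted preliminary means.
    Returns [(k, clusters_at_k, cost_of_merge_that_produced_k)], k descending."""
    current = list(clusters)
    levels = [(len(current), list(current), 0.0)]
    while len(current) > 1:
        best = None
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                (wi, ci), (wj, cj) = current[i], current[j]
                # Ward cost: increase in within-cluster sum of squares.
                cost = wi * wj / (wi + wj) * sqdist(ci, cj)
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (wi, ci), (wj, cj) = current[i], current[j]
        w = wi + wj
        merged = (w, tuple((wi * a + wj * b) / w for a, b in zip(ci, cj)))
        current = [c for n, c in enumerate(current) if n not in (i, j)] + [merged]
        levels.append((len(current), list(current), cost))
    return levels


def merge_cost_ratio(levels, idx):
    """HYPOTHETICAL stand-in for the CCC (NOT the CCC formula): how much
    costlier the next merge is than the merge that produced this level."""
    return levels[idx + 1][2] / max(levels[idx][2], 1e-12)


def choose_k(levels, threshold=3.0, max_clusters=10, criterion=merge_cost_ratio):
    """Scan consolidations in the order they happen (most clusters first) and
    return the first level whose criterion exceeds threshold; else max_clusters."""
    for idx in range(1, len(levels) - 1):
        k, clusters, _ = levels[idx]
        if k <= max_clusters and criterion(levels, idx) > threshold:
            return k, clusters
    k = min(max_clusters, levels[0][0])
    return k, next(c for kk, c, _ in levels if kk == k)


def kmeans(data, centers, max_iter=100):
    """Step 3: Lloyd's k-means on the original cases, starting from centers."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in data:
            nearest = min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            groups[nearest].append(p)
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return centers


def auto_kmeans(data, n_seeds=50, threshold=3.0, max_clusters=10,
                criterion=merge_cost_ratio):
    """Chain the three steps; returns (k, final k-means centers)."""
    levels = ward_levels(preliminary_clusters(data, n_seeds))
    k, ward_clusters = choose_k(levels, threshold, max_clusters, criterion)
    return k, kmeans(data, [c for _, c in ward_clusters])


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    data = [(x + dx, y + dy) for dx, dy in [(0, 0), (10, 0), (0, 10)] for x, y in square]
    k, centers = auto_kmeans(data)
    print(k, sorted(centers))  # prints: 3 [(0.5, 0.5), (0.5, 10.5), (10.5, 0.5)]
```

To experiment with the real statistic, a CCC implementation could be passed as the `criterion` argument; it receives the list of Ward levels and the index of the level being scored, so any extra inputs it needs (such as the original cases) would have to be supplied through a closure.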