japete
Calcite | Level 5

Hi

 

Could you please advise how SAS Miner automatically chooses k in k-means clustering?

1 ACCEPTED SOLUTION
sdhilip
Quartz | Level 8

 The automatic method uses the following three-step process:

 

1. A large number of cluster seeds are chosen (50 by default) and placed in the input space. Cases in the training data are assigned to the closest seed, and an initial clustering of the data is completed. The means of the input variables in each of these preliminary clusters are substituted for the original training data cases in the second step of the process.

 

2. A hierarchical clustering algorithm (Ward’s method) is used to sequentially consolidate the clusters formed in the first step. At each step of the consolidation, a statistic called the cubic clustering criterion (CCC) is calculated. The first consolidation in which the CCC exceeds 3 provides the third step with the number of clusters to use. If no consolidation yields a CCC in excess of 3, the maximum number of clusters is selected.

 

3. The number of clusters determined by the second step provides the value for k in a k-means clustering of the original training data cases.
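
To make the steps above more concrete, here is a rough Python sketch of two pieces of the process: the preliminary assignment of cases to seeds, and the rule that picks k from the CCC values. This is not the SAS implementation. The function names, the Euclidean distance, the `max_clusters` argument, and the CCC numbers in the usage example are all assumptions made up for illustration, and the Ward consolidation and the CCC calculation itself are not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def preliminary_cluster_means(cases, seeds):
    """Step 1 (sketch): assign each case to its closest seed and return
    the mean of every non-empty preliminary cluster."""
    members = {i: [] for i in range(len(seeds))}
    for case in cases:
        nearest = min(range(len(seeds)), key=lambda i: dist(case, seeds[i]))
        members[nearest].append(case)
    means = []
    for group in members.values():
        if group:  # assumption: seeds that attract no cases are dropped
            means.append(tuple(sum(col) / len(group) for col in zip(*group)))
    return means


def choose_k(ccc_by_k, max_clusters, threshold=3.0):
    """Step 2 rule (sketch): walk the consolidations in order, from many
    clusters to few, and return the first cluster count whose CCC exceeds
    the threshold. Fall back to max_clusters if none does."""
    for k in sorted(ccc_by_k, reverse=True):
        if ccc_by_k[k] > threshold:
            return k
    return max_clusters


# Toy data, invented for the example.
cases = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
seeds = [(0.0, 1.0), (9.0, 9.0), (100.0, 100.0)]
means = preliminary_cluster_means(cases, seeds)

# Hypothetical CCC values keyed by number of clusters after each consolidation.
k = choose_k({5: 1.2, 4: 3.4, 3: 4.1, 2: 0.5}, max_clusters=10)
k_fallback = choose_k({5: 1.0, 4: 2.0}, max_clusters=10)
```

The value returned by `choose_k` would then be passed as k to an ordinary k-means run on the original training cases, which is the third step.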


