Hello, SAS Community!

My questions are all about clustering in SAS Miner, version 12.1:

- When running k-means clustering in SAS Miner, can we determine the optimal number of clusters up front?
- When we set the Specification Method to Automatic and perform hierarchical clustering, how do we decide which clustering method (Average, Centroid, or Ward) produced the best results?

In general, which statistics should I be looking at, and how do I interpret them, for both types of clustering?

Also:

- Is there really a "best" number of clusters? There could be a solution that is perfectly separated from a mathematical perspective, say 7 clusters (observations within each cluster are as close to each other as possible, and the distances between clusters are as large as possible), yet provides no useful information with regard to the analysis objectives. Then, say, we start trying user-specified numbers of clusters (4, 5, and so on), and one of them shows really interesting results. Should such guessing be disregarded completely?
- How will the number of clusters, and the clusters themselves, change when we normalize the data (Transform node => Formulas => log transformation of, say, yearly revenue in thousands)?
- When is it necessary to remove outliers from our input variables?

I will be very grateful for all answers! If someone could share the title of a thorough textbook on cluster analysis in SAS Miner, that would be of great help, too!