It seems that you actually have two questions here: 1) How do I compare two clustering results to determine which is optimal? 2) How do I determine whether the number of clusters is optimal?
While 1) can be related to 2) (for example, if you want to compare a clustering result with 3 clusters against a result with 20 clusters), I will mostly address them separately. My answer mixes some specific details with more general points. I hope both help!
1) "How do I compare two clustering results to determine which is optimal"
As Ksharp mentioned, analysis of variance (ANOVA) is a useful tool for evaluating clustering results: you model each input variable against the cluster assignment and look at how much of its variation the clusters explain. You can run PROC GLM in a SAS Code node in Enterprise Miner to do this.
Ultimately though, as clustering is an unsupervised task (i.e., there is no target variable), I find that the meaning of "optimal" can be problem dependent, even when the data is the same.
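To make the ANOVA idea concrete outside of SAS, here is a minimal Python sketch (it is not PROC GLM, just the same arithmetic). It computes the one-way F statistic and R-squared of a single input variable against a cluster assignment. The function name `anova_f` and the toy data are made up for illustration.

```python
from statistics import mean

def anova_f(values, labels):
    """One-way ANOVA F statistic and R-squared for one variable grouped by cluster label."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    n, k = len(values), len(groups)
    grand = mean(values)
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the observations around their own cluster mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, r_squared

# Hypothetical toy data: one input variable and two candidate cluster assignments.
x = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8, 9.0, 9.2, 8.9]
clustering_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # follows the three visible groups
clustering_b = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # ignores them

f_a, r2_a = anova_f(x, clustering_a)
f_b, r2_b = anova_f(x, clustering_b)
print(f"A: F={f_a:.1f}, R^2={r2_a:.3f}")
print(f"B: F={f_b:.3f}, R^2={r2_b:.3f}")
```

The clustering with the larger F and R-squared separates this variable better. Keep in mind that R-squared tends to rise as clusters are added, so the comparison is most meaningful between results with the same number of clusters.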
The way I like to approach the question is by first asking "what is the goal of clustering" for the context of the problem you're working on (what do you want the clusters to help you do?). For example, if it's a predictive modeling problem in which you want to develop models on each cluster separately, then the overall accuracy of your models across all the data will let you know how good the clustering is.
2) "How do I determine the number of clusters when using clustering"
One way, if you have SAS Enterprise Miner 13.1 or later, is the HP Cluster node under the HPDM tab. This node offers the "Aligned Box Criterion" (ABC), which automatically estimates the number of clusters for you.
Another method is spectral clustering, which looks at the eigenvalues of a similarity matrix to try to determine the number of clusters. While this is not implemented in Enterprise Miner, SAS does have the procedures you would need to implement it yourself in a SAS Code node: a DATA step and a procedure to get the principal components, followed by k-means.
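As a rough illustration of the eigenvalue part only, here is a small Python sketch of the "eigengap" heuristic, which is one common way to read the number of clusters off the spectrum. Everything in it is an assumption on my part rather than something SAS provides: the Gaussian similarity, the `sigma` value, the normalized Laplacian, the hand-written Jacobi eigenvalue routine, and the toy points.

```python
import math

def jacobi_eigenvalues(a, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes a[p][q]; kept within [-pi/4, pi/4].
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q (A <- A J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q (A <- J^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def estimate_k(points, sigma=1.0):
    """Pick k at the largest gap between consecutive Laplacian eigenvalues."""
    n = len(points)
    # Gaussian similarity matrix with a zero diagonal.
    w = [[0.0 if i == j else
          math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in w]
    # Normalized Laplacian: I - D^(-1/2) W D^(-1/2).
    lap = [[(1.0 if i == j else 0.0) - w[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)] for i in range(n)]
    eig = jacobi_eigenvalues(lap)
    # gaps[k-1] is the gap between the k-th and (k+1)-th smallest eigenvalue.
    gaps = [eig[k] - eig[k - 1] for k in range(1, n)]
    return gaps.index(max(gaps)) + 1, eig

# Hypothetical toy data: three well-separated groups of three points each.
points = [(0, 0), (0.2, 0.1), (0.1, 0.3),
          (5, 5), (5.2, 5.1), (5.1, 4.8),
          (10, 0), (10.1, 0.2), (9.8, 0.1)]
k, eig = estimate_k(points)
print("estimated number of clusters:", k)
```

On real data the gap is rarely this clean, and the result depends heavily on how the similarity is defined, so I would treat the estimate as a suggestion rather than an answer.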
----
Finally, a much more involved idea that addresses both questions is consensus clustering (which can be combined with the two previous ideas for determining the number of clusters). The goal of consensus clustering is to ensemble multiple clustering results into one, including results with different numbers of clusters. The reasoning is that if multiple clustering results agree on which observations belong together, you can feel more confident that those areas of agreement are "correct" / "optimal." Again, this is not implemented in Enterprise Miner, and it is quite involved. That said, it is possible to do using SAS DATA step code and the procedures / nodes in Enterprise Miner.
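To give a feel for the ensembling step, here is a minimal Python sketch of one simple variant: build a co-association matrix (the fraction of runs in which two observations share a cluster) and then group observations that are co-clustered in more than half of the runs. The function names, the 0.5 threshold, and the three toy runs are all made up for illustration; there are many other ways to form a consensus.

```python
from itertools import combinations

def co_association(labelings):
    """Fraction of clusterings in which each pair of observations shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]

def consensus_labels(labelings, threshold=0.5):
    """Link pairs co-clustered in more than `threshold` of the runs (union-find)."""
    n = len(labelings[0])
    m = co_association(labelings)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if m[i][j] > threshold:
            parent[find(i)] = find(j)
    # Relabel the groups 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(n)]

# Hypothetical cluster assignments for 6 observations from three runs.
# The runs use different label names and different numbers of clusters.
runs = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
consensus = consensus_labels(runs)
print(consensus)
```

Because only co-membership is counted, the label names used by each run do not need to match, which is what lets you mix results with different numbers of clusters.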
Hopefully some of this helps, either immediately, or by giving you things to think about.