It sounds like you are using the Text Cluster node to find clusters, rather than the Text Topic node?
For the Text Cluster node, the SVD dimensions are used to represent each document in k-dimensional space, and then the clustering algorithms cluster the documents as represented in that space. So the number of SVD dimensions does not correspond to the number of clusters. Instead, you either choose a number with the exact method, or we use PROC CLUSTER and Ward's method to attempt to determine how many clusters there might be, up to the maximum number setting. See the documentation on PROC CLUSTER.
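To make the point concrete, here is a minimal Python sketch (not the node's actual implementation) of documents represented in 3 SVD dimensions that still fall into only 2 clusters. The document coordinates are made up, and the rule of picking the count just before the largest jump in Ward merge cost is my own simplifying assumption, not the criterion PROC CLUSTER uses.

```python
# Sketch only: documents live in k SVD dimensions, and the cluster count is
# decided separately. All data and the "largest jump" rule are hypothetical.

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (na, ca), (nb, cb) = a, b
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * dist2

def merge(a, b):
    """Merge two (size, centroid) clusters into one."""
    (na, ca), (nb, cb) = a, b
    n = na + nb
    return (n, tuple((na * x + nb * y) / n for x, y in zip(ca, cb)))

def ward_merge_costs(points):
    """Agglomerate with Ward's method; return the cost of each merge in order."""
    clusters = [(1, tuple(p)) for p in points]
    costs = []
    while len(clusters) > 1:
        cost, i, j = min(
            (ward_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        costs.append(cost)
    return costs

def suggest_cluster_count(points, max_clusters):
    """Pick the count (2..max_clusters) whose next merge is the costliest jump."""
    costs = ward_merge_costs(points)
    n = len(points)
    best_k, best_jump = 1, 0.0
    for k in range(2, min(max_clusters, n - 1) + 1):
        # costs[n - k] is the cost of merging from k clusters down to k - 1.
        jump = costs[n - k] - costs[n - k - 1]
        if jump > best_jump:
            best_k, best_jump = k, jump
    return best_k

# Six hypothetical documents, each described by k = 3 SVD dimensions.
docs = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0),
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),
]
suggested = suggest_cluster_count(docs, max_clusters=4)
print(suggested)
```

Here the documents have three coordinates each, but the suggested count comes from how they group in that space and from the maximum you allow, not from the number of dimensions.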
If you are using the Text Topic node, then there you only specify a number of topics. We have found that the scree plot may be helpful on small textbook example problems, but for large data mining problems it is not usually helpful in determining the number of topics. If you're still curious, the Text Topic node doesn't output the singular values, but the Text Cluster node does. You can run the Text Cluster node with the number of SVD dimensions set to the desired value, then look for the Textcluster_svd_s data set in your workspace library. That table of singular values is essentially what would have been output by the Text Topic node. You can plot them or scan them to see if they are helpful to you for picking the number of topics. Once you have a value, go back to the Text Topic node, enter it as the number of topics, and rerun it.
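If you export the singular values and want to scan them outside of SAS, something like the following Python sketch could help. The values below are made up for illustration (I am not assuming anything about the column names in Textcluster_svd_s), and the "largest relative drop" rule is just one simple heuristic of mine, not something the nodes do.

```python
# Sketch only: scan a list of singular values for an elbow.
# The values and the drop heuristic are hypothetical illustrations.

def cumulative_share(singular_values):
    """Cumulative share of the total squared singular values."""
    squares = [s * s for s in singular_values]
    total = sum(squares)
    shares = []
    running = 0.0
    for sq in squares:
        running += sq
        shares.append(running / total)
    return shares

def count_before_largest_drop(singular_values):
    """Number of leading values before the sharpest relative drop."""
    ratios = [
        singular_values[i + 1] / singular_values[i]
        for i in range(len(singular_values) - 1)
    ]
    # The smallest ratio marks the sharpest drop from one value to the next.
    sharpest = min(range(len(ratios)), key=ratios.__getitem__)
    return sharpest + 1

# Made-up singular values, sorted from largest to smallest.
values = [12.0, 9.5, 8.8, 3.1, 2.9, 2.7, 2.6]

shares = cumulative_share(values)
candidate = count_before_largest_drop(values)
print(candidate)
print([round(s, 3) for s in shares])
```

With real data the drop is often much less clear than in this toy list, which is the reason the scree plot tends not to help on large problems.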
So there are a couple of things you can try, but as you are aware, the number of clusters or topics contained in a collection can be a very subjective thing.
By the way, there is a Text Analytics community on this website, so feel free to participate there in the future.
Russ