JinHong,
If you want complete coverage, with every document belonging to a topic, you could look at clustering rather than topics. You do have some control over topics through macro variables that you can set in your startup code. Take a look at these two, found in the Text Miner documentation under "Macro Variables, Macros, and Functions":
TMM_DOCCUTOFF (default value: 0.001)
The document cutoff value for any user-created topic. It is used to determine the default document cutoff for user topics (excluding those that are modified multi-term or single-term topics) in the Topic table. Higher values decrease the number of documents assigned to a topic.
TMM_TERM_CUTOFF
The term cutoff value for any user-created or multi-term topic. It is used to determine the default term cutoff for user topics (excluding those that are modified multi-term or single-term topics) and for multi-term topics in the Topic table. Higher values decrease the number of terms assigned to a topic. If this macro variable is blank or not set, the cutoff for each topic is set to that topic's mean topic weight + 1 standard deviation.
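To make the "mean topic weight + 1 standard deviation" default concrete, here is a minimal sketch in Python (not SAS code, and not Text Miner's actual implementation). The function names and the weights are hypothetical, and I am assuming the sample standard deviation and a simple "weight >= cutoff" rule; the documentation quoted above does not spell out either detail.

```python
from statistics import mean, stdev


def default_term_cutoff(weights):
    """Mean + 1 sample standard deviation of one topic's term weights.

    Assumption: sample (n - 1) standard deviation; the docs do not say which.
    """
    return mean(weights) + stdev(weights)


def terms_above_cutoff(term_weights, cutoff=None):
    """Keep the terms of a single topic whose weight meets the cutoff.

    term_weights: dict mapping term -> weight for one topic (made-up data).
    cutoff: an explicit value, analogous to setting the macro variable;
            None falls back to the mean + 1 SD default.
    """
    if cutoff is None:
        cutoff = default_term_cutoff(list(term_weights.values()))
    return {t: w for t, w in term_weights.items() if w >= cutoff}


# Hypothetical term weights for one topic
topic = {"engine": 0.90, "brake": 0.80, "car": 0.20,
         "the": 0.05, "and": 0.05, "of": 0.10}

kept_default = terms_above_cutoff(topic)             # default cutoff, about 0.74
kept_strict = terms_above_cutoff(topic, cutoff=0.85)  # higher explicit cutoff
```

With these made-up weights, the default keeps "engine" and "brake", while the higher explicit cutoff keeps only "engine", which is the "higher values assign less to the topic" behavior in miniature.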
As for the optimal number of clusters, SAS Text Miner uses a heuristic based on your maximum number of dimensions, taking a certain percentage explained from that. Ideally we would take the percentage from the complete SVD, not the truncated one, but that is not computationally feasible with large text collections. I always treat this value as one to be tuned, typically along with the entries on my stop list. I experiment with changing the number of topics from 5 to 25 or so until I find one that seems useful. I also look at the descriptive terms for the topics and add terms to the stop list that seem uninformative given the context. Repeat until you get some useful insights.