SAS Text Miner: When to use Text Cluster vs. Text Topic?

In SAS Text Miner, the Text Cluster node discovers themes and assigns each document to exactly one of them. The Text Topic node also discovers themes, but it assigns each document to zero or more of them.

Do any of you have "rules of thumb" when preferring one over the other?

I've come to feel that the Text Cluster node is best suited to documents that focus on a single topic. When a document raises several concepts, the one theme it is assigned can be 'biased' (for lack of a better word) toward just one of them. Let me illustrate with an example.

In my customer surveys, respondents sometimes write something like this (fictional response):

"Your product could use some improvement.  Here are three suggestions: 1) the colours don't work together or match other products.  2) It's too expensive for the features provided.  3) It's much larger than your competitors' products."

Say the Text Cluster node found three themes in the corpus: Improve colour, Improve pricing, and Improve size.  We know the Text Cluster node will mathematically assign our example comment (document) to exactly one of these themes.  Picking one ignores the other two points made in the document.  The Text Topic node, by contrast, would likely assign the document to all three themes.
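To make the difference concrete, here is a minimal Python sketch of the two assignment styles applied to the fictional response, plus an off-topic response for contrast. The theme scores and the 0.30 cutoff are made up for illustration; SAS Text Miner computes its own scores internally, and this is not its algorithm. A cluster-style assignment always returns exactly one theme (the highest score), while a topic-style assignment returns every theme whose score clears a cutoff, which can be several or none.

```python
# Hypothetical document-by-theme scores. SAS Text Miner derives its own
# scores internally; these numbers are invented for illustration only.
THEMES = ["Improve colour", "Improve pricing", "Improve size"]

doc_scores = {
    "three-suggestion response": [0.62, 0.58, 0.55],
    "off-topic response": [0.05, 0.08, 0.03],
}

TOPIC_CUTOFF = 0.30  # hypothetical membership threshold


def hard_assign(scores):
    """Cluster-style: always exactly one theme, the highest-scoring one."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return THEMES[best]


def soft_assign(scores, cutoff=TOPIC_CUTOFF):
    """Topic-style: every theme whose score clears the cutoff (possibly none)."""
    return [theme for theme, s in zip(THEMES, scores) if s >= cutoff]


for doc, scores in doc_scores.items():
    print(f"{doc}: cluster -> {hard_assign(scores)!r}, topics -> {soft_assign(scores)}")
```

With these made-up scores, the three-suggestion response gets one cluster but all three topics, and the off-topic response gets no topics yet is still forced into a cluster, which is the flip side of always picking a single theme.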

In practice, I still use both nodes.  For documents like the fictional one above, I view the cluster node as 'pragmatic'.  That is, if forced to pick a single theme, it picks the most 'appropriate' one.

Any suggestions on when to use these nodes?

These links are good.  They helped me understand topics and SVD much better.  I wonder if the fourth post will ever be written; it looks like the author took a couple of years off from blogging.
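As a rough illustration of the SVD idea from those posts, here is a minimal NumPy sketch that decomposes a tiny, made-up term-by-document matrix. My understanding (an assumption, not confirmed here) is that both nodes work from an SVD of this kind: Text Cluster groups the resulting document vectors, while Text Topic uses a rotated SVD so that each dimension reads as a topic. The terms, counts, and choice of k = 3 are all hypothetical, and this is not SAS's implementation.

```python
import numpy as np

# Hypothetical term-by-document counts (rows = terms, columns = documents).
# d0..d2 each mention one issue; d3 mentions all three, like the survey example.
terms = ["colour", "match", "price", "expensive", "size", "larger"]
A = np.array([
    [3, 0, 0, 1],  # colour
    [1, 0, 0, 1],  # match
    [0, 2, 0, 1],  # price
    [0, 1, 0, 1],  # expensive
    [0, 0, 1, 1],  # size
    [0, 0, 1, 1],  # larger
], dtype=float)

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s sorted largest first.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3  # number of SVD dimensions to keep (a hypothetical choice)

# Each row is one document expressed in the k retained dimensions.
doc_coords = Vt[:k].T * s[:k]

# Show the two terms with the largest absolute weight on each dimension.
for j in range(k):
    top = np.argsort(-np.abs(U[:, j]))[:2]
    print(f"dimension {j}: top terms {[terms[i] for i in top]}")

# Show each document's coordinates in the reduced space.
for d, row in enumerate(doc_coords):
    print(f"d{d}: {np.round(row, 2)}")

# Rank-k approximation of the original matrix from the retained dimensions.
A_k = (U[:, :k] * s[:k]) @ Vt[:k]
```

A document that mixes several issues, like d3, tends to spread its weight across several dimensions, which is the situation where a single cluster label loses information.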

What these links do not do is compare the Text Cluster and Text Topic nodes.

I also liked two other articles he linked to.  The first is about synonyms:


and the second is a real example of SVD:


