jaredp
Quartz | Level 8

In SAS Text Miner, the Text Cluster node will discover themes and assign each document to one of these themes.  Similarly, the Text Topic node will discover themes but assign each document to zero or more of those themes.

Do any of you have "rules of thumb" when preferring one over the other?

I've come to feel that the Text Cluster node is suited for documents that generally focus on a particular topic because when multiple concepts are present in a document, the chosen theme could be 'biased' (for lack of a better word).  Let me illustrate with an example.

In my customer surveys, respondents sometimes answer as follows (fictional response):

"Your product could use some improvement.  Here are three suggestions: 1) The colours don't work together or match other products.  2) It's too expensive for the features provided.  3) It's much larger than your competitors."

Say the Text Cluster node determined three themes from the corpus: Improve colour, Improve pricing, and Improve size.  The Text Cluster node will assign our example comment (document) to exactly one of these themes, and picking one ignores the other two items written in the document.  The Text Topic node would likely assign the document to all three themes.
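To make the difference concrete, here is a rough Python sketch of the two assignment styles. It is only an illustration and not how either SAS Text Miner node computes its results: the keyword sets, the `cutoff` parameter, and the function names are all made up for this example, and the scoring is a simple keyword count.

```python
# Hypothetical theme keywords. SAS Text Miner derives themes from the corpus;
# these hand-picked sets exist only to illustrate the two assignment styles.
THEMES = {
    "Improve colour": {"colours", "colour", "match"},
    "Improve pricing": {"expensive", "price", "pricing"},
    "Improve size": {"larger", "size", "bulky"},
}


def theme_scores(text):
    """Count how many keywords of each theme occur in the text."""
    words = {w.strip(".,:;()\"'").lower() for w in text.split()}
    return {theme: len(words & keywords) for theme, keywords in THEMES.items()}


def assign_cluster(scores):
    """Cluster-style: exactly one theme, the highest-scoring one.

    A theme is returned even when every score is zero.
    """
    return max(scores, key=scores.get)


def assign_topics(scores, cutoff=1):
    """Topic-style: every theme whose score reaches the cutoff (zero or more)."""
    return [theme for theme, score in scores.items() if score >= cutoff]


comment = (
    "Your product could use some improvement. Here are three suggestions: "
    "1) The colours don't work together or match other products. "
    "2) It's too expensive for the features provided. "
    "3) It's much larger than your competitors."
)

scores = theme_scores(comment)
single = assign_cluster(scores)   # one theme only
multiple = assign_topics(scores)  # all themes that pass the cutoff

print("Cluster-style:", single)
print("Topic-style:", multiple)
```

With these made-up keywords, the cluster-style function keeps only the colour theme because it happens to match two keywords, while the topic-style function returns all three themes. A comment that matches nothing gets an empty list from the topic-style function but is still forced into some theme by the cluster-style one.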

In practice, I still use both nodes.  For documents like the fictional one above, I view the Text Cluster node as 'pragmatic': if forced to pick a single theme, it picks the most 'appropriate' one.

Any suggestions on when to use these nodes?

2 REPLIES
jaredp
Quartz | Level 8

These links are good.  They help me understand topics and SVD much better.  I wonder if the fourth post will ever be written.  It looks like the author took a couple of years off from blogging.

What these links do not do is compare the Text Cluster and Text Topic nodes.

I also liked 2 other articles he linked to.  The first is about synonyms:

http://blogs.sas.com/content/text-mining/2008/12/11/when-are-synonyms-useful/

and the second is a real example of SVD:

http://www.nytimes.com/2008/11/23/magazine/23Netflix-t.html?_r=3&pagewanted=all&

