06-12-2013 01:46 PM
I am currently using SAS Text Miner 12.1 for a text mining project. One of the tasks is to have the machine automatically categorize each customer comment into categories such as pricing issue, technical problem, and contact issue. Right now the task is handled by the Text Rule Builder node in SAS Text Miner. It seems that I cannot manually change the rules built by that node, and the accuracy is fairly low. Then I found the SAS Content Categorization solution. What are the major differences between the two solutions, for example in algorithm and primary usage?
06-19-2013 09:08 PM
We went with SAS Content Categorization because we already had people and a well-defined manual process that we wanted to mimic. The rules-based SAS Content Categorization Studio allowed us to build rules that mimic human judgment. The rules are based on Boolean logic (AND, OR, NOT) plus several other operators that narrow the categorization. Building the rule profiles takes a long time, but you get a lot of control over the categorization. There might be a way to do this with SAS Text Miner, but I'm not that familiar with the product. Text Miner is based on statistical algorithms, so to change the output you would have to change the algorithm in some way.
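To make the idea of Boolean categorization rules concrete, here is a minimal Python sketch. It is not SAS Content Categorization syntax: the nested-tuple rule format, the `matches` and `categorize` functions, and the example terms are all hypothetical and only illustrate how AND, OR, and NOT can combine to assign categories.

```python
import re

# Hypothetical rule format (NOT SAS syntax): a rule is either a plain term
# or a tuple ("AND" | "OR" | "NOT", sub-rule, ...).
RULES = {
    "pricing issue": ("AND", ("OR", "price", "bill", "charge"), ("NOT", "refund")),
    "technical problem": ("OR", "error", "crash", "outage"),
}

def tokenize(text):
    """Lowercase the text and return its word tokens as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))

def matches(rule, tokens):
    """Recursively evaluate a nested Boolean rule against a set of tokens."""
    if isinstance(rule, str):
        return rule in tokens
    op, *args = rule
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")

def categorize(text, rules=RULES):
    """Return every category whose rule matches the text."""
    tokens = tokenize(text)
    return [category for category, rule in rules.items() if matches(rule, tokens)]
```

Because each rule is explicit, changing how a comment is categorized means editing the rule itself, which is the kind of control described above.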
06-20-2013 12:30 PM
Thanks, Julia. Now I can see that SAS Content Categorization depends more on the users' domain knowledge in building categorization rules, while SAS Text Miner depends on a statistical approach (SVD). May I ask, in your case, what accuracy and recall rates do you usually get with SAS CC?
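For reference, per-category precision and recall can be computed from a set of manually labeled comments roughly as follows. This is a generic sketch, not tied to either SAS product; the function name `precision_recall` and the label values are illustrative assumptions.

```python
def precision_recall(true_labels, predicted_labels, category):
    """Compute precision and recall for one category from parallel label lists."""
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: correctly assigned to the category.
    tp = sum(1 for t, p in pairs if t == category and p == category)
    # False positives: assigned to the category but belonging elsewhere.
    fp = sum(1 for t, p in pairs if t != category and p == category)
    # False negatives: belonging to the category but assigned elsewhere.
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up labels for five comments, for illustration only.
true_labels = ["pricing", "pricing", "technical", "contact", "pricing"]
predicted = ["pricing", "technical", "technical", "pricing", "pricing"]
print(precision_recall(true_labels, predicted, "technical"))  # (0.5, 1.0)
```

Precision answers "of the comments assigned to this category, how many belong there?", while recall answers "of the comments that belong to this category, how many were found?".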
06-20-2013 09:47 AM
Hi Ken and Julia -
Text mining is a discovery technology: it works across a collection of documents, applying machine learning, NLP, statistics, etc. to identify, in this case, predictive rules. And while you can output those rules in a format for Content Categorization (as a discovered starter rule set, using the Text Miner left-hand navigation option), refinements and additions are currently built with the Categorization technology (just as Julia describes). Categorization is focused on advanced linguistic rule definition and management, and the Categorization Server is currently used to apply those rules to incoming, individual documents.
06-20-2013 12:37 PM
Thanks, frm. I guess my next question is: what different kinds of data should be processed by SAS Text Miner and by SAS Content Categorization?
06-20-2013 01:09 PM
Hi Ken -
The data doesn't need to be different between SAS Text Miner and SAS Content Categorization - both analyze electronic free-format text from a variety of sources and formats. The difference is that Text Miner examines a collection, analyzing it to find topics, clusters, associations, themes and rules buried among the documents. SAS Content Categorization, on the other hand, examines one document at a time, applying linguistic rules and taxonomies to classify and to extract terms, phrases, facts, etc. Both methods score documents.
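The collection-level versus document-level distinction can be sketched in Python as below. This is a toy illustration only and is not the algorithm either SAS product uses: `discover_terms` is a hypothetical collection-level step that looks at all labeled documents to find terms typical of each category, and `score_document` is a hypothetical document-level step that applies those terms to a single incoming document.

```python
from collections import Counter, defaultdict

def discover_terms(labeled_docs, top_n=2):
    """Collection-level step: for each category, pick the terms found in a
    larger share of that category's documents than of the other documents."""
    doc_freq = defaultdict(Counter)  # category -> term -> docs containing it
    doc_count = Counter()            # category -> number of docs
    for text, category in labeled_docs:
        doc_count[category] += 1
        doc_freq[category].update(set(text.lower().split()))
    total_docs = sum(doc_count.values())
    terms = {}
    for category, freq in doc_freq.items():
        other_docs = total_docs - doc_count[category]

        def lift(term):
            # Share of in-category docs minus share of out-of-category docs.
            inside = freq[term] / doc_count[category]
            outside = sum(doc_freq[c][term] for c in doc_freq if c != category)
            return inside - (outside / other_docs if other_docs else 0.0)

        # Highest lift first; ties broken alphabetically for determinism.
        ranked = sorted(freq, key=lambda t: (-lift(t), t))
        terms[category] = ranked[:top_n]
    return terms

def score_document(text, terms):
    """Document-level step: count discovered terms present in one document."""
    tokens = set(text.lower().split())
    return {cat: sum(t in tokens for t in ts) for cat, ts in terms.items()}

# Made-up labeled collection, for illustration only.
collection = [
    ("bill price wrong", "pricing"),
    ("bill price high", "pricing"),
    ("app crash error", "technical"),
    ("crash error again", "technical"),
]
terms = discover_terms(collection)
print(score_document("the price on my bill", terms))
```

The first function needs the whole collection before it can say anything, while the second needs only one document plus rules that already exist.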
04-17-2014 04:30 PM
The book "Text Mining and Analysis: Practical Methods, Examples, and Case Studies ..." by Goutam Chakraborty, Murali Pagolu, and Satish Garla answers your question quite well.