kenken063
Calcite | Level 5

Hi,

I am currently using SAS Text Miner 12.1 for a text mining project. One of the tasks is to have the machine automatically categorize each customer comment into categories such as pricing issue, technical problem, and contact issue. Right now the task is handled by the Text Rule Builder node in SAS Text Miner. It seems that I cannot manually change the rules that the node builds, and the accuracy is rather low. Then I found the SAS Content Categorization solution. What are the major differences between the two solutions, for example in algorithm and primary usage?

Thanks,

Ken

JuliaM
Calcite | Level 5

Hi Ken,

We went with SAS Content Categorization because we already had people and a well-defined manual process that we wanted to mimic. The rules-based SAS Content Categorization Studio allowed us to build rules that mimic human judgment. The rules are based on Boolean logic (AND, OR, NOT), plus several other operators that restrict when a category applies. Building the rule profiles takes a long time, but you get a lot of control over the categorization. There might be a way to do this with SAS Text Miner, but I'm not that familiar with the product. Text Miner is based on statistical algorithms, so to change its output you would have to change the algorithm in some way.
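To make the idea of Boolean category rules concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not SAS Content Categorization's rule language: the `Term`/`And`/`Or`/`Not` classes, the `RULES` dictionary, and the keywords are all hypothetical. Each category gets one rule tree, and a comment is assigned every category whose rule evaluates to true on the set of words it contains.

```python
import re
from dataclasses import dataclass


# Hypothetical rule building blocks (NOT the SAS Content Categorization syntax).
@dataclass
class Term:
    word: str


@dataclass
class And:
    parts: list


@dataclass
class Or:
    parts: list


@dataclass
class Not:
    part: object


def tokenize(text):
    """Lowercase the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches(rule, tokens):
    """Recursively evaluate a Boolean rule tree against a set of tokens."""
    if isinstance(rule, Term):
        return rule.word in tokens
    if isinstance(rule, And):
        return all(matches(p, tokens) for p in rule.parts)
    if isinstance(rule, Or):
        return any(matches(p, tokens) for p in rule.parts)
    if isinstance(rule, Not):
        return not matches(rule.part, tokens)
    raise TypeError(f"unknown rule node: {rule!r}")


def categorize(text, rules):
    """Return every category whose rule matches the document."""
    tokens = tokenize(text)
    return [category for category, rule in rules.items() if matches(rule, tokens)]


# Hypothetical hand-written rules for the categories in the question.
RULES = {
    "pricing issue": Or([Term("price"), Term("expensive"), Term("overcharged")]),
    "technical problem": And([Or([Term("error"), Term("crash")]), Not(Term("resolved"))]),
    "contact issue": And([Or([Term("call"), Term("phone"), Term("email")]),
                          Or([Term("never"), Term("nobody")])]),
}

print(categorize("The price is too expensive and the app crash keeps happening", RULES))
# ['pricing issue', 'technical problem']
print(categorize("I sent an email and nobody replied", RULES))
# ['contact issue']
print(categorize("The crash was resolved quickly", RULES))
# []
```

The `Not(Term("resolved"))` clause shows how an exclusion operator narrows a category, which is the kind of manual control described above; a real rule profile would need far more terms and operators.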

kenken063
Calcite | Level 5

Thanks, Julia. Now I can see that SAS Content Categorization depends more on users' domain knowledge for building categorization rules, while SAS Text Miner depends on a statistical approach (SVD). In your case, what accuracy and recall rates do you usually get with SAS CC?
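Whichever tool assigns the categories, these rates can be measured the same way against a hand-labeled sample of comments. The Python sketch below is a hedged illustration with made-up labels and hypothetical function names: it computes per-category precision and recall (treating each comment as possibly belonging to several categories) plus an exact-match accuracy over the whole label set.

```python
def category_precision_recall(predicted, actual, category):
    """Precision and recall for one category.

    `predicted` and `actual` are parallel lists of sets of category labels,
    one set per document (a comment may belong to several categories).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if category in p and category in a)
    fp = sum(1 for p, a in pairs if category in p and category not in a)
    fn = sum(1 for p, a in pairs if category not in p and category in a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def exact_match_accuracy(predicted, actual):
    """Share of documents whose predicted label set equals the true label set."""
    return sum(1 for p, a in zip(predicted, actual) if p == a) / len(actual)


# Made-up example: four hand-labeled comments and a categorizer's output.
actual = [{"pricing"}, {"technical"}, {"pricing"}, {"technical"}]
predicted = [{"pricing"}, {"pricing", "technical"}, set(), {"technical"}]

print(category_precision_recall(predicted, actual, "pricing"))    # (0.5, 0.5)
print(category_precision_recall(predicted, actual, "technical"))  # (1.0, 1.0)
print(exact_match_accuracy(predicted, actual))                    # 0.5
```

Reporting the numbers per category matters here, because a rule set can be strong for one category (technical) and weak for another (pricing) while the overall figure hides the difference.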

FionaMcNeill
SAS Employee

Hi Ken and Julia,

Text mining is a discovery technology: it works across a collection of documents and applies machine learning, NLP, statistics, etc., to identify patterns, in this case predictive rules. And while you can output those rules in a format for Content Categorization (as a discovered starter rule set, using the option in the Text Miner left-hand navigation), refinement and additions are currently done in the Categorization technology (just as Julia describes). Categorization is focused on advanced linguistic rule definition and management, and the Categorization Server is currently used to apply those rules to incoming, individual documents.
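As a rough illustration of the discover-then-refine workflow described above, the Python sketch below scans a labeled collection for terms that co-occur almost exclusively with one category and returns them as candidate terms for a starter rule. This is a deliberately simplified stand-in, not the Text Rule Builder algorithm; the function name, the `min_support`/`min_precision` thresholds, and the sample comments are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z']+", text.lower()))


def discover_terms(docs, labels, category, min_support=2, min_precision=0.8):
    """Return terms whose presence predicts `category` across the collection.

    A term qualifies if it occurs in at least `min_support` documents of the
    category and at least `min_precision` of the documents containing it
    belong to the category. Results are sorted by precision, then support.
    """
    in_cat = Counter()   # document frequency within the category
    overall = Counter()  # document frequency across the whole collection
    for text, cats in zip(docs, labels):
        tokens = tokenize(text)
        overall.update(tokens)
        if category in cats:
            in_cat.update(tokens)
    candidates = []
    for term, n_cat in in_cat.items():
        precision = n_cat / overall[term]
        if n_cat >= min_support and precision >= min_precision:
            candidates.append((term, n_cat, precision))
    candidates.sort(key=lambda c: (-c[2], -c[1], c[0]))
    return [term for term, _, _ in candidates]


# Made-up labeled collection.
docs = [
    "price is too high",
    "high price again",
    "app crash on login",
    "login error after update",
    "the price went up",
]
labels = [{"pricing"}, {"pricing"}, {"technical"}, {"technical"}, {"pricing"}]

print(" OR ".join(discover_terms(docs, labels, "pricing")))  # price OR high
print(discover_terms(docs, labels, "technical"))             # ['login']
```

The `login` result shows why a human refinement pass helps: in this tiny sample the word happens to appear only in technical complaints, but on its own it is not a reliable signal, and a person editing the rule would likely replace it with terms such as `crash` or `error`.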

kenken063
Calcite | Level 5

Thanks, frm. My next question is: what kinds of data are best processed by SAS Text Miner, and what kinds by SAS Content Categorization?

FionaMcNeill
SAS Employee

Hi Ken,

The data doesn't need to be different between SAS Text Miner and SAS Content Categorization: both analyze free-form electronic text from a variety of sources and formats. The difference is that Text Miner examines a collection, analyzing it to find topics, clusters, associations, themes, and rules buried among the documents. SAS Content Categorization, on the other hand, examines one document at a time, applying linguistic rules and taxonomies to classify it and to extract terms, phrases, facts, etc. Both methods score documents.

jaredp
Quartz | Level 8

The book "Text Mining and Analysis: Practical Methods, Examples, and Case Studies ... by Goutam Chakraborty, Murali Pagolu, Satish Garla" answers your question quite well.

http://books.google.ca/books?id=SUKDAgAAQBAJ&pg=PT218&lpg=PT218&dq=text+rule+builder+node&source=bl&...
