Seven tricky sentences for NLP and text mining algorithms - AnalyticBridge
Posted by Mirko Krivanek on Text Mining - AnalyticBridge
I thought that these were very interesting.
To the seven above, I would add words that are often used interchangeably but are meant to denote two different things. For example, in the Development Experience Clearinghouse, "evaluation" is meant to describe documents that analyze either the performance of a project or the impact a project has had on a sector or geographic location. "Assessment" is meant to describe documents that analyze the conditions of a particular sector or geographic location before a project or program takes place. Yet the two terms are often used indiscriminately within the documents themselves. A human can read a document and tell whether it is an assessment or an evaluation, but it is very difficult to write rules for the SAS Content Categorization Studio that capture the difference.
What linguistic challenges do others run into when writing profile rules or text mining algorithms?
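To make the rule-writing problem concrete, here is a minimal Python sketch contrasting two hand-written rules. It is not SAS Content Categorization Studio syntax. The cue phrases, function names, and example documents are hypothetical. A rule keyed on the label word follows the author's (possibly loose) terminology. A rule keyed on what the document actually describes (baseline conditions vs. results and impact) follows the definitions in the comment above.

```python
import re

# Hypothetical cue lists, for illustration only; not taken from the
# Development Experience Clearinghouse or from any SAS product.
ASSESSMENT_CUES = ["baseline", "prior to", "before the project", "current conditions"]
EVALUATION_CUES = ["results", "impact", "outcomes", "achieved", "after the project"]


def count_phrases(text: str, phrases: list[str]) -> int:
    """Count whole-word occurrences of each phrase in text, ignoring case."""
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", lowered)) for p in phrases)


def label_by_term(text: str) -> str:
    """Naive rule: trust whichever label word the author used more often."""
    n_eval = count_phrases(text, ["evaluation"])
    n_assess = count_phrases(text, ["assessment"])
    return "evaluation" if n_eval >= n_assess else "assessment"


def label_by_cues(text: str) -> str:
    """Cue rule: decide from what the document describes, not what it calls itself."""
    n_eval = count_phrases(text, EVALUATION_CUES)
    n_assess = count_phrases(text, ASSESSMENT_CUES)
    if n_eval == n_assess:
        return "unknown"  # no cues, or a tie: leave it for a human reviewer
    return "evaluation" if n_eval > n_assess else "assessment"


# Hypothetical document that calls itself an assessment but reports project impact.
doc = ("This assessment reviews the results of the irrigation project. "
       "The impact on crop yields and the outcomes achieved are summarized.")

print(label_by_term(doc))  # assessment (misled by the label word)
print(label_by_cues(doc))  # evaluation
```

Even the cue rule is brittle: any paraphrase missing from the cue lists falls through to "unknown", so lists like these need ongoing maintenance as the collection grows.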
That's very interesting. Cases like these are why a good training corpus is necessary.
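As a rough illustration of the training-corpus approach, here is a minimal multinomial Naive Bayes classifier with add-one smoothing in plain Python. Naive Bayes is just one possible choice; the reply does not name a method. The four training documents and their labels are hypothetical, and a real corpus would need many hand-labeled documents. One training document labeled "evaluation" deliberately uses the word "assessment", so the model learns that the label word alone is not decisive.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[tuple[str, str]]) -> "NaiveBayes":
        self.class_counts = Counter(label for _, label in docs)
        self.word_counts = defaultdict(Counter)
        for text, label in docs:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_score(self, text: str, label: str) -> float:
        """Unnormalized log posterior: log prior plus smoothed log likelihoods."""
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # skip words never seen in training
                score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text: str) -> str:
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical hand-labeled training corpus.
train = [
    ("a baseline survey of current conditions before the program starts", "assessment"),
    ("the assessment describes existing needs prior to the program", "assessment"),
    ("the evaluation measured the impact and results of the project", "evaluation"),
    ("this assessment reports the outcomes and impact of the project", "evaluation"),
]
model = NaiveBayes().fit(train)

print(model.predict("an assessment of the impact and results of the project"))  # evaluation
print(model.predict("a baseline survey of conditions before the program"))  # assessment
```

The appeal over hand-written rules is that the evidence each word carries comes from the labeled examples rather than from someone anticipating every phrasing in advance.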
It's funny how #3 and #6 seem fixable with a slight word change: "I ate tomato with salt" and "The lamb was cooked and ready to eat." The others are not as easily fixed.
If the classification rules are difficult to write, perhaps the corpus itself can be modified. While this is rarely an option, some projects can entertain it.