winson
Calcite | Level 5

In the topic list, some terms have a "+" sign in front of them and some don't. What is the difference? I gather that SAS Text Miner generates topics through some projection of the term-by-document matrix, but it is unclear to me why some of the terms reported for a topic carry a "+" prefix. Sometimes the terms are bigrams (two words).

 

Is there any reference or document within the SAS library where I can look this up? Thanks.

 

M_Maldonado
Barite | Level 11

Someone explained this to me as the way Text Miner tells you that it is ignoring the parsing for a term. But since I don't remember reading it in the doc, I hope someone will call me out if I am talking nonsense 🙂.

 

For example, let's say you are doing text mining on text about sports. Your parsing will identify several terms like "soccer player", "hockey player", "seasonal player", "terrible player", "national player", "outstanding player", etc.

Your text mining algorithm will keep the terms that are relevant. However, if none of them is relevant by itself, the algorithm will merge them into a new term, "+ player". The "+" indicates that the original terms were of the form something+player, but since none of them was relevant on its own, they were combined into a new term that differs from the originally parsed terms.
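
To make the idea concrete, here is a toy sketch in plain Python. It is not SAS code and not Text Miner's actual algorithm. It assumes, purely for illustration, that "relevant" means "occurs at least `min_count` times"; the function name `merge_rare_phrases` and the `min_count` parameter are hypothetical.

```python
from collections import Counter

def merge_rare_phrases(phrases, head, min_count=2):
    """Keep phrases ending in `head` that occur at least `min_count` times;
    fold the rarer ones into a single "+ head" entry.

    Assumption: frequency stands in for "relevance" here. The real
    relevance criterion used by SAS Text Miner is not shown in this thread.
    """
    counts = Counter(phrases)
    merged = Counter()
    for phrase, n in counts.items():
        words = phrase.split()
        is_head_phrase = len(words) > 1 and words[-1] == head
        if is_head_phrase and n < min_count:
            merged["+ " + head] += n   # too rare on its own: fold into "+ head"
        else:
            merged[phrase] += n        # frequent enough: keep as its own term
    return dict(merged)

phrases = [
    "soccer player", "hockey player", "seasonal player",
    "soccer player", "terrible player",
]
result = merge_rare_phrases(phrases, "player")
print(result)
```

With this input, "soccer player" appears twice and is kept, while the three one-off phrases end up under "+ player".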

 

I hope this helps!

-Miguel

winson
Calcite | Level 5

Miguel,

Thank you for your reply. The "+player" example is great. I did not know that "Text Parsing" handles n-grams too. I thought that it parsed only single words, because the output view puts words into "noun", "adjective", "verb", and so on.

Winson

M_Maldonado
Barite | Level 11

The Text Parsing node identifies one or more roles for each term. For example, the term "player" in all the examples we discussed would have the role 'noun'. Think of examples of the same term with a different role. For example, in "he is such a player", you would hope the parsing picks up "player" as an adjective. Maybe there are better examples 🙂

Notice that the parsing does pick up n-gram terms. In the results, click "Role" to sort the terms by role, and look for 'Noun Group'. Those are the best examples of n-grams.
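
If it helps to see what an n-gram term is outside of the tool, here is a minimal sketch in plain Python (not SAS) that pulls two-word terms out of a couple of made-up sentences. It only splits on whitespace; a real 'Noun Group' detection relies on part-of-speech information, which this sketch does not attempt. The helper name `bigrams` and the sample documents are my own.

```python
from collections import Counter

def bigrams(text):
    """Return every pair of adjacent words in `text` as a two-word term."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]

# Made-up sample documents, for illustration only.
docs = [
    "the soccer player scored",
    "the hockey player scored twice",
]

# Count how often each bigram occurs across all documents.
counts = Counter(bg for doc in docs for bg in bigrams(doc))
print(counts.most_common(3))
```

Terms such as "soccer player" and "hockey player" show up as candidates here; the parsing node would additionally need to decide which of the candidates are genuine noun groups.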

 

Also keep your participation high in this other SAS community. Lots of interesting stuff is going on there as well:

https://communities.sas.com/t5/SAS-Text-and-Content-Analytics/bd-p/text_analytics

 

Good luck with your text!

