10-31-2015 02:35 PM
In the topic term lists, what is the difference between the words that have a "+" sign in front of them and the ones that don't? I gather that SAS Text Miner generates the topics through some projection of the term-by-document matrix, but it is unclear to me why, when the terms in a "topic" are reported, some of them carry a "+" prefix. Sometimes the terms are bigrams (two words).
Is there any reference or document in the SAS library where I can look this up? Thanks.
10-31-2015 03:40 PM - edited 10-31-2015 03:41 PM
Someone explained this to me as Text Miner's way of telling you that it is ignoring the parsing for a term. But since I don't remember reading this in the documentation, I hope someone will call me out if I am talking nonsense.
For example, let's say you are doing text mining on text about sports. Your parsing will identify several terms like "soccer player", "hockey player", "seasonal player", "terrible player", "national player", "outstanding player", etc.
Your text mining algorithm will keep the terms that are relevant. However, if none of them is relevant by itself, the algorithm will merge them into a new term, "+ player". The "+" indicates that the original terms were something+player but, since none of them was relevant on its own, they were combined into a new term that is different from the originally parsed terms.
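Here is a rough Python sketch of the merging idea as I described it above. To be clear, this is only an illustration of that explanation, not how SAS Text Miner is actually implemented: the function name `merge_by_head`, the `min_count` relevance threshold, and the term counts are all made up for the example.

```python
from collections import defaultdict

def merge_by_head(term_counts, min_count):
    """Hypothetical illustration: group terms by their last word and,
    when no term in a group reaches min_count, replace the whole group
    with a single '+ <last word>' term."""
    groups = defaultdict(dict)
    for term, count in term_counts.items():
        head = term.split()[-1]          # last word, e.g. "player"
        groups[head][term] = count

    result = {}
    for head, members in groups.items():
        relevant = {t: c for t, c in members.items() if c >= min_count}
        if relevant:
            # At least one term is relevant by itself: keep only those.
            result.update(relevant)
        else:
            # None is relevant alone: merge the group into "+ head".
            result["+ " + head] = sum(members.values())
    return result

# Made-up counts for the sports example.
counts = {
    "soccer player": 2,
    "hockey player": 1,
    "seasonal player": 1,
    "terrible player": 1,
    "goal": 6,
}
print(merge_by_head(counts, min_count=3))
# prints {'+ player': 5, 'goal': 6}
```

With a lower threshold such as `min_count=2`, "soccer player" would be relevant by itself, so it would be kept and no "+ player" term would be created.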
I hope this helps!
11-01-2015 12:14 PM
Thank you for your reply. The "+player" example is great. I did not know that "Text Parsing" handles n-grams too. I thought that it parsed only single words, because the output view sorts words into "noun", "adjective", "verb", and so on.
11-03-2015 01:14 PM
The Text Parsing node identifies one or more roles for each term. For example, the term "player" in all the examples we discussed would have the role of 'noun'. Think of examples of the same term with a different role. For example, in "he is such a player", you would hope the parsing picks up "player" as an adjective. Maybe there are better examples.
Notice that the parsing does pick up n-gram terms. In the results, click on "role" to sort the terms by role, and look for 'Noun Group'. Those are the best examples of n-grams.
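If it helps to picture it, the sort-by-role step amounts to something like the Python snippet below. The terms and role labels are hypothetical sample rows that I typed in by hand, not output copied from the Text Parsing node.

```python
# Hypothetical (term, role) rows, standing in for the parsing results table.
terms = [
    ("player", "Noun"),
    ("soccer player", "Noun Group"),
    ("outstanding", "Adjective"),
    ("play", "Verb"),
    ("national player", "Noun Group"),
]

# Sorting by role puts all rows with the same role next to each other.
by_role = sorted(terms, key=lambda row: row[1])

# The multi-word terms are the ones whose role is 'Noun Group'.
noun_groups = [term for term, role in terms if role == "Noun Group"]
print(noun_groups)
# prints ['soccer player', 'national player']
```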
Also keep your participation high on this other SAS community. Lots of interesting stuff going on there as well:
Good luck with your text!