Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

text miner

New Contributor
Posts: 3

text miner

In the topic list, some terms have a "+" sign in front of them and some don't. What is the difference? I gather that SAS Text Miner generates the topics through some projection of the term-by-document matrix, but it is unclear to me why, when the terms in a topic are reported, some of them carry a "+" prefix. Sometimes the terms are bigrams (two words).

 

Is there any reference or document in the SAS library where I can look this up? Thanks.

 

Super Contributor
Posts: 337

Re: text miner

Someone explained this to me as Text Miner's way of telling you that it is ignoring the parsing for a term. Since I don't remember reading this in the documentation, I hope someone will call me out if I am talking nonsense :)

 

For example, let's say you are doing text mining on text about sports. Your parsing will identify several terms like "soccer player", "hockey player", "seasonal player", "terrible player", "national player", "outstanding player", etc.

Your text mining algorithm will keep the terms that are relevant. However, if none of them is relevant by itself, the algorithm will merge them into a new term, "+player". The "+" indicates that the original terms were something+player, but since none of them was relevant on its own, they were combined into a new term that differs from the originally parsed terms.
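
To make the idea concrete, here is a toy Python sketch of the merging I described. It is only an illustration of my explanation, not what Text Miner actually runs: the function name `roll_up_terms`, the counts, and the `min_count` threshold are all made up, and "relevance" is reduced to a simple frequency cutoff.

```python
from collections import defaultdict


def roll_up_terms(term_counts, min_count):
    """Keep terms seen at least min_count times; merge the rest by head word.

    Hypothetical helper: "relevant" here just means "frequent enough",
    which is an assumption made for the sake of the example.
    """
    kept = {}
    merged = defaultdict(int)
    for term, count in term_counts.items():
        if count >= min_count:
            # The term is relevant on its own, so keep it unchanged.
            kept[term] = count
        else:
            # Fold the term into "+<last word>", e.g. "hockey player" -> "+player".
            head = term.split()[-1]
            merged["+" + head] += count
    kept.update(merged)
    return kept


# Made-up counts for a small sports corpus.
counts = {
    "soccer player": 2,
    "hockey player": 1,
    "seasonal player": 1,
    "championship game": 5,
}
result = roll_up_terms(counts, min_count=3)
print(result)
```

With these made-up counts, "championship game" survives as is, while the three rare "... player" terms are folded into a single "+player" entry.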

 

I hope this helps!

-Miguel

New Contributor
Posts: 3

Re: text miner

Posted in reply to M_Maldonado

Miguel,

Thank you for your reply. The "+player" example is great. I did not know that "Text Parsing" also handles n-grams. I thought it parsed only single words, because the output view assigns words to roles such as "noun", "adjective", "verb", and so on.

Winson

Super Contributor
Posts: 337

Re: text miner

The Text Parsing node identifies one or more roles for each term. For example, the term "player" in all the examples we discussed would have the role of 'noun'. Now think of examples of the same term with a different role. In "he is such a player", for instance, you would hope the parsing picks up "player" as an adjective. Maybe there are better examples :)

Notice that the parsing does pick up n-gram terms. In the results, click on "role" to sort the terms by role, and look for 'Noun Group'. Those are the best examples of n-grams.

 

Also, stay active in this other SAS community. There is lots of interesting stuff going on there as well:

https://communities.sas.com/t5/SAS-Text-and-Content-Analytics/bd-p/text_analytics

 

Good luck with your text!
