This tip is part of the Learn by Example using SAS® Enterprise Miner™ series, in which each tip introduces a new data mining topic and explains it with one or more example SAS Enterprise Miner process flow diagrams.
Text mining extracts relevant information from a collection of text documents to uncover the underlying themes and concepts. Integrating SAS Text Miner nodes into a SAS Enterprise Miner process flow diagram enables you to combine quantitative variables with unstructured text, and so to use text mining alongside other data mining techniques.
SAS Text Miner supports an extensive list of languages; refer to the product page for details. Note that to run these examples, you need the SAS Text Miner add-on to your SAS Enterprise Miner installation.
To get started, download the process flow diagrams (XML files) and the accompanying PDF documentation for the following two examples from the GitHub repository at https://github.com/sassoftware/dm-flow/tree/master/TextMining
1. Text Mining Explore: This example uses several techniques to explore textual data (articles). The Text Parsing node takes the raw text and quantifies the terms in it. The Text Filter node filters out extraneous information. The Text Rule Builder node generates an ordered set of rules that are useful for describing or predicting the target. The Text Cluster node uses singular value decomposition (SVD) to cluster the articles into groups, and the Text Topic node extracts topics from the article collection. The Control Point node at the end lets you run the three flows in this example with a single click.
2. Text Mining Classification: This example classifies textual articles into a news group (graphics, hockey, or medical) based on their content. Using a flow similar to the one in the first example, the topics are extracted first, and that information is then used in several classification models (Regression, Neural Network, Decision Tree, and Memory-Based Reasoning), which are compared to pick a champion. Finally, the SAS Code node prints the list of articles and the predicted news group into which each is classified.
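For readers new to the idea behind the parsing and filtering steps, the following is a minimal sketch in plain Python. It is not how the SAS Text Miner nodes are implemented. It only illustrates the general technique of turning raw text into term counts and then dropping uninformative terms. The toy corpus, the stop list, the `min_docs` threshold, and all function names are assumptions made for this illustration.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the actual examples use news-group articles.
documents = [
    "The goalie stopped the puck and the team won the hockey game.",
    "The doctor prescribed a new treatment for the patient.",
    "The hockey team traded a player before the game.",
]

# Hypothetical stop list, chosen only for this illustration.
STOP_WORDS = {"the", "a", "and", "for", "before"}


def parse(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(docs):
    """Parsing step: return one Counter of term frequencies per document."""
    return [Counter(parse(doc)) for doc in docs]


def filter_terms(counts, stop_words, min_docs=2):
    """Filtering step: keep non-stop-word terms found in at least min_docs documents."""
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())  # count each term once per document
    return sorted(
        term
        for term, n_docs in doc_freq.items()
        if n_docs >= min_docs and term not in stop_words
    )


counts = term_counts(documents)
vocabulary = filter_terms(counts, STOP_WORDS)

# Document-by-term matrix: one row per document, one column per kept term.
matrix = [[doc_counts[term] for term in vocabulary] for doc_counts in counts]

print(vocabulary)
for row in matrix:
    print(row)
```

A document-by-term matrix of this kind is the sort of numeric representation that later steps, such as SVD-based clustering or a classification model, can work with.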
To run these examples, refer to the README file that is part of the GitHub repository at https://github.com/sassoftware/dm-flow. Please note that these examples were tested with SAS Enterprise Miner 13.2 and SAS Text Miner 13.2.