Did you miss the Ask the Expert session on Text Mining: Getting Started? Not to worry, you can catch it on-demand at your leisure. I’ve attached the slides as well.
In this session, attendees learned to:
For ease of reference, here are some highlighted questions from the Q&A segment held at the end of the session.
Q1: What languages does Text Miner support?
A: These languages are supported in SAS Text Miner: Arabic, Chinese (simplified and traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, and Vietnamese. Each language must be licensed individually.
Q2: Do I have to have Enterprise Miner in order to use Text Miner?
A: Yes. Text Miner is an add-on to SAS Enterprise Miner. It is a separate group of nodes within the Enterprise Miner interface that enables the processing of unstructured data. As demonstrated, Text Mining can take advantage of Enterprise Miner nodes like Segment Profile. Also, as demonstrated, mining of unstructured data can be combined with mining of structured data to develop predictive models with a greater lift of predictive power.
Q3: Do my text documents have to be a SAS dataset to use in Text Miner?
A: No. The Text Import node can import text in different formats, such as Word, Excel, PowerPoint, and PDF.
Q4: Can I use my own Stop/Start list in the Parsing Node?
A: Yes. You can create your own SAS datasets as Stop or Start lists. The format for these datasets is described in the Text Miner documentation. Samples are provided in the SAMPSIO sample library.
Q5: Does the order of the nodes make a difference in Text Miner?
A: Yes. After the text data is brought in, the Text Parsing node must precede the Text Filter node. The other text nodes can be added in any order.
Q6: Can you pull all the email addresses out of the text document collection?
A: Yes. I recommend that you turn on “Find Entities” in the properties of the Text Parsing node. Then, when viewing the results of that node’s run, save the “Terms” table. The Terms table includes all the terms and their roles. Query that table for all the terms with the role “Internet”. The “Internet” role captures email addresses and URLs.
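If you prefer to do that last query step outside of SAS, here is a minimal Python sketch of the idea. It assumes the saved Terms table has been exported to CSV with columns named Term and Role; the export step, the column names, and the sample rows are all assumptions for illustration, so check them against your own table. Because the Internet role covers both email addresses and URLs, the sketch then keeps only the terms that contain an "@".

```python
import csv
import io


def terms_with_role(rows, role="Internet"):
    """Return the terms whose role matches `role`, ignoring case.

    `rows` is an iterable of dicts with "Term" and "Role" keys
    (assumed column names, not confirmed by the SAS documentation).
    """
    wanted = role.strip().lower()
    return [
        row["Term"]
        for row in rows
        if (row.get("Role") or "").strip().lower() == wanted
    ]


# Made-up stand-in for an exported Terms table.
sample_csv = """Term,Role
jane.doe@example.com,Internet
http://www.example.com,Internet
sas,Company
cluster,Noun
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Everything tagged with the Internet role: emails and URLs together.
internet_terms = terms_with_role(rows)

# Keep only the terms that look like email addresses.
emails = [term for term in internet_terms if "@" in term]
print(emails)
```

The "@" check is deliberately crude. It is enough to separate email addresses from URLs in a list that has already been narrowed down by role, but it is not a general email validator.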
Q7: Is it possible to select a different cluster algorithm for text clustering?
A: Yes, you can choose either Expectation-Maximization or Hierarchical.
Q8: Does SAS Text Miner work effectively for all lengths of data? For example, will it work just as well for a Twitter feed as it would for a 10-page paper?
A: Yes. Regardless of the length of the document, SAS Text Miner efficiently handles the task of breaking text into terms/entities. I've personally gotten great results with both long documents and Twitter feeds. However, analyzing social media text comes with extra challenges because of the short text length, acronyms, and slang.