
Text Miner: Getting Started -- Ask the Expert Q&A

by SAS Employee Twanda on 04-17-2017 03:26 PM - edited on 04-18-2017 09:26 AM by Community Manager

Did you miss the Ask the Expert session on Text Mining: Getting Started? Not to worry, you can catch it on-demand at your leisure. I’ve attached the slides as well.

 

View Webinar

 

In this session, attendees learned to:

  • Import data
  • Parse and filter text
  • Analyze text data including topic discovery and cluster analysis 
  • Use text mining results as input to predictive modeling

For ease of reference, here are some highlighted questions from the Q&A segment held at the end of the session.

 

Q1:  What languages does Text Miner support?

A:  These languages are supported in SAS Text Miner: Arabic, Chinese (simplified and traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, and Vietnamese. Each language must be licensed individually.

 

Q2:  Do I have to have Enterprise Miner in order to use Text Miner?

A:  Yes.  Text Miner is an add-on to SAS Enterprise Miner.  It is a separate group of nodes within the Enterprise Miner interface that enables the processing of unstructured data.  As demonstrated, Text Miner can take advantage of Enterprise Miner nodes like Segment Profile.  Also as demonstrated, mining of unstructured data can be combined with mining of structured data to develop predictive models with greater lift.

 

Q3:  Do my text documents have to be a SAS dataset to use in Text Miner? 

A:  No.  The Text Import node can import text in different formats, such as Word, Excel, PowerPoint, and PDF.

 

Q4:  Can I use my own Stop/Start list in the Text Parsing node?

A:  Yes.  You can create your own SAS data sets as Stop or Start lists.  The format for these data sets is described in the Text Miner documentation.  Samples are provided in the SAMPSIO sample library.

 

Q5: Does the order of the nodes make a difference in Text Miner?

A: Yes. After the text data is brought in, the Text Parsing node must precede the Text Filter node. The other text nodes can be added in any order.

 

Q6: Can you pull all the email addresses out of the text document collection?

A: Yes. I recommend that you turn on “Find Entities” in the properties of the Text Parsing node. Then, when viewing the results of that node’s run, save the “Terms” table. The Terms table includes all the terms and their roles. Query that table for all the terms with the role “Internet”. The “Internet” role captures email addresses and URLs.
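
As a rough illustration of that last query step, here is a minimal Python sketch. It assumes the saved Terms table has been exported to CSV with columns named "Term" and "Role"; those column names, the sample rows, and the helper functions are hypothetical and only for illustration. Because the "Internet" role covers both email addresses and URLs, the sketch also applies a crude check for "@" to keep only the email addresses.

```python
import csv
import io

# Hypothetical CSV export of the saved Terms table.
# The column names "Term" and "Role" are assumptions for illustration.
SAMPLE_TERMS_CSV = """Term,Role
jane.doe@example.com,Internet
http://www.example.com,Internet
report,Noun
support@example.org,Internet
"""


def internet_terms(csv_text):
    """Return every term whose role is "Internet"."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["Term"] for row in reader if row["Role"] == "Internet"]


def email_addresses(terms):
    """Crude filter: keep terms that contain "@" and are not URLs."""
    return [t for t in terms if "@" in t and "://" not in t]


terms = internet_terms(SAMPLE_TERMS_CSV)
emails = email_addresses(terms)
print(emails)
```

The same filter could be written as a query inside SAS itself; the point here is only that the role column does the first cut and a simple pattern check separates emails from URLs.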

 

Q7: Is it possible to select a different cluster algorithm for text clustering?

A:  Yes, you can choose either Expectation-Maximization or Hierarchical.

 

Q8: Does SAS Text Miner work effectively for all lengths of data? For example, will it work just as well for a Twitter feed as it would for a 10-page paper?

A: Yes. Regardless of the length of the document, SAS Text Miner efficiently handles the task of breaking text into terms and entities. I've personally gotten great results with both long documents and Twitter feeds. However, analyzing social media text comes with extra challenges because of the short text length, acronyms, and slang.

Comments
by Contributor Prajna_450
on 06-09-2017 05:14 AM

Hi,

 

Does Visual Analytics help with text mining? I believe the word cloud option does, but I really want to understand how it works.

 

I want to know how to build a dictionary of words and build concepts. Can a sentiment score be obtained? Can the tone of the text be assessed?

 

 

Please reply,

Thank you 

by SAS Employee Twanda
on 06-09-2017 11:26 AM

Hi,

Back when sentiment was added to the Visual Analytics word cloud, a SAS blog post was written about it. It’s a good quick read that will address some of your questions. You can find it here: http://blogs.sas.com/content/sgf/2015/06/29/how-sentimental-of-you-enabling-sentiment-analysis-in-a-...

 

You can also learn more about working with word clouds in Visual Analytics in the User Guide. This link will take you directly to the Word Cloud section: http://support.sas.com/documentation/cdl/en/vaug/69957/HTML/default/viewer.htm#n1oo7kmcwcn1rsn1ll0bo...

 

You mentioned building a dictionary. You can build a dictionary from your document collection within SAS Text Miner. The Reference Guide covers that in the 4th chapter: https://support.sas.com/documentation/onlinedoc/txtminer/14.1/tmref.pdf

 

You seem to have an interest in sentiment. Perhaps you were looking to create a sentiment dictionary. SAS offers one, but you can also find some available online with a Google search.

 

If you are interested in building concepts, you might want to look at the capabilities in SAS Contextual Analysis. Here is the fact sheet: http://support.sas.com/rnd/app/handouts/contextual_analysis_mar2017.pdf

 

I hope some of this is helpful.

Twanda

by Contributor Prajna_450
on 06-12-2017 03:19 AM

Hi Twanda,

 

These are the most useful links and references I have gotten so far. Thanks a ton. I will get my SAS Text Miner in place and explore VA more.

 

Prajna

by Contributor Prajna_450
on 06-21-2017 02:58 AM

Hi Twanda,

 

On the Visual Analytics platform, if I am generating a word cloud to get a sentiment score, I need to enter a number for Maximum topics in the Properties tab for the word cloud. I understand that the minimum number of topics to discover must be at least two or three and that the maximum has no limit.

 

I want to understand what number I should specify for Maximum topics. There should be a technique (let's say a dendrogram, or something that decides the value of k) to arrive at a number and say, "Yes, this is the maximum number of topics you need to extract for a given document." If it is just blindly entering a number, then it needs an explanation.

 

Example: if there are 10 words in a sentence and I give the maximum topics as 10, then to my understanding the purpose of sentiment is not served. Rather, I must know what my maximum topics should be so that I can say, "Yes, these are genuine topics and they give me the right sentiment score."

 

I would not mind getting on a call to understand the tool. Kindly help!

 

Thank you,

Prajna

by Occasional Learner rachel1234
on 06-23-2017 02:19 PM

Where are the slides again?

by Contributor Prajna_450
on 06-28-2017 02:28 AM

Hi guys,

 

Are there recorded webinars for learning SAS Text Miner? I am looking to generate sentiment scores for a data set.

 

Kindly help

 

Thank you!

Prajna
