
SAS Text Miner: Getting Started

Started ‎04-17-2017 by
Modified ‎12-21-2017 by
Views 10,239

Did you miss the Ask the Expert session on Text Mining: Getting Started? Not to worry, you can catch it on-demand at your leisure. I’ve attached the slides as well.

 

Watch the webinar

 

 

In this session, attendees learned to:

  • Import data
  • Parse and filter text
  • Analyze text data including topic discovery and cluster analysis 
  • Use text mining results as input to predictive modeling

 Here are some highlighted questions from the Q&A segment held at the end of the session for ease of reference.

 

What languages does Text Miner support?


These languages are supported in SAS Text Miner: Arabic, Chinese (simplified and traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, and Vietnamese. Each language must be licensed individually.

Do I have to have Enterprise Miner in order to use Text Miner?


Yes. Text Miner is an add-on to SAS Enterprise Miner. It is a separate group of nodes within the Enterprise Miner interface that enables the processing of unstructured data. As demonstrated, Text Mining can take advantage of Enterprise Miner nodes like Segment Profile. Also, as demonstrated, mining of unstructured data can be combined with mining of structured data to develop predictive models with a greater lift of predictive power.

Do my text documents have to be a SAS dataset to use in Text Miner?


No. The Text Import node can import text in other formats, such as Word, Excel, PowerPoint, and PDF.

Can I use my own Stop/Start list in the Parsing Node?


Yes. You can create your own SAS data sets as Stop or Start lists. The format for these data sets is described in the Text Miner documentation. Samples are provided in the SAMPSIO sample library.
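
To make the difference between the two list types concrete, here is a minimal Python sketch. It is not SAS code and does not reproduce the Text Miner data set format, which is described in the documentation. The function name apply_term_list and the sample terms are hypothetical. The idea is simply that a stop list drops the listed terms, while a start list keeps only the listed terms.

```python
def apply_term_list(terms, term_list, mode):
    """Filter parsed terms with a stop list or a start list.

    mode="stop":  drop every term that appears in term_list.
    mode="start": keep only the terms that appear in term_list.
    """
    listed = {t.lower() for t in term_list}
    if mode == "stop":
        return [t for t in terms if t.lower() not in listed]
    if mode == "start":
        return [t for t in terms if t.lower() in listed]
    raise ValueError("mode must be 'stop' or 'start'")


# Hypothetical parsed terms from one document (illustration only).
parsed = ["the", "engine", "made", "a", "loud", "noise"]

kept_after_stop = apply_term_list(parsed, ["the", "a", "made"], "stop")
kept_after_start = apply_term_list(parsed, ["engine", "noise"], "start")
```

With a stop list, anything not listed survives; with a start list, anything not listed is discarded, so a start list is the stricter choice when you already know the vocabulary you care about.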

Does the order of the nodes make a difference in Text Miner?


Yes. After the text data is brought in, the Text Parsing node must precede the Text Filter node. The other text nodes can be added in any order.

Can you pull all the email addresses out of the text document collection?


Yes. I recommend that you turn on “Find Entities” in the properties of the Text Parsing node. Then, when viewing the results of that node’s run, save the “Terms” table. The Terms table includes all the terms and their roles. Query that table for all the terms with the role “Internet”. The “Internet” role captures email addresses and URLs.
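
As a rough illustration of that query, the sketch below filters a saved Terms table in Python. It assumes the table has been exported as rows with "term" and "role" columns. Those column names, the sample rows, and the function internet_terms are assumptions made for illustration, not the actual Text Miner output layout. Because the "Internet" role covers both email addresses and URLs, the sketch adds a simple check to separate the two.

```python
def internet_terms(rows):
    """Split terms with the 'Internet' role into (emails, urls).

    rows: iterable of dicts with 'term' and 'role' keys (assumed layout).
    """
    emails, urls = [], []
    for row in rows:
        if row["role"].strip().lower() != "internet":
            continue  # skip every term that is not tagged Internet
        term = row["term"].strip()
        # Heuristic: treat a term with "@" and no "://" as an email address.
        if "@" in term and "://" not in term:
            emails.append(term)
        else:
            urls.append(term)
    return emails, urls


# Hypothetical rows standing in for an exported Terms table.
terms_table = [
    {"term": "jane.doe@example.com", "role": "Internet"},
    {"term": "http://www.example.com", "role": "Internet"},
    {"term": "engine", "role": "Noun"},
]

emails, urls = internet_terms(terms_table)
```

The "@" test is only a heuristic; a stricter pattern may be needed if the collection contains unusual URLs.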

Is it possible to select a different cluster algorithm for text clustering?


Yes, you can choose either Expectation-Maximization or Hierarchical.

Does SAS Text Miner work effectively for all lengths of data? For example, will it work just as well for a Twitter feed as it would for a 10-page paper?


Yes. Regardless of document length, SAS Text Miner efficiently handles the task of breaking text into terms and entities. I've personally gotten great results with both long documents and Twitter feeds. However, analyzing social media text comes with extra challenges because of the short text length, acronyms, and slang.

 

Recommended Resources

Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow-up Q&A, slides, and recordings from other SAS Ask the Expert webinars. From the Ask the Expert Library, just click Subscribe in the orange bar underneath the list of recent articles.

 

NOTE: For best results when opening the attached slides, click on the “download” icon.

Comments

hi,

 

Does Visual Analytics help with text mining? I believe the word cloud option does, but I really want to understand how it works.

 

I want to know how to build a dictionary of words and build concepts. Can a sentiment score be obtained? Can the tone of the text be assessed?

 

 

Please reply,

Thank you 

Hi,

Back when sentiment was added to the Visual Analytics word cloud, a SAS blog post was written about it. It’s a good quick read that will address some of your questions. You can find it here: http://blogs.sas.com/content/sgf/2015/06/29/how-sentimental-of-you-enabling-sentiment-analysis-in-a-...

 

You can also learn more about working with word clouds in Visual Analytics in the User Guide. This link will take you directly to the Word Cloud section: http://support.sas.com/documentation/cdl/en/vaug/69957/HTML/default/viewer.htm#n1oo7kmcwcn1rsn1ll0bo...

 

You mentioned building a dictionary. You can build a dictionary from your document collection within SAS Text Miner. The Reference Guide covers that in Chapter 4: https://support.sas.com/documentation/onlinedoc/txtminer/14.1/tmref.pdf

 

You seem to have an interest in sentiment. Perhaps you were looking to create a sentiment dictionary. SAS offers one, but you can also find some available online with a Google search.

 

If you are interested in building concepts, you might want to look at the capabilities in SAS Contextual Analysis. Here is the fact sheet: http://support.sas.com/rnd/app/handouts/contextual_analysis_mar2017.pdf

 

I hope some of this is helpful.

Twanda

Hi Twanda,

 

These are the most useful links and references I have gotten so far. Thanks a ton. I will get my SAS Text Miner in place and explore VA more.

 

Prajna

Hi Twanda,

 

On the Visual Analytics platform, if I am generating a word cloud to get a sentiment score, I need to enter a number for the Maximum topics in the Properties tab for the word cloud. I understand that the minimum number of topics to discover must be at least two or three and that the maximum has no limit.

 

I want to understand what number I should specify for the Maximum topics. There should be a technique (let's say a dendrogram or something that decides the value of k) to arrive at a number and say, "yes, this is the maximum number of topics you need to extract for a given document". If it is just a matter of blindly entering a number, then it needs an explanation.

 

Example: if there are 10 words in a sentence and I give the maximum topics as 10, then to my understanding the purpose of sentiment is not served. Rather, I must know what my maximum topics should be to say, "yes, these are genuine topics and they give me the right sentiment score".

 

I would not mind getting on a call to understand the tool. Kindly help!

 

Thank you,

Prajna

Where are the slides again?

Hi guys,

 

Could there be recorded webinars for learning SAS Text Miner? I am looking to generate sentiment scores for a data set.

 

Kindly help

 

Thank you!

Prajna
