Unstructured text data is everywhere, and organizations are collecting more of it with every passing day. But how do organizations gain insight from such data? Using SAS Visual Text Analytics in Model Studio, it's easy! Let's discuss the essentials of getting started with SAS Visual Text Analytics and cover the components of the default text analytics pipeline in Model Studio. Here are some of the main steps that may be involved in a text analysis:
When you first create your Visual Text Analytics project, Model Studio provides a default pipeline. Here’s what it looks like, followed by a brief discussion of its capabilities:
The pipeline starts with data, which for a text analysis is, at a minimum, a collection of documents.
Next comes the Concepts node, which is all about extracting information. The software has nine predefined concepts, including dates, measures, and organizations. Users can also create custom concepts specific to the analysis at hand.
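To make the idea of concept extraction concrete, here is a minimal Python sketch. It is not how the Concepts node is implemented and it does not use any SAS API; the two regular expressions and the function name `extract_concepts` are hypothetical stand-ins for predefined concepts such as dates and measures.

```python
import re

# Hypothetical patterns for illustration only; not the rules SAS ships.
CONCEPT_PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "measure": re.compile(r"\b\d+(?:\.\d+)?\s?(?:kg|km|lb|miles)\b"),
}

def extract_concepts(text):
    """Return a list of (concept_name, matched_text) pairs found in text."""
    matches = []
    for name, pattern in CONCEPT_PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((name, m.group(0)))
    return matches

doc = "The package weighed 2.5 kg and shipped on 2023-09-12."
print(extract_concepts(doc))
```

A custom concept would simply be one more entry in the pattern table, tailored to the analysis at hand.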
After the Concepts node comes the Text Parsing node. The Text Parsing node starts by using natural language processing to determine parts of speech for terms found in the document collection. The node then generates the start and stop lists for the analysis. The start list, also called the kept terms, is the list of terms in the document collection that are used in the analysis. All subsequent nodes in the pipeline make use of these kept terms. The Text Parsing node places any term that will be ignored during the analysis into the stop list, also called the dropped terms. Terms may be placed into the stop list because they do not provide useful information for the analysis or because they appear in too few documents. Because the Concepts node comes before the Text Parsing node, concepts can appear in the start list.
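The split between kept and dropped terms can be sketched in a few lines of Python. This is only an illustration under simple assumptions: whitespace tokenization, a hand-supplied set of uninformative words, and a hypothetical `min_docs` threshold. It skips part-of-speech tagging entirely and is not the algorithm the Text Parsing node uses.

```python
from collections import Counter

def build_term_lists(documents, stop_words, min_docs=2):
    """Split the vocabulary into kept (start) and dropped (stop) terms."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    kept, dropped = set(), set()
    for term, n_docs in doc_freq.items():
        # Drop uninformative terms and terms that appear in too few documents.
        if term in stop_words or n_docs < min_docs:
            dropped.add(term)
        else:
            kept.add(term)
    return kept, dropped

docs = [
    "the battery drains fast",
    "the battery lasts all day",
    "screen cracked on the first day",
]
kept, dropped = build_term_lists(docs, stop_words={"the", "on", "all"})
print(sorted(kept))
```

In this toy collection only "battery" and "day" survive: "the" is dropped as uninformative, and every other term appears in just one document.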
Next, the Sentiment node determines whether each document expresses positive, neutral, or negative sentiment.
The Topics node uses latent semantic analysis to discover natural groupings of important terms, that is, terms in the start list that often appear together in the same documents within the document collection. The topics are automatically generated and are assigned to documents. More than one topic can be assigned to a single document. Topics can help reveal prevailing “themes” that occur within the document collection.
Finally, the Categories node can be used to extract Boolean rules to identify documents that belong to categories. The categories can be based on variables in the input data that have been assigned the category role, user-created custom categories, or even topics that have been promoted to categories.
This default pipeline can be edited to create a custom Visual Text Analytics project that uses only a subset of the nodes.
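For readers curious about what latent semantic analysis does under the hood, here is a deliberately tiny Python sketch. It assumes raw term counts, pre-tokenized documents, and only the first latent dimension, which it finds by power iteration on the term co-occurrence matrix rather than a full singular value decomposition. The function name `leading_topic` is hypothetical, and the Topics node itself is certainly more sophisticated.

```python
import math

def leading_topic(token_docs, iterations=50):
    """Return {term: weight} for the first latent dimension of the corpus."""
    terms = sorted({t for doc in token_docs for t in doc})
    # Term-by-document count matrix A (rows = terms, columns = documents).
    a = [[doc.count(t) for doc in token_docs] for t in terms]
    # M = A * A^T; its leading eigenvector is the first left singular vector of A.
    m = [[sum(x * y for x, y in zip(row_i, row_j)) for row_j in a] for row_i in a]
    v = [1.0] * len(terms)
    for _ in range(iterations):
        # Power iteration: multiply by M, then rescale to unit length.
        w = [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return dict(zip(terms, v))

token_docs = [
    ["battery", "charge", "battery"],
    ["battery", "charge"],
    ["shipping", "delay"],
]
weights = leading_topic(token_docs)
top_terms = sorted(weights, key=weights.get, reverse=True)[:2]
print(top_terms)
```

The terms with the largest weights form the "theme" of the topic. Here the strongest grouping is the pair of terms that co-occur in the first two documents, while the shipping terms receive essentially zero weight.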
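Once a Boolean rule exists, applying it to a document is straightforward. The Python sketch below shows one way such a rule could be represented and evaluated. The nested-tuple format, the operator names, and the `power_issue` rule are all assumptions made for illustration; they are not the rule syntax that Visual Text Analytics generates.

```python
def matches_rule(rule, tokens):
    """Evaluate a nested Boolean rule against a set of document tokens."""
    op, *args = rule
    if op == "TERM":
        return args[0] in tokens
    if op == "AND":
        return all(matches_rule(r, tokens) for r in args)
    if op == "OR":
        return any(matches_rule(r, tokens) for r in args)
    if op == "NOT":
        return not matches_rule(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical category rule: mentions battery or charger, but not a refund.
power_issue = ("AND",
               ("OR", ("TERM", "battery"), ("TERM", "charger")),
               ("NOT", ("TERM", "refund")))

doc_a = set("the battery died after a week".split())
doc_b = set("charger broke so i want a refund".split())
print(matches_rule(power_issue, doc_a), matches_rule(power_issue, doc_b))
```

The first document falls into the category and the second does not, because the rule excludes any document that mentions a refund.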
What if you need to process a document collection not written in English? No problem! Visual Text Analytics currently supports over 30 languages.
Are you interested in learning more? One way is to take an instructor-led course. The course SAS Visual Text Analytics in SAS Viya will definitely get you well on your way into the wonderful world of text analytics. There's an alternative way to start your text analytics journey, but allow me to start by asking another question. Are you attending the SAS Explore Conference, September 11-14, in Las Vegas, NV? If not, be sure to register soon (explore.sas.com)! Once you are there, be sure to check out my Hands-On Workshop, SAS Visual Text Analytics in SAS Viya, which will take place at 11:00 AM on Tuesday, September 12. Whether you are new to text analytics, a seasoned expert, or just looking for something new and interesting to try out, there will be something for you in the workshop!
For the workshop, attendees will be provided with access to SAS Visual Text Analytics software, all required data, and step-by-step instructions to get started.
I hope to see you in class soon or at my workshop in September at SAS Explore!
Find more articles from SAS Global Enablement and Learning here.