
“The 5000-page guide to concise communication” or “How to summarize text with SAS”

Started 02-05-2024
Modified 02-05-2024

Hello, and welcome to my post! Its purpose is to showcase the Text Summarization task in SAS Studio (in fewer than 5000 pages 😒).

 

The volume of text data seems to increase each day, and we often encounter information overload in our daily activities. It seems that everyone has something to say, whether it is in a video on how to clean a yucky coffee pot, another breaking news report, customer product complaints, or transcribed audio. How do you find meaning in the vast amount of data that is growing every day? 

 

01_saspchtasks0.png


 

SAS Visual Text Analytics (VTA) users certainly have the tools needed to find answers buried in document collections. (See my post on Getting Started with Text Analytics.)

 

The Text Summarization task in SAS Studio is a VTA tool that provides functionality to summarize a document. Ever think “TL;DR” when looking at social media posts? I know that you know this but… TL;DR is an abbreviation for "too long; didn't read."

 

You have probably found yourself in a situation where you could use some help condensing a document collection into a reasonable number of sentences, especially if the alternative is going through “billions” of pages of text.

 

The Text Summarization task in SAS Studio is one possible way to summarize documents quickly and easily. It uses natural language processing (NLP) techniques to summarize a document by identifying representative sentences, and the same functionality can be run in code via the TextSummarization action set. The task computes a summary for each document in the data set, and it can also generate a single summary for the entire data set.

 

There are various techniques for summarizing text. The Text Summarization task performs an extractive summary: the software tries to identify the most meaningful sentences in the original text by using NLP techniques involving terms, entities, and noun groups, and it returns those sentences as the summary. Another technique you may have encountered is abstractive summarization, which can use deep learning methods to analyze the text and generate new sentences.
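
To make the idea of an extractive summary concrete, here is a minimal Python sketch. It is not the algorithm that SAS uses (the task works with terms, entities, and noun groups); it is a simple term-frequency stand-in that I am assuming for illustration. The stop-word list and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; real NLP tools use richer resources.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "is", "it", "to", "of",
              "in", "i", "my", "was", "for", "on", "this", "that", "with"}

def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]

def tokenize(sentence):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]

def extractive_summary(text, max_sentences=2):
    """Return up to max_sentences of the original sentences, in document order."""
    sentences = split_sentences(text)
    if len(sentences) <= max_sentences:
        return sentences  # nothing to drop
    # Count how often each term appears across the whole document.
    term_counts = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        # Average document-level frequency of the sentence's terms.
        tokens = tokenize(sentence)
        return sum(term_counts[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in keep]

review = ("The battery life is great. I bought it last week. "
          "The battery charges fast and the battery lasts all day. Shipping was slow.")
for sentence in extractive_summary(review, max_sentences=2):
    print(sentence)
```

Notice that every sentence in the output is copied verbatim from the input. That is the defining property of an extractive summary: sentences are selected, never rewritten.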

 

A fun fact is that the summarization result will be identical to the original document if it has only 2 sentences and you ask for a summary of 2 sentences. If you try this task on really short documents, you might be disappointed with the findings, but now you know why this would happen. Creating a summary of the entire short-document collection may be more valuable, and that’s what we’ll describe next.
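
The fun fact above holds for any summarizer that selects sentences rather than writing new ones. Here is a tiny Python sketch with a toy scorer (sentence length in words, which is purely my assumption and not what SAS does) to show why a 2-sentence document comes back unchanged.

```python
import re

def pick_sentences(text, max_sentences=2):
    """Toy extractive step: keep the longest sentences (by word count), in original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: len(sentences[i].split()), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

short_review = "Works fine. Would buy again."
summary = pick_sentences(short_review, max_sentences=2)
print(" ".join(summary) == short_review)  # True: there was nothing left to drop
```

However the sentences are scored, a limit of 2 applied to a 2-sentence document can only return both sentences.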

 

The table in this example has hundreds of historical customer reviews of an electronics product, and I want to condense each review down to 2 sentences. Some of the reviews are several paragraphs in length and others are only a single sentence.

 

You can find the Text Summarization task in SAS Studio on Viya to get started.

 

02_saspchTasks1.png

 

The Text Summarization task in SAS Studio reduces content to a maximum number of sentences that you specify. The maximum number is capped at 32. A value this large may be useful if you are summarizing chapters of a novel or scholarly abstracts. In this case, I’m processing consumers’ product reviews, so I chose to represent each review by only 2 sentences. Because I am also interested in the overall ‘sense’ of the reviews, I am going to ask for a summary of the entire corpus as well.
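
If it helps to picture the two kinds of output, here is a hedged Python sketch of per-document summaries plus one corpus summary. The cap of 32 comes from the task itself; everything else (the term-frequency scoring, pooling all sentences to build the corpus summary, and the function names) is my assumption for illustration and not a description of how SAS computes its results.

```python
import re
from collections import Counter

MAX_ALLOWED = 32  # cap on the maximum number of sentences, as stated for the task

def sentences_of(text):
    """Naive sentence splitter."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def words_of(sentence):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def top_sentences(sentences, counts, limit):
    """Keep the `limit` sentences with the highest average term count, in original order."""
    def score(sentence):
        words = words_of(sentence)
        return sum(counts[w] for w in words) / len(words) if words else 0.0
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:limit])]

def summarize_collection(docs, max_sentences=2):
    """Return (list of per-document summaries, one corpus summary)."""
    if not 1 <= max_sentences <= MAX_ALLOWED:
        raise ValueError(f"max_sentences must be between 1 and {MAX_ALLOWED}")
    per_doc, pooled = [], []
    for text in docs:
        sents = sentences_of(text)
        counts = Counter(w for s in sents for w in words_of(s))
        per_doc.append(top_sentences(sents, counts, max_sentences))
        pooled.extend(sents)
    # Assumption: the corpus summary is chosen from all sentences pooled together.
    corpus_counts = Counter(w for s in pooled for w in words_of(s))
    return per_doc, top_sentences(pooled, corpus_counts, max_sentences)

docs = ["Great sound. The sound is clear and the bass is deep. Arrived late.",
        "Sound cuts out."]
per_doc, corpus = summarize_collection(docs, max_sentences=2)
print(per_doc)
print(corpus)
```

The per-document list has one entry for each review, while the corpus summary is a single short list of sentences that stands in for the whole collection.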

 

Summarizing documents will produce different results depending on what features you generate for the task. You have a choice of parsing terms for the analysis, using entities and noun groups, or selecting both options as I have in this example. Try different combinations of these options with your data and see what options give you the best results for the kind of data you are processing.
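
Here is a rough Python sketch of why the feature choice matters. I am using single words as a stand-in for parsed terms and adjacent word pairs as a very crude stand-in for entities and noun groups; both are assumptions for illustration, and the real task relies on proper linguistic analysis. The point is only that the same scoring logic can rank sentences differently once the features change.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def term_features(sentence):
    """Stand-in for parsed terms: lowercase single words."""
    return re.findall(r"[a-z']+", sentence.lower())

def phrase_features(sentence):
    """Crude stand-in for entities and noun groups: adjacent word pairs."""
    words = term_features(sentence)
    return [" ".join(pair) for pair in zip(words, words[1:])]

def summarize(text, extractors, max_sentences=1):
    """Score sentences by the average corpus count of their features."""
    sentences = split_sentences(text)

    def features(sentence):
        return [f for extract in extractors for f in extract(sentence)]

    counts = Counter(f for s in sentences for f in features(s))

    def score(sentence):
        feats = features(sentence)
        return sum(counts[f] for f in feats) / len(feats) if feats else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]

review = ("The noise cancelling is superb. Battery is fine. "
          "The noise cancelling beats my old pair.")
options = {
    "terms only": [term_features],
    "phrases only": [phrase_features],
    "both": [term_features, phrase_features],
}
for name, extractors in options.items():
    print(name, "->", summarize(review, extractors))
```

Running the three option sets side by side on a sample of your own documents is a quick way to compare which sentences each combination favors.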

 

03_saspchTasks1b.png

 

This type of text summarization is different from a summary created by a Large Language Model (LLM). Some analysts are wary of using LLMs to summarize documents because they may ‘hallucinate’, or just make stuff up, and that is not what you want. There are situations where LLMs may be useful when they are applied after an extractive summarization is done.

 

Here is one document from my data set. At the risk of getting a TL;DR from you, go ahead and read the original review and the summarized review that follows. Do you think the summary did a good job of capturing the important sentences?

 

Original Review:

 

04_saspchTasks3.png

 

Summary of Review:

 

05_saspchTasks4.png

 

When you experiment with your own text data, ask for a corpus summary and see what combination of parsing options provides you with the best result. This task is pretty nifty to have in your bag of tricks, but it's not designed for all situations. Remember this is only one of the many resources available to process text. I hope that you enjoyed reading about this task and encourage you to watch for additional articles.

 

Thanks for reading!

 

 

Find more articles from SAS Global Enablement and Learning here.

Comments

Great title and an interesting post!

Thanks
