iconrado
SAS Employee

Photo by Sheri Hooley on Unsplash

 

 

Every Christmas, children across the globe write letters to Santa Claus to make their Christmas wishes known. This creates a logistical nightmare for those residing at the North Pole, who need to process all the letters they receive in time to prepare their Christmas deliveries. Among the information that Santa Claus and his elves need to extract from the text are the names, addresses, and contact details of the letter writers.

 

To demonstrate how we can automate the extraction of this information from the letters, we will use the concepts node within Visual Text Analytics on SAS Viya. The concepts node allows us to extract specific information from the text. We will use a combination of the predefined concepts that are readily available and some custom concepts that we specify ourselves. To use both, we need to select “include predefined concepts” before running the node.

 

Our Visual Text Analytics pipeline with our concepts node.

 

The first two pieces of information that Santa and his elves will need are the name and the location of those contacting him. We can extract this information using two of our predefined concepts: nlpPerson and nlpPlace.

 

Here we can see that we have managed to extract mentions of names from the letters.

 

Just in case, we may also want to pull some additional information out of the letters, such as a phone number or an email address. To achieve this, we need to create some custom concepts. Concept rules are written in LITI (language interpretation for textual information). LITI syntax is quite flexible and offers many ways to extract information from your text.

 

In this specific example, we will use REGEX rules to identify patterns of information in our text. Below are two examples of REGEX rules that can be used to extract email addresses and phone numbers.

 

Extract Email:
REGEX:[\w\.-]+@[\w\.-]+

Extract Phone Number (matches US numbers):
REGEX:\(?\d{3}[\)\s\-]*\d{3}[\s\-]?\d{4}

Here we can see that we have managed to extract the email addresses from our text.
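If you would like to try the two patterns above outside of SAS before adding them as concept rules, the following is a minimal sketch using Python's built-in re module. Note the assumptions: Python's regex engine is not the LITI engine, so behavior may differ in edge cases, and the function name extract_contacts and the sample letter are made up purely for illustration.

```python
import re

# The two patterns from the REGEX rules above, without the "REGEX:" prefix.
# Assumption: Python's re module is used only as a stand-in for quick testing;
# it is not the engine that evaluates LITI rules in Visual Text Analytics.
EMAIL = re.compile(r"[\w\.-]+@[\w\.-]+")
PHONE = re.compile(r"\(?\d{3}[\)\s\-]*\d{3}[\s\-]?\d{4}")


def extract_contacts(letter: str) -> dict:
    """Return every email address and US-style phone number found in a letter.

    Hypothetical helper for illustration; not part of any SAS API.
    """
    return {
        "emails": EMAIL.findall(letter),
        "phones": PHONE.findall(letter),
    }


# Made-up sample letter.
sample = "Dear Santa, my email is timmy@example.com and my phone is (555) 123-4567."
print(extract_contacts(sample))
```

One thing this kind of quick test makes visible: because the email pattern allows dots in the part after the @ sign, an address that sits directly before a full stop at the end of a sentence will be captured together with that full stop. Depending on your letters, you may want to tighten the pattern or clean up the matches afterwards.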

 

Text analytics offers many more possibilities, for example categorizing the letters by wish type or pulling out specific product names. We can include each of these in a text analytics pipeline to automate the tedious manual work of going through each letter.

 

For now, having quickly and accurately extracted the names, addresses, and contact info from our letters, we can be confident that the elves will deliver the gifts and holiday cheer on time 😊

 

Happy Holidays!