
Effective Markup of Unstructured Text - Part 1

Started ‎11-06-2024 by
Modified ‎11-06-2024 by

Have you ever known a person at school or work who made great use of their highlighter markers? Maybe it was you?! Every book, paper, and article they owned had selective and perfectly highlighted rows of text; some even sported a strategic color-coding system. These days, we have electronic books, emails, and apps that allow for electronic highlighting. While the tangible marker may see less use, the idea behind highlighting key information is still essential.

rhwill_1_highlighters-204x300.jpg

 

This is Part 1 of a two-part series on marking up and manipulating unstructured text. In this post, you will learn how SAS Law Enforcement Intelligence can be configured to use the Markup Control to automatically mark up rows of unstructured text, turning them into organized, searchable data that can be analyzed.

 

Let’s say an incident has occurred at the local school. A teenager, Dante, met his friends to play a friendly game of soccer. While out on the field, he left his cellphone sitting on a bench on the sidelines. When the game was over, Dante noticed his phone was missing. After he searched around without success, one of his teammates let Dante use his phone to call the police and report the missing phone.

 

When law enforcement arrived, the officer saw two kids sitting in the stands. When approached, the two older kids each gave a statement to the officer, saying they had seen John Smith, a kid from the football team, pick up the phone but put it back down. Neither of the two saw anyone else near the bench, but they couldn’t confirm that John was the person who had taken the phone.

 

Back at the office, the officer began to work on the investigation and added the statements to the system. Using the Markup Control, the officer can begin to organize the information. By effectively highlighting specific data, the officer can quickly find significant and essential information for review. This gives the officer different ways to analyze details that may not be evident in one large block of unstructured text, such as a long email, an interview, or an audio transcript. It also emphasizes the relevant information when the officer revisits the case later, or when other law enforcement personnel work on the investigation.

 

Let's look at the Markup Control. Text is added to the middle pane of the three-pane control either by uploading a file with the Content Extraction button control, or manually by typing or copying and pasting it directly into the pane. Here you see that Ginny Hall's statement has been added.

 

rhwill_2_add_statement-1024x312.png


 

To organize this information quickly and accurately, concepts are configured by the administrator. Concepts are items that are identified and marked in the text either manually, by using the cursor to highlight them, or automatically, if configured with that functionality. Automatic markup uses SAS Visual Text Analytics to identify predefined concepts such as people, places, and phone numbers in the text. Select Annotate Text for the Automatically annotate text icon to appear. Simply click the icon and the text is marked up.

 

rhwill_3_annotate_auto_markup-1024x270.png
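
To make the idea of automatic markup concrete, here is a minimal Python sketch that scans text for predefined concepts and records where each one occurs. It is only an illustration: the regular expressions, the `auto_annotate` function, and the sample sentence are all assumptions made for this example, and SAS Visual Text Analytics relies on its own predefined concept definitions rather than anything shown here.

```python
import re

# Hypothetical patterns for illustration only. SAS Visual Text Analytics
# uses its own predefined concept definitions, not these regexes.
CONCEPT_PATTERNS = {
    "Phone Number": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "Person": re.compile(r"\b(?:Ginny Hall|John Smith)\b"),
}


def auto_annotate(text):
    """Return (concept_type, matched_text, start, end) tuples sorted by position."""
    annotations = []
    for concept_type, pattern in CONCEPT_PATTERNS.items():
        for match in pattern.finditer(text):
            annotations.append(
                (concept_type, match.group(), match.start(), match.end())
            )
    return sorted(annotations, key=lambda item: item[2])


# Invented sample sentence, not taken from the statement in the screenshots.
statement = "Ginny Hall said John Smith picked up the phone. Call 901-555-0100."

for concept_type, value, start, end in auto_annotate(statement):
    print(f"{concept_type}: {value!r} at {start}-{end}")
```

Each tuple carries the character offsets of the match, which is the kind of information a text pane would need in order to draw a colored highlight over the right span.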

 

In the left pane's Concept Type list, we see the concepts that the automated markup detected. In the text pane, the concepts are highlighted in different colors so they can be told apart at a glance.

 

rhwill_4_annotated-1024x280.png

 

Expanding the Person concept list, we see that four separate names were identified. Ginny Hall has three instances in the text, and the maroon underline highlight is used for each of them. Gregg Chen was not identified as a person concept, so to add him we select his name in the text and click Annotate text. Only the one selected instance is added. Halfway down the text, the name Gregg is not highlighted, but we know this is the same person. To connect Gregg to Gregg Chen, we select the word Gregg in the text and click Annotate text.

 

rhwill_5_Concept_List.png

 

A New Concept Properties window opens. From here, we select the concept type, Person, and add Gregg to the existing concept, Gregg Chen. Concept Properties can be edited if needed.

 

rhwill_6_New_Concept.png

 

We see that Gregg and Gregg Chen are now both underlined, showing their association. When you click any concept in the text, such as a person name, every concept with the same label (John in this example) is highlighted, and the number of times it appears in the text is listed in the details pane on the right. John and John Smith are the same person and are associated by the John Smith Person concept. The text instances show that John Smith appears once in the text and John (with no last name) appears twice.

 

rhwill_7_instances.png
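
One way to picture a single concept that groups several labels is sketched below in Python. The dictionary layout, the `count_instances` helper, and the sample sentence are hypothetical and say nothing about how the Markup Control stores concepts internally. The sketch only shows how instances of "John Smith" and a bare "John" could be counted separately while belonging to the same Person concept.

```python
import re
from collections import Counter

# Hypothetical data model: one concept with several labels (surface forms).
concept = {"type": "Person", "name": "John Smith", "labels": ["John Smith", "John"]}


def count_instances(text, labels):
    """Count occurrences of each label in the text.

    Longer labels are tried first so that 'John Smith' is not also
    counted as a bare 'John'.
    """
    ordered = sorted(labels, key=len, reverse=True)
    alternatives = "|".join(re.escape(label) for label in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b")
    return Counter(match.group() for match in pattern.finditer(text))


# Invented sample text for illustration.
text = "John Smith picked up the phone. John put it down. Then John left."
counts = count_instances(text, concept["labels"])
print(counts["John Smith"], counts["John"])
```

With this sample text, the full name is counted once and the first name alone is counted twice, the same split between labels that the details pane reports for the John Smith concept.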

 

Excessive highlighting adds so much color and emphasis that nothing stands out, which can be overwhelming depending on how much text the officer needs to read through. To remove a concept that is a typo or error, or that is simply not relevant to the case, we can either select the concept in the list and click the trashcan (delete) icon, or select the word in the text pane and click Clear annotation.

 

rhwill_8_Clear_Annotation.png

 

Once the annotation is cleared, the concept is removed from the concept type list (Address), and the highlight is removed from the text (16500 High School Drive, Memphis, TN). Objects that are linked to other objects can have a concept cleared; however, neither the object nor the relationship is deleted.

 

rhwill_9_Cleared_Annotation_Concepts.png
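
The separation between an annotation and the objects behind it can be sketched in Python as follows. This is an assumed, simplified model: the `Workspace` class, its fields, and the "Incident report" object are hypothetical names made up for the example and are not part of SAS Law Enforcement Intelligence. The point is only that clearing an annotation touches the markup and leaves objects and relationships alone.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical model: annotations are stored apart from objects and links."""

    annotations: dict = field(default_factory=dict)  # concept type -> marked text
    objects: set = field(default_factory=set)
    relationships: set = field(default_factory=set)  # (object, object) pairs

    def clear_annotation(self, concept_type, value):
        """Remove one marked value; objects and relationships are not touched."""
        values = self.annotations.get(concept_type, [])
        if value in values:
            values.remove(value)


address = "16500 High School Drive, Memphis, TN"
ws = Workspace(
    annotations={"Address": [address], "Person": ["John Smith"]},
    objects={address, "Incident report"},
    relationships={("Incident report", address)},
)

ws.clear_annotation("Address", address)
print(ws.annotations["Address"])  # the address is no longer marked up
print(address in ws.objects)      # the object itself still exists
```

Keeping the markup in its own structure means an officer can tidy up the highlighting without any risk of losing case data.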

 

As you see, SAS Law Enforcement Intelligence's Markup Control can be essential in organizing data in an investigation. In Part 2 of this series, you will learn additional actions that the Markup Control supports to create a clearer picture and build investigation intelligence.

