The New Unmatched Documents Tab in SAS Visual Text Analytics


The purpose of this post is to shed light on the new Unmatched Documents tab when working with certain SAS Visual Text Analytics nodes in Model Studio. This post assumes readers are already familiar with Visual Text Analytics (VTA) software running in Model Studio.

First, I must admit I’m using the term “new” a bit loosely. The Unmatched Documents tab has been available since the 2024.11 stable release of SAS Viya, which dates back to November 2024. So, given the rate at which technology is advancing these days, some may consider this an “old” addition! I think, however, that to most VTA users (me included) this will come as a fresh discovery: a feature they have neither seen nor used before. Please do not take offense that I’ll continue to refer to this as a “new” feature throughout this post.

I also stated above that this new tab is available for “certain” nodes when using VTA in Model Studio. Specifically, the new Unmatched Documents tab is available in the Interactive Window for the Categories node, the Concepts node, the Text Parsing node, and the Topics node. I will not include screen captures or examples for every one of these nodes, because that would be redundant. Once you get a feel for what information this tab provides and how it helps the analyst, I’m pretty sure you’ll get the point of it.

If you use VTA regularly in Model Studio, you are familiar with the Matched Documents tab available in the Interactive Window for these nodes. The Unmatched Documents tab works the same way, but it shows the documents that do not have matches for the selected category, concept, term, or topic.

I’ll illustrate the new tab by working through an example initially in the Concepts node. The example uses data that is based on feedback from patients taking medication for depression or anxiety. The data have been fully anonymized and even the names for the drugs are artificial. We are trying to extract drug dosages from the patient feedback data.

After running the Concepts node for a VTA project, the node can be opened in the Interactive Window. For this example, custom LITI code using the REGEX rule was written to extract drug dosages in milligrams (mg). Below is the Interactive Window for the Concepts node with the rules for the custom concept shown.


In the usual Documents portion of the window, the new Unmatched tab is immediately visible next to the Matched tab.

In the Documents window, the All tab indicates that the data are made up of 1,414 documents. Each document is a written comment from a patient taking depression or anxiety medication. Below is a screenshot showing typical comments.

Clicking the Matched tab shows that 300 of the 1,414 documents contain a dosage amount. The two documents below show matches of 225mg and 150mg, respectively.

Clicking the Unmatched tab shows that 1,114 documents (1,414 total documents – 300 matched documents) did not contain matches for a dosage.
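
To make the relationship between the three tabs concrete, here is a minimal Python sketch of the same bookkeeping. It is not the LITI rule used in the project (that rule is not reproduced in this post); the `DOSAGE` pattern, the `split_matched_unmatched` helper, and the three comments are all made up for illustration. The point is only that every document lands in exactly one of the two lists, so the Matched and Unmatched counts always sum to the All count.

```python
import re

# Hypothetical stand-in for the custom concept: a number followed by "mg".
# This is a Python regular expression, NOT the LITI rule from the project.
# (?<!\S) requires the number to start a whitespace-delimited token.
DOSAGE = re.compile(r"(?<!\S)\d+(?:\.\d+)?\s*mg\b", re.IGNORECASE)

def split_matched_unmatched(documents):
    """Return (matched, unmatched); every document lands in exactly one list."""
    matched, unmatched = [], []
    for doc in documents:
        (matched if DOSAGE.search(doc) else unmatched).append(doc)
    return matched, unmatched

# Made-up comments for illustration only; not taken from the patient data.
docs = [
    "I was started on 225mg and felt better after a month.",
    "My doctor lowered me to 150 mg last spring.",
    "The side effects were too much for me.",
]

matched, unmatched = split_matched_unmatched(docs)
print(len(docs), len(matched), len(unmatched))  # prints: 3 2 1
```

In the project itself, the same arithmetic gives 1,414 - 300 = 1,114 unmatched documents.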

One reason to review documents without matches is to find documents where the concept being extracted was missed. A concept may be missed because the custom LITI code was not written accurately, so perusing unmatched documents can assist in debugging or improving the LITI code. (In the current example, the analyst may consider using the search functionality to look for unmatched documents that contain “mg”.) Here’s an example of what I mean. Scrolling down the Unmatched list shows the following document:

We see that the document contains a dosage, but the dosage appears to be a range (2-5mg) rather than an individual amount. The analyst could then decide if they want to rewrite the LITI code to account for ranges of dosage amounts or perhaps ignore this if only individual dosages are to be extracted.
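
The review loop described above (search the unmatched documents for “mg”, then decide whether to widen the rule) can be sketched in Python. This is only an illustration under stated assumptions: the post does not show the actual LITI rule, so both patterns below are hypothetical, as are the helper name `find_dosages` and the sample comments. The first pattern accepts a single amount; the second also accepts a hyphenated range.

```python
import re

# Hypothetical Python patterns; the post does not show the actual LITI rule.
# (?<!\S) requires the number to start a whitespace-delimited token.
SINGLE = re.compile(r"(?<!\S)\d+\s*mg\b", re.IGNORECASE)
SINGLE_OR_RANGE = re.compile(r"(?<!\S)\d+(?:\s*-\s*\d+)?\s*mg\b", re.IGNORECASE)

def find_dosages(pattern, text):
    """Return every substring of `text` that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]

# Made-up comments for illustration only; not taken from the patient data.
comments = [
    "I take 2-5mg at night depending on how I feel.",
    "No change for me after six weeks.",
]

# Step 1: mimic searching the Unmatched list for "mg".
unmatched = [c for c in comments if not SINGLE.search(c)]
suspects = [c for c in unmatched if "mg" in c.lower()]

# Step 2: try the widened pattern on the suspects.
for c in suspects:
    print(find_dosages(SINGLE_OR_RANGE, c))  # prints: ['2-5mg']
```

Whether to adopt something like the widened pattern is the analyst's call: if only individual dosages are of interest, the range can simply stay unmatched.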

Let’s take a look at the Unmatched tab for another node. Below, the Text Parsing node has been run and we are investigating the term “depression” in the same data described above. The Unmatched tab in the Documents window has been selected.

We see that of the 1,414 total documents, 492 contain the term “depression” and 922 do not. Since the Unmatched tab has been selected, documents that do not contain the term are shown. The analyst might want to investigate the unmatched documents for common misspellings of the term of interest, or simply to get a feel for documents lacking the selected term.
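
If the unmatched documents are available outside Model Studio (an assumption; this post does not cover how to export them), hunting for misspellings can be roughed out with Python's standard `difflib` module. The helper name `near_misses`, the 0.8 similarity cutoff, and the sample comments are all hypothetical choices for illustration, not part of VTA.

```python
import difflib
import re

def near_misses(term, documents, cutoff=0.8):
    """Collect tokens that resemble `term` but are not identical to it."""
    found = set()
    for doc in documents:
        # Lowercase alphabetic tokens, with exact hits on the term removed.
        tokens = set(re.findall(r"[a-z]+", doc.lower()))
        tokens.discard(term)
        found.update(
            difflib.get_close_matches(term, sorted(tokens), n=5, cutoff=cutoff)
        )
    return sorted(found)

# Made-up comments for illustration only; not taken from the patient data.
unmatched_docs = [
    "My depresion got worse in the winter.",
    "Deppression runs in my family.",
    "I mostly struggle with anxiety.",
]

print(near_misses("depression", unmatched_docs))
```

A lower cutoff surfaces more candidates at the cost of more false positives, so the value is worth tuning against a handful of documents reviewed by eye.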

As stated earlier, the Unmatched tab is also available in the Interactive Window for the Topics and Categories nodes.

I hope this explanation of the “new” (or “old”?) Unmatched Documents tab in the Interactive Window for VTA software has been helpful. I’d love to hear from you if you take advantage of this new VTA feature. Please leave me a comment if you use it in an analysis. I’d love to have examples of how analysts use this feature so I can share them with customers who may take my course in the future!

For more on:

Training in Text Analytics

SAS Visual Text Analytics

VTA Pipeline Overview

The Text Window Feature for Term Maps
