The purpose of this post is to shed light on three new features released in 2025 for the Concepts node when using SAS Visual Text Analytics in Model Studio. This post assumes readers are already familiar with SAS Visual Text Analytics software running in Model Studio. If not, you can catch up on some of the basics in this post.
Although these new features are also available in Visual Text Analytics programmatically, this post focuses on working with pipelines in the point-and-click, drag-and-drop interface of Model Studio. The new features covered here were released at different times in 2025, specifically in the 2025.04, 2025.06, and 2025.08 SAS Viya stable releases. Sorry that I am just getting around to posting about them now, but it is hard to keep up with the rate at which SAS technology advances! I’m sure that new VTA users, and most regular ones who are not working with the absolute latest release, will learn some new and useful things in this post.
All three of the new features discussed here are for the Concepts node. Extracting concepts, from a text analytics perspective, is all about using natural language processing to extract specific information contained within documents. Concepts can be predefined by the software, such as concepts to extract measures, places, dates, and noun groups. Custom concepts can also be written by the user in LITI (Language Interpretation for Textual Information), an extensive rule-writing language.
Chunk size:
Released with the 2025.04 stable version of SAS Viya, a new Chunk Size property is available for the Concepts node.
Running the Concepts node against a corpus that contains very large documents can lead to slow processing times, especially when many custom (i.e., user-created) concepts are written. The new Chunk Size property can speed up processing in these cases. It lets the user specify whether, and how, input documents are split into chunks. There is a trade-off, however, to the improved run time: terms can be compared only within the same chunk, so matches that span two chunks cannot be found. Users should be aware that they are potentially giving up a bit of accuracy for improved efficiency when using this option. As shown in the screen shot above, the default for this property is to split input documents into 32-kilobyte chunks. This is seen after expanding the Chunk Size property. The other options for chunk size are found under the Type sub-property.
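To make the trade-off concrete, here is a minimal Python sketch of how a match can be lost when it straddles a chunk boundary. It is only an illustration: the product's actual splitting logic is not described in this post, the helper names (split_into_chunks, find_matches) are hypothetical, a simple regular expression stands in for a concept rule, and chunks are measured in characters rather than bytes to keep the example short.

```python
import re

def split_into_chunks(text: str, chunk_size: int) -> list:
    """Cut text into consecutive pieces of at most chunk_size characters.
    (Assumption: a naive fixed-width split, not the product's real logic.)"""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def find_matches(text: str, pattern: str, chunk_size=None) -> list:
    """Search each chunk independently; chunk_size=None means no chunking."""
    chunks = [text] if chunk_size is None else split_into_chunks(text, chunk_size)
    matches = []
    for chunk in chunks:
        # A rule can only see the text inside the current chunk.
        matches.extend(m.group(0) for m in re.finditer(pattern, chunk))
    return matches

text = "I lost 40 pounds in a year."
pattern = r"\d+ pounds"          # toy stand-in for a measure-like concept rule

whole = find_matches(text, pattern)                   # whole document at once
chunked = find_matches(text, pattern, chunk_size=12)  # boundary falls inside "pounds"

print(whole)    # match found when the document is processed whole
print(chunked)  # match lost because it spans two chunks
```

With realistic chunk sizes such boundary cases are rare relative to the size of the document, which is why the accuracy cost is usually small.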
Expanding the Type sub-property shows the default chunk size and the other options:
The default chunk size, 32K, indicates processing data in 32-kilobyte chunks. This value has been used in prior releases of SAS Visual Text Analytics, but as of the 2025.04 release, the user now has other options. Using All indicates that chunking is not performed. This setting would lead to slower run times for large documents given the high memory usage, but it would also lead to the most accurate results. When Type is set to Custom, additional options become available.
The Custom option for chunk size lets the user select their own chunk size. The Value option accepts a numeric value, and the Unit option offers a choice of bytes, kilobytes, or megabytes.
So, for example, setting Value to 64 and choosing kilobytes for Unit splits documents into 64K chunks. This might be appropriate when very large documents are being processed but the user prefers to err on the side of accuracy over improved processing time.
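As a rough illustration of how the Type, Value, and Unit settings combine, the Python sketch below converts them into a chunk size in bytes. The function name chunk_size_in_bytes and the option strings are hypothetical, and the sketch assumes binary units (1 kilobyte = 1,024 bytes), which this post does not confirm.

```python
# Assumption: binary units. The post does not say whether 1K means 1000 or 1024 bytes.
UNIT_FACTORS = {"bytes": 1, "kilobytes": 1024, "megabytes": 1024 * 1024}

def chunk_size_in_bytes(chunk_type, value=None, unit=None):
    """Return the chunk size in bytes, or None when no chunking is performed."""
    if chunk_type == "All":
        return None                      # process each document whole
    if chunk_type == "32K":
        return 32 * UNIT_FACTORS["kilobytes"]
    if chunk_type == "Custom":
        if value is None or value <= 0 or unit not in UNIT_FACTORS:
            raise ValueError("Custom requires a positive Value and a known Unit")
        return value * UNIT_FACTORS[unit]
    raise ValueError(f"Unknown chunk type: {chunk_type!r}")

print(chunk_size_in_bytes("Custom", value=64, unit="kilobytes"))
```

Under that assumption, the 64-kilobyte example above is exactly twice the default chunk size.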
Alternate name:
Beginning with the 2025.06 stable release of SAS Viya, a new option was provided when it comes to naming custom concepts. Custom concepts are concepts created by the user using LITI code. There have always been strict rules when it comes to naming custom concepts. For example, custom concept names can consist only of characters, numbers, or underscores.
One of the primary reasons for the software having strict naming rules is that the name for a custom concept cannot be an actual term (or token) that exists within the corpus. This creates problems for the software and could be potentially confusing for the user. So, one recommendation from the User’s Guide, and this is something we teach in our visual text analytics course, is to use underscores at the beginning and end of the custom concept name. If you want to give it a name with spaces, use underscores to take the place of spaces. (For more on custom concepts naming convention, see the User’s Guide in product documentation.)
These strict rules can lead to vague or potentially confusing names for custom concepts, where the underlying purpose of the concept is lost. The new feature is a field called Alternate name, which is not bound by the strict naming conventions.
The Alternate name field allows the user to provide a secondary name for the custom concept which does not follow the strict rules spelled out in product documentation for the primary name. You can think of using this Alternate name property as a way to provide a brief description or more user-friendly, meaningful name for the custom concept.
Notice that in the example above, the Alternate name includes spaces, which are not permitted in the primary name for the concept. When using this feature, the Alternate name is shown in parentheses next to the actual custom concept name in the Concepts list shown to the left of an open Concepts node.
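The naming convention and the parenthetical display described above can be sketched in Python as follows. This is a simplification for illustration only: the helper names are hypothetical, "characters" is interpreted here as ASCII letters, and the full naming rules are in the User’s Guide rather than in this code.

```python
import re

def to_primary_name(friendly_name: str) -> str:
    """Build a primary concept name: underscores replace spaces and
    punctuation, with a leading and trailing underscore."""
    words = re.findall(r"[A-Za-z0-9]+", friendly_name)
    if not words:
        raise ValueError("name must contain at least one letter or digit")
    return "_" + "_".join(words) + "_"

def is_valid_primary_name(name: str) -> bool:
    """Simplified check: only letters, digits, and underscores."""
    return re.fullmatch(r"[A-Za-z0-9_]+", name) is not None

def display_label(primary: str, alternate: str = "") -> str:
    """Mimic the Concepts list: alternate name in parentheses, if provided."""
    return f"{primary} ({alternate})" if alternate else primary

alternate = "Side effects: weight gain"      # free text, spaces allowed
primary = to_primary_name(alternate)

print(primary)
print(is_valid_primary_name(alternate))      # spaces and colon are not allowed
print(display_label(primary, alternate))
```

The leading and trailing underscores make it unlikely that the primary name collides with a real term in the corpus, while the alternate name carries the readable description.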
Match Type:
Finally, in the 2025.08 stable release of SAS Viya, a new Match type property was added to the Concepts node.
This option is used to specify an input match type. This affects the number of matches that are returned for specific concepts. The three options for this property are All matches (Default), Best match, and Longest match.
When using All matches, the default setting, all the concepts that match the input text are returned as matches. With Best match, when matched concepts overlap, only the concept with the highest priority value is returned. Concept rules have different priority values. For example, predefined concepts have higher priority values than custom concepts, and within predefined concepts, Measure concepts have a higher priority value than Money concepts. This will be relevant in the example I discuss below. When Match type is set to Longest match, the concept with the longest match is returned.
There are a few additional details about Best match. When multiple overlapping matched concepts have the same priority value, the returned match is the concept with the longest match. When multiple overlapping matched concepts have the same priority value and are of equal length, the returned concept is the one that was compiled first.
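The selection rules described above can be sketched in Python as a small overlap-resolution routine. This is an illustration under stated assumptions, not the product's implementation: the Match class, the resolve function, and the character offsets are hypothetical; the greedy strategy is my own simplification; and the tie-break for Longest match (compile order) is an assumption, since only the Best match tie-breaks are described here. The priority values 20 and 18 are the ones cited for Measure and Money later in this post.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Match:
    concept: str
    start: int          # character offset where the match begins
    end: int            # character offset just past the match
    priority: int       # higher value wins under Best match
    compile_order: int  # lower value means compiled earlier

    @property
    def length(self) -> int:
        return self.end - self.start

def overlaps(a: Match, b: Match) -> bool:
    return a.start < b.end and b.start < a.end

def resolve(matches, match_type):
    """Return the matches kept under 'all', 'best', or 'longest'."""
    if match_type == "all":
        return list(matches)
    if match_type == "best":
        # Priority first, then length, then first compiled.
        key = lambda m: (-m.priority, -m.length, m.compile_order)
    elif match_type == "longest":
        # Assumption: ties on length fall back to compile order.
        key = lambda m: (-m.length, m.compile_order)
    else:
        raise ValueError(f"Unknown match type: {match_type!r}")
    kept = []
    for m in sorted(matches, key=key):
        # Greedy: keep a match only if it does not overlap a preferred one.
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)

# Two overlapping matches in the toy text "a 40 pound":
measure = Match("nlpMeasure", start=2, end=10, priority=20, compile_order=0)  # "40 pound"
money = Match("nlpMoney", start=0, end=10, priority=18, compile_order=1)      # "a 40 pound"

for mode in ("all", "best", "longest"):
    print(mode, [m.concept for m in resolve([measure, money], mode)])
```

In this toy case, Best match keeps the Measure match because of its higher priority, while Longest match keeps the Money match because it covers more text.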
Let’s discuss two examples focusing on using predefined concepts to better understand the differences between Match type settings.
All matches versus Best match:
First, I’ll provide an example illustrating the difference between All matches and Best match. Below is a screenshot from an opened Concepts node when All matches is used for Match type. The text data being analyzed is self-reported feedback from patients taking medication to treat depression and anxiety.
Let’s start by first considering matches for predefined Measure concepts. (Note that in the software, all predefined concept names start with nlp.) We are looking at document 0001 and two of the matches that are returned for nlpMeasure are “40 pounds” and “10 pounds”. The total number of matched documents for this concept is 784.
Below is a screen shot from the same opened concepts node but showing matches for nlpMoney.
For the same document (ID=0001) “40 pounds” and “10 pounds” are also returned as matches for the money concept. (The predefined nlpMoney concept matches for some international monetary units such as British pounds. Incidentally these returned matches are actually units of weight in the context of the document, thus technically they are incorrectly matched as monetary units.) Notice also that the total number of matched documents for nlpMoney is 49.
Below is a screen shot of the same data analyzed above, but with the Match type of the concepts node set to Best match. (Note that in the upper left corner, the name of this concepts node, which has a different setting from the one that produced the results above, is now Concepts (1).)
The total number of matched documents for the nlpMeasure concept is still 784, and the same two matches for document 0001, “40 pounds” and “10 pounds”, are returned. But predefined Measure concepts have a higher priority value (equal to 20) compared to predefined Money concepts (priority value equal to 18). Let’s see what we get for nlpMoney matches.
First, notice that the total number of nlpMoney matched documents has dropped to 11. This is because some matches returned for nlpMoney when Match type is set to All matches are not returned when Match type is set to Best match. In other words, matches that were being double counted in the first case as both Measure and Money concepts are no longer returned as matches for Money concepts, since Measure concepts have a higher priority value. In this second case, the values “40 pounds” and “10 pounds” are best matched as Measures, based on priority. Document 0001 is no longer even shown as a document containing matches for nlpMoney. The Matched table is sorted by document ID, and the first document listed is 0144.
All matches versus Longest match:
In this second example I’ll compare results when Match type is set to All matches compared to Longest match. Again, to illustrate the difference let’s consider nlpMeasure and nlpMoney matches but for a different document in the same data set. Below is a screen shot from an open concepts node when Match type is set to All matches.
There are again a total of 784 matched documents for nlpMeasure. Document 1649 is being shown where a match of “40 pound” is being returned. Let’s look at matches for the money concept.
Now looking at returned matches for nlpMoney notice that a slightly different, longer match containing “40 pound” is returned for the same document. Here the returned match is “a 40 pound”. All matches containing “40 pound” are returned.
Let’s see what happens when Match type is set to Longest match. Below, notice the name of the concepts node in the upper left corner has changed to Concepts (2). The node was renamed because it uses a different Match type setting.
Looking again at document 1649, in this case, there is no returned match for “40 pound”. The total number of matched documents for nlpMeasure has also dropped to 746. That is because some matches returned for Measures when Match type is set to All matches are no longer returned here. Those spans of text were matched at greater length by other concepts, so they are returned only for those concepts and are no longer double counted.
Let’s look at returned matches for the Money concept.
Here we see document 1649 is returning a match of “a 40 pound”, as it did in the first part of this example. The match is being returned here as opposed to the nlpMeasure concept because the input matched text is longer for nlpMoney.
I hope these explanations of the new Chunk size, Alternate name, and Match type features for the Concepts node are helpful. I’d love to hear from anyone that may make use of any of these new features! Please leave a comment below if you find any of them useful in your next text analytics analysis.
For more on SAS Visual Text Analytics, in general, or Concept Rules, in particular:
Training: SAS Visual Text Analytics in SAS Viya
Textbook on Concept Rules: SAS Text Analytics for Business Applications: Concept Rules for Information Extraction Models
Find more articles from SAS Global Enablement and Learning here.