BookmarkSubscribeRSS Feed

Behind the scenes with SAS Visual Text Analytics Part I

Started 2 weeks ago by
Modified 2 weeks ago by
Views 937

While a few of the SAS Visual Text Analytics (VTA) nodes enable users to access some of the CAS table outputs, there are many more CAS tables that are stored in the VTA project caslib that can be extracted and leveraged.  For example, you could...

 

  • Save the CAS tables that contain the text of the Concepts rules and/or Category rules that you built in the VTA user interface
  • Leverage the CAS table that contains the sentiment values from a Sentiment Node
  • See which misspelled words were found from Parsing Node with misspelling detection turned on
  • Do additional reporting and analysis on the terms and thier parents by leveraging the Parsing Node positional output

Read on if you'd like to understand how to access and leverage these tables..

 

First, we need to know the name of the VTA project caslib.  This can be found in a number of ways.  One simple way, is from the Data tab in your VTA pipeline, make sure no variables are selected and click on the properties icon, which is on the same horizonal level as the filter box but all the way on the right-hand side.  The last property is location which shows the CAS server and the project caslib I have highlighted in the screen shot below.

 

NoahPowers_0-1733352976413.png

 

Another option for determining the project caslib is to leverage the "copy link..." for the project, "copy pipeline link..." or "copy node link..." from within the VTA project.  Let's use the "copy node link..." from any node right click menu.  

 

NoahPowers_2-1732659356281.png

 

We get a pop-up window with a URI that is sharable with other users enabling them to easily access parts of your VTA project.  In addition, you'll see almost the same caslib name as we found in the Data tab just minus the "Analytics_Project_" string at the beginning.  Notice that there is also a nodeID parameter at the end of the URI that we will take advantage of further down in this article.  

 

NoahPowers_3-1732659438278.png

 

In any case, let's set up a libname linked to this caslib and see what CAS tables are available using the libraries section in SAS Studio:

 

libname vta_proj cas caslib="Analytics_Project_f637043c-9546-489a-830a-3bbca9520432" ;
 
After running the above code, we take a look at the tables in the library and we only see a single table.  Where are all of the tables that I claimed were in the caslib?  It turns out that the other 100 tables have table names that are greater than 32 characters (not SAS 9.4 compliant) and therefore do not display in the Libraries section.  What is happening is not necessarily obvious to Viya users so I wanted to point this out first.
 
NoahPowers_0-1732659891116.png

 

Let's try using PROC CASUTIL to list the tables in the caslib and we'll use one of the ODS outputs associated with this procedure to save the list of tables into a SAS dataset.
 
ods output TableInfo=ProjTables ;

** Print list of CAS tables in the project caslib **;
proc casutil ;
  list tables incaslib="Analytics_Project_f637043c-9546-489a-830a-3bbca9520432" ;
run ;
 
In the results tab, we first see general information about the caslib such as that it is hidden, which explains why you won't see it in Manage Data/Data Explorer. 
NoahPowers_0-1732657982799.png
 
Next in results, we get a list of each table along with lots of table metadata.  For easier display, I show only the first few columns from the corresponding SAS dataset ODS output table:
 
NoahPowers_1-1732658974097.png

 

We see in some cases, the name of the input data is the first part of the CAS table name but in other cases, there is a 36 character Node ID that comes first.  There are three patterns (other than the one row for the input data) I see from my project listed below: 
 
  1. <36 character ID>_<descriptive string>
  2. <input data table name>_<descriptive string>_<36 character ID>
  3.  <input data table name>_PARSING_TERMS_OUT_<parsing node ID>_<TOPICS_..._OUT>_<topics node ID>
If you find any different patterns in your data, please let me know in the comments section below about the pattern and what was in your pipeline to create these.
 
Now, pick a node from your VTA pipeline from which you want to extract the CAS tables and copy the Node ID value from the "Copy Link..." option from the node right click menu. Highlight the Node ID value and usie the CTRL+C keyboard shortcut to copy the value to the clipboard.
 
NoahPowers_1-1733241830140.png
 
Next, open the projtables dataset from the WORK library and right click on the Name column to select "Quick filter":
NoahPowers_0-1732725252063.png

 

Next choose condition "contains" and paste the Node ID you just copied into the value slot in the dialog window.  Click filter to filter the rows in your table to just those related to the Node ID you selected:
 
NoahPowers_1-1732725346410.png

 

I chose a parsing node and find the following related rows.  Note, the last 5 rows below are actually related to a downstream topics node in my pipeline, so if I want the parsing node related CAS tables I would ignore those.

 

NoahPowers_2-1732725490277.png

 

For each CAS table from the project caslib I want to copy and use, the following code can be run and will both copy the sashdat version of the table and load it into a CAS in-memory table:
 
** Get VTA caslib from a node log file or from the Data properties in Model Studio **;
%let projlib = Analytics_Project_f637043c-9546-489a-830a-3bbca9520432 ;

** Name of the table in the project caslib we want to copy **;
%let projTabletoCopy = VAERS2016_PARSING_TERMS_OUT_948F9167-30CF-42FA-92D3-9304F3B4D711 ;

** name of caslib we want to write the copy to **;
%let outputCaslib = casuser ;

** Name of the output CAS table **;
%let outputTableName = PARSING_TERMS_OUT ;

** copy sashdat physical file from project caslib to the output caslib (e.g. casuser) **;
proc casutil incaslib="&projlib." outcaslib="&outputCaslib.";                                              
   save casdata="&projTabletoCopy." casout="&outputTableName..sashdat" replace ;                             
run;

** load sashdat file into memory and promote to global scope **;
proc casutil incaslib="&outputCaslib." outcaslib="&outputCaslib.";                                              
   load casdata="&outputTableName..sashdat" casout="&outputTableName." promote;                             
run;
Here are some sample records from the resulting copied CAS table into my casuser caslib:
 
NoahPowers_0-1732726769302.png

 

You can copy the CAS tables and keep the long table names but you won't be able to browse them in SAS Studio libraries if the names have length greater than 32 characters.  I prefer to rename them to SAS9 compliant names.  The following SAS code will create new unique short names and extract the Node ID into a separate column for filtering.  Note the code will abort if a new pattern is not found on my original list of three above.
 
%let projData = vaers2016 ;

data ProjTables2 ;
  set projTables (rename=(name=LongName)) ;
  length ID $36. SAS9Name $32. ;
  
  Name = LongName ;
  ** Case 3:  <input data table name>_PARSING_TERMS_OUT_<parsing node ID>_<TOPICS_..._OUT>_<topics node ID> **;
  if index(upcase(name),"_PARSING_TERMS_OUT_") > 0 and index(upcase(name),"_TOPICS_") > 0 then do ;
    name = substr(name,length("&projData.")+57) ;
    ID = scan(name,-1,"_") ;
    name = substr(name,1,length(strip(name))-37) ;
  end ;
  
  ** Case 2: <input data table name>_<descriptive string>_<36 character ID> **;
  else if length(strip(name)) > length("&projData.") and upcase(substr(name,1,length("&projData.")+1)) = upcase("&projData._") then do ;
    name = substr(name,length("&projData.")+2) ;
    ID = scan(name,-1,"_") ;
    name = substr(name,1,length(strip(name))-37) ;
  end ;
  
  ** Case 1: <36 character ID>_<descriptive string> **;
  else if length(strip(name)) > length("&projData.") then do ;
    ID = substr(name,1,36) ;
    name = substr(name,38) ;
  end ;
  else if strip(upcase(name)) NE upcase("&projData.") then abort ;
  
  ** Shorten name to 30 chars or less **;
  if upcase(name) = "PARSING_SPELL_CHECKED_POSITION_OUT" then 
    name = "PARSING_SPELL_CHECK_POS_OUT" ;
  else if upcase(name) = "TOPICS_DOCUMENT_PROJECTIONS_OUT" then
    name = "TOPICS_DOCUMENT_PROJ_OUT" ;
    
  SAS9Name = strip(name) ;
  nameLen = length(strip(name)) ;
  if nameLen > 32 then put "Warning: one or more values of name are > 32 chars" ;
run ;

proc sort data=projTables2 ; by ID ; run ;

** Create short ID corresponding to Node ID so can create unique SAS 9 compatible names **;
data projTables2;
  set projTables2 ;
  by ID ;
  length shortID 8. ;
  retain shortID 0 ;
  if first.ID then shortID = shortID + 1 ;
  SAS9Name = strip(SAS9Name) || "_" || strip(put(shortID,8.)) ;
run ;

proc print data=projTables2 (obs=15) ;
var longName SAS9Name ID shortID ;
run ;
  
The output of the proc print is copied below.
 
NoahPowers_1-1732727254150.png

 

Lastly, I have provided a SAS macro below leveraging the PROJTABLES2 SAS dataset to automate copying all of tables related to a user specified Node ID:

 

%macro copyTables(ProjCaslibName=, ProjTable=, NodeID=, outputCaslib=) ;
  
  %local LongNameList SAS9NameList NumTables i sourceTableName outputTableName ;
  
  proc sql noprint ;
    select longname into: LongNameList separated by " " from &ProjTable. (where=(ID = "&NodeID.")) ;
    select SAS9name into: SAS9NameList separated by " " from &ProjTable. (where=(ID = "&NodeID.")) ;
    select count(*) into: NumTables from &ProjTable. (where=(ID = "&NodeID.")) ;
  quit ;

  %do i = 1 %to &NumTables. ;
    %let sourceTableName = %scan(&LongNameList.,&i.,%str( )) ;
    %let outputTableName = %scan(&SAS9NameList.,&i.,%str( )) ;
  
    ** copy sashdat physical file from project caslib to the output caslib (e.g. casuser) **;
    proc casutil incaslib="&ProjCaslibName." outcaslib="&outputCaslib.";                                              
       save casdata="&sourceTableName." casout="&outputTableName..sashdat" replace ;                             
    run;
    
    proc casutil;
      droptable casdata="&outputTableName." incaslib="&outputCaslib." quiet;
    run;
    
    ** load sashdat file into memory and promote to global scope **;
    proc casutil incaslib="&outputCaslib." outcaslib="&outputCaslib.";                                              
       load casdata="&outputTableName..sashdat" casout="&outputTableName." promote;                             
    run;
    
  %end ;
  
%mend ;

%copyTables(ProjCaslibName = Analytics_Project_f637043c-9546-489a-830a-3bbca9520432,
            ProjTable      = projTables2, 
            NodeID         = 948F9167-30CF-42FA-92D3-9304F3B4D711,
            outputCaslib   = casuser) ;

When I run the above code for my Text Parsing Node and VTA project, I get the following new tables showing up in my casuser caslib:

 

NoahPowers_2-1732728784877.png

 

One more useful caslib related to VTA is the hidden "ReferenceData" caslib.  This contains the default Stop Lists the Text Parsing Node leverages when it calls the tpAccumulate CAS action.  If desired, you can copy the Stop List for the language of your VTA project and use it as a base from which to make modifications.  Let's add a SAS library that points to this caslib...

 

libname refdat cas caslib="ReferenceData" ;
 
My VTA project was based on English Language text, so I find the "EN" stoplist CAS table available.  We can use the same proc casutil techniques as above to copy this into my own caslib for further analysis or modification:
 
NoahPowers_2-1733242323441.png

 

I hope that you found this article and associated code useful in your text analytics endeavors.  If you would like to learn more about the underlying CAS action(s) behind each of the VTA nodes, look out for my next SAS Communities post on SAS Visual Text Analytics coming soon.

 

Also, the Text Analytics Programming Guide is also a great resource for learning more about the Text Analytics related CAS actions and the outputs that they generate: SAS® Visual Text Analytics: Programming Guide (2024.11) 

Comments

Hi @NoahPowers ,

 

first of all, thanks for the article! 

I got interested as "starter" in this area, however it was a bit hard for me to follow.

 

After thinking for a moment, I realize it is not because anything unclear per-se in the article, you do explain a lot - something I truly appreciate. The reason might be because it seems advanced - for me. In my case, I feel I could use some context or objective or a final colophon with conclusions, if that makes sense. Hopefully I will have a clearer picture with the next parts, as I see this is only part I 🙂

 

Just my 2 cents.

 

Thanks again for the share, I'm looking forward to getting more knowledge from you!

@JuanS_OCS  - My objective was to expose the availability of additional un-documented data files that can be leveraged for additional analysis and also encourage VTA users to also get comfortable with the underlying CAS actions behind the nodes.  Perhaps my second related article that shows how to find the CAS action(s) code that are behind each VTA node that I just published will be helpful..  You can access it here: https://communities.sas.com/t5/SAS-Communities-Library/Behind-the-Scenes-with-SAS-Visual-Text-Analyt... 

@NoahPowers - thank you, for taking the time to reply to my note. That gives me an excellent context.
I just read your new article and it is excellent. Not only that, but gave me as well the missing pieces I perhaps needed to understand this Part I.

Version history
Last update:
2 weeks ago
Updated by:
Contributors

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags