While a few of the SAS Visual Text Analytics (VTA) nodes enable users to access some of the CAS table outputs, there are many more CAS tables that are stored in the VTA project caslib that can be extracted and leveraged. For example, you could...
Read on if you'd like to understand how to access and leverage these tables..
First, we need to know the name of the VTA project caslib. This can be found in a number of ways. One simple way, is from the Data tab in your VTA pipeline, make sure no variables are selected and click on the properties icon, which is on the same horizonal level as the filter box but all the way on the right-hand side. The last property is location which shows the CAS server and the project caslib I have highlighted in the screen shot below.
Another option for determining the project caslib is to leverage the "copy link..." for the project, "copy pipeline link..." or "copy node link..." from within the VTA project. Let's use the "copy node link..." from any node right click menu.
We get a pop-up window with a URI that is sharable with other users enabling them to easily access parts of your VTA project. In addition, you'll see almost the same caslib name as we found in the Data tab just minus the "Analytics_Project_" string at the beginning. Notice that there is also a nodeID parameter at the end of the URI that we will take advantage of further down in this article.
In any case, let's set up a libname linked to this caslib and see what CAS tables are available using the libraries section in SAS Studio:
libname vta_proj cas caslib="Analytics_Project_f637043c-9546-489a-830a-3bbca9520432" ;
ods output TableInfo=ProjTables ;
** Print list of CAS tables in the project caslib **;
proc casutil ;
list tables incaslib="Analytics_Project_f637043c-9546-489a-830a-3bbca9520432" ;
run ;
I chose a parsing node and find the following related rows. Note, the last 5 rows below are actually related to a downstream topics node in my pipeline, so if I want the parsing node related CAS tables I would ignore those.
** Get VTA caslib from a node log file or from the Data properties in Model Studio **;
%let projlib = Analytics_Project_f637043c-9546-489a-830a-3bbca9520432 ;
** Name of the table in the project caslib we want to copy **;
%let projTabletoCopy = VAERS2016_PARSING_TERMS_OUT_948F9167-30CF-42FA-92D3-9304F3B4D711 ;
** name of caslib we want to write the copy to **;
%let outputCaslib = casuser ;
** Name of the output CAS table **;
%let outputTableName = PARSING_TERMS_OUT ;
** copy sashdat physical file from project caslib to the output caslib (e.g. casuser) **;
proc casutil incaslib="&projlib." outcaslib="&outputCaslib.";
save casdata="&projTabletoCopy." casout="&outputTableName..sashdat" replace ;
run;
** load sashdat file into memory and promote to global scope **;
proc casutil incaslib="&outputCaslib." outcaslib="&outputCaslib.";
load casdata="&outputTableName..sashdat" casout="&outputTableName." promote;
run;
%let projData = vaers2016 ;
data ProjTables2 ;
set projTables (rename=(name=LongName)) ;
length ID $36. SAS9Name $32. ;
Name = LongName ;
** Case 3: <input data table name>_PARSING_TERMS_OUT_<parsing node ID>_<TOPICS_..._OUT>_<topics node ID> **;
if index(upcase(name),"_PARSING_TERMS_OUT_") > 0 and index(upcase(name),"_TOPICS_") > 0 then do ;
name = substr(name,length("&projData.")+57) ;
ID = scan(name,-1,"_") ;
name = substr(name,1,length(strip(name))-37) ;
end ;
** Case 2: <input data table name>_<descriptive string>_<36 character ID> **;
else if length(strip(name)) > length("&projData.") and upcase(substr(name,1,length("&projData.")+1)) = upcase("&projData._") then do ;
name = substr(name,length("&projData.")+2) ;
ID = scan(name,-1,"_") ;
name = substr(name,1,length(strip(name))-37) ;
end ;
** Case 1: <36 character ID>_<descriptive string> **;
else if length(strip(name)) > length("&projData.") then do ;
ID = substr(name,1,36) ;
name = substr(name,38) ;
end ;
else if strip(upcase(name)) NE upcase("&projData.") then abort ;
** Shorten name to 30 chars or less **;
if upcase(name) = "PARSING_SPELL_CHECKED_POSITION_OUT" then
name = "PARSING_SPELL_CHECK_POS_OUT" ;
else if upcase(name) = "TOPICS_DOCUMENT_PROJECTIONS_OUT" then
name = "TOPICS_DOCUMENT_PROJ_OUT" ;
SAS9Name = strip(name) ;
nameLen = length(strip(name)) ;
if nameLen > 32 then put "Warning: one or more values of name are > 32 chars" ;
run ;
proc sort data=projTables2 ; by ID ; run ;
** Create short ID corresponding to Node ID so can create unique SAS 9 compatible names **;
data projTables2;
set projTables2 ;
by ID ;
length shortID 8. ;
retain shortID 0 ;
if first.ID then shortID = shortID + 1 ;
SAS9Name = strip(SAS9Name) || "_" || strip(put(shortID,8.)) ;
run ;
proc print data=projTables2 (obs=15) ;
var longName SAS9Name ID shortID ;
run ;
Lastly, I have provided a SAS macro below leveraging the PROJTABLES2 SAS dataset to automate copying all of tables related to a user specified Node ID:
%macro copyTables(ProjCaslibName=, ProjTable=, NodeID=, outputCaslib=) ;
%local LongNameList SAS9NameList NumTables i sourceTableName outputTableName ;
proc sql noprint ;
select longname into: LongNameList separated by " " from &ProjTable. (where=(ID = "&NodeID.")) ;
select SAS9name into: SAS9NameList separated by " " from &ProjTable. (where=(ID = "&NodeID.")) ;
select count(*) into: NumTables from &ProjTable. (where=(ID = "&NodeID.")) ;
quit ;
%do i = 1 %to &NumTables. ;
%let sourceTableName = %scan(&LongNameList.,&i.,%str( )) ;
%let outputTableName = %scan(&SAS9NameList.,&i.,%str( )) ;
** copy sashdat physical file from project caslib to the output caslib (e.g. casuser) **;
proc casutil incaslib="&ProjCaslibName." outcaslib="&outputCaslib.";
save casdata="&sourceTableName." casout="&outputTableName..sashdat" replace ;
run;
proc casutil;
droptable casdata="&outputTableName." incaslib="&outputCaslib." quiet;
run;
** load sashdat file into memory and promote to global scope **;
proc casutil incaslib="&outputCaslib." outcaslib="&outputCaslib.";
load casdata="&outputTableName..sashdat" casout="&outputTableName." promote;
run;
%end ;
%mend ;
%copyTables(ProjCaslibName = Analytics_Project_f637043c-9546-489a-830a-3bbca9520432,
ProjTable = projTables2,
NodeID = 948F9167-30CF-42FA-92D3-9304F3B4D711,
outputCaslib = casuser) ;
When I run the above code for my Text Parsing Node and VTA project, I get the following new tables showing up in my casuser caslib:
One more useful caslib related to VTA is the hidden "ReferenceData" caslib. This contains the default Stop Lists the Text Parsing Node leverages when it calls the tpAccumulate CAS action. If desired, you can copy the Stop List for the language of your VTA project and use it as a base from which to make modifications. Let's add a SAS library that points to this caslib...
libname refdat cas caslib="ReferenceData" ;
I hope that you found this article and associated code useful in your text analytics endeavors. If you would like to learn more about the underlying CAS action(s) behind each of the VTA nodes, look out for my next SAS Communities post on SAS Visual Text Analytics coming soon.
Also, the Text Analytics Programming Guide is also a great resource for learning more about the Text Analytics related CAS actions and the outputs that they generate: SAS® Visual Text Analytics: Programming Guide (2024.11)
Hi @NoahPowers ,
first of all, thanks for the article!
I got interested as "starter" in this area, however it was a bit hard for me to follow.
After thinking for a moment, I realize it is not because anything unclear per-se in the article, you do explain a lot - something I truly appreciate. The reason might be because it seems advanced - for me. In my case, I feel I could use some context or objective or a final colophon with conclusions, if that makes sense. Hopefully I will have a clearer picture with the next parts, as I see this is only part I 🙂
Just my 2 cents.
Thanks again for the share, I'm looking forward to getting more knowledge from you!
@JuanS_OCS - My objective was to expose the availability of additional un-documented data files that can be leveraged for additional analysis and also encourage VTA users to also get comfortable with the underlying CAS actions behind the nodes. Perhaps my second related article that shows how to find the CAS action(s) code that are behind each VTA node that I just published will be helpful.. You can access it here: https://communities.sas.com/t5/SAS-Communities-Library/Behind-the-Scenes-with-SAS-Visual-Text-Analyt...
@NoahPowers - thank you, for taking the time to reply to my note. That gives me an excellent context.
I just read your new article and it is excellent. Not only that, but gave me as well the missing pieces I perhaps needed to understand this Part I.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.