When one wants to extract useful information from unstructured data, one uses Concepts. A Concept is a key data element such as a book title, last name, city, or gender. Concepts let you analyze information in context and extract the pieces that matter.
In this article, I will show how to implement Custom Concepts in SAS Visual Text Analytics’ visual and programming interfaces. In the programming interface, I will use Action Sets available in SAS Visual Text Analytics 8.2 in SAS Viya 3.3 and show examples of the LITI rules for CLASSIFIER, CONCEPT_RULE and PREDICATE_RULE.
In SAS Visual Text Analytics, you can write rules for recognizing concepts that are important to you, thereby creating Custom Concepts. For example, if you were planning a vacation and had a series of documents describing accommodations and nearby attractions, you could create a Custom Concept called GreatLocation that identifies accommodations in a desirable location. You could also create a Custom Concept called NearToFun that extracts locations near music events and museums. You could specify that the concept NearToFun is identified when the terms museum, music, band, or festival appear in a document.
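To make the NearToFun idea concrete, here is a toy Python sketch of a keyword-triggered concept. It is not LITI and not how SAS Visual Text Analytics evaluates rules; the function name `mentions_near_to_fun` and the whole-word, case-insensitive matching are assumptions made only for illustration.

```python
import re

# Hypothetical trigger terms for the NearToFun example.
NEAR_TO_FUN_TERMS = {"museum", "music", "band", "festival"}


def mentions_near_to_fun(document: str) -> bool:
    """Return True if any trigger term occurs as a whole word, ignoring case.

    A plain keyword test, not LITI: it ignores context and does not
    match inflected forms such as "museums".
    """
    words = re.findall(r"[a-z0-9'-]+", document.lower())
    return any(word in NEAR_TO_FUN_TERMS for word in words)


if __name__ == "__main__":
    print(mentions_near_to_fun("Two blocks from the Museum of Fine Arts"))  # True
    print(mentions_near_to_fun("Quiet street, free parking"))  # False
```

A rule-based concept adds what this sketch lacks: context, such as requiring the trigger term to appear near a mention of the accommodation.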
It is important to mention that Forrester ranked SAS a Leader in The Forrester Wave™: AI-Based Text Analytics Platforms, Q2 2018, which includes these lines:
“… SAS Visual Text Analytics is fully integrated with SAS Visual Analytics — a self-service BI and discovery tool — both of which run on the highly scalable SAS Viya in-memory grid architecture. SAS's brand speaks for itself as a leader in advanced analytics; as a result, SAS Visual Text Analytics comes with a number of machine learning models.”
SAS Visual Text Analytics provides nine predefined concepts, such as dates, people, places, measurements, and mentions of currency, whose rules are already written to save development time. The photo below shows examples of the nlpNounGroup and nlpMoney predefined concepts.
Custom Concepts can define a specific concept on their own, or be referenced as arguments in any of the rule-based definitions. In customized healthcare or legal applications, groups of contiguous terms, and the values of some of those terms, might be of the utmost importance.
SAS Visual Text Analytics performs context-sensitive matching using advanced linguistic rules called LITI (Language Interpretation for Textual Information). With these rules, concepts are matched only in a specific context. LITI rules are proprietary to SAS. The photos below show the process of creating a Custom Concept in the SAS Visual Text Analytics visual interface. The first Custom Concept is called HotelAmenities and the second is called testFact.
For HotelAmenities the LITI rules are:
CLASSIFIER:Complimentary breakfast
CLASSIFIER:Restaurant
CLASSIFIER:Free parking
CLASSIFIER:Swimming pool
CLASSIFIER:Bar
CONCEPT_RULE:HOTEL_AMENITIES:(SENT,(OR,"_c{internet}","_c{wifi}","_c{wi-fi}"),"free","lobby")
Some of the matches are “… a safe and free Swimming pool in the backyard” and “free wifi in lobby.”
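As a rough mental model of the HotelAmenities rules, the Python sketch below mimics their intent: CLASSIFIER terms match anywhere, ignoring case (mirroring CASE_INSENSITIVE_MATCH), and the CONCEPT_RULE fires only when "free", "lobby", and one of internet, wifi, or wi-fi occur in the same sentence, returning the term wrapped in `_c{}`. The helper names and the naive sentence splitter are hypothetical; the real LITI engine tokenizes and matches differently.

```python
import re

# Toy stand-ins for the HotelAmenities rules (hypothetical, not the LITI engine).
CLASSIFIER_TERMS = ["complimentary breakfast", "restaurant", "swimming pool", "bar"]
WIFI_TERMS = {"internet", "wifi", "wi-fi"}  # the (OR, ...) branch wrapped in _c{}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: a sentence ends at ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def hotel_amenities(text: str) -> list[str]:
    """Return toy HotelAmenities matches, lowercased, in discovery order."""
    lowered = text.lower()  # mirrors CASE_INSENSITIVE_MATCH
    matches = [
        term
        for term in CLASSIFIER_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    ]
    # CONCEPT_RULE (SENT, ...): all arguments must sit in the same sentence.
    for sentence in split_sentences(lowered):
        words = re.findall(r"[a-z0-9-]+", sentence)
        if "free" in words and "lobby" in words:
            matches.extend(w for w in words if w in WIFI_TERMS)
    return matches


if __name__ == "__main__":
    print(hotel_amenities("free wifi in lobby"))  # ['wifi']
```

The SENT constraint is what keeps "free wifi in rooms. The lobby is small." from matching: "free" and "lobby" fall in different sentences.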
Facts are extracted with the LITI rule type PREDICATE_RULE. The rule for testFact is:
PREDICATE_RULE:TEST_FACT(nlpPlace):(DIST_4,"_nlpPlace{Beacon Hill}",(OR,"Downtown","subway","clean"))
Notice the use of the predefined concept nlpPlace. This rule matches documents in which the term "Beacon Hill", recognized as an nlpPlace, occurs within four tokens of any of these three terms: "Downtown," "subway," or "clean."
Notice the documents that are matched with this rule: “… Beacon Hill. Clean, well appointed and convenient” and “… quiet area in Beacon Hill. Very clean. Kitchen is moderately equipped.”
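The DIST_4 constraint can be approximated in the same toy style. In this sketch, distance is the number of token positions between "Beacon Hill" and a partner term, with punctuation counted as a token; that counting rule is an assumption, not documented LITI behavior. The function `beacon_hill_fact` is hypothetical and returns the text that would fill the nlpPlace argument, or `None`.

```python
import re

TARGET = ["beacon", "hill"]                  # "_nlpPlace{Beacon Hill}"
PARTNERS = {"downtown", "subway", "clean"}   # (OR,"Downtown","subway","clean")


def tokenize(text: str) -> list[str]:
    # Words and single punctuation marks both count as tokens (an assumption).
    return re.findall(r"\w+|[^\w\s]", text.lower())


def beacon_hill_fact(text: str, max_dist: int = 4):
    """Return "Beacon Hill" if a partner term is within max_dist tokens, else None."""
    toks = tokenize(text)
    n = len(TARGET)
    starts = [i for i in range(len(toks) - n + 1) if toks[i:i + n] == TARGET]
    partners = [j for j, tok in enumerate(toks) if tok in PARTNERS]
    for start in starts:
        end = start + n - 1
        for j in partners:
            # Gap counted from the nearer edge of the two-token phrase.
            gap = j - end if j > end else start - j
            if gap <= max_dist:
                return "Beacon Hill"
    return None


if __name__ == "__main__":
    print(beacon_hill_fact("Safe quiet area in Beacon Hill. Very clean."))  # Beacon Hill
```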
Previously, I wrote about SAS Viya and Text Mining Action Sets. In this article, I will use Action Sets newly available in SAS Visual Text Analytics 8.3 in SAS Viya 3.4. Action Sets and Actions are important because the same Action Sets and Actions are used no matter which client makes the request. The examples in this post are written in CASL, but you could just as easily use Python or Java.
I wrote two short programs. The program ValidateConcept checks that the rule definitions have the correct syntax. It uses the action validateConcept from the action set textRuleDevelop.
There are slight differences in the syntax for LITI rules between the visual and the programming interfaces. The rules with the correct syntax are then used in the program CustomConcept, which uses the action compileConcept from the action set textRuleDevelop to compile the concept rules and generate an LI binary. This LI binary is used to score a new data set with the action applyConcept from the action set textRuleScore.
The code for this implementation can be seen below in the Appendix.
Note: If you decide to run the code provided in this article, I recommend copying it into Notepad and then into SAS Studio V. The spaces are significant, and the quotation marks must be straight double quotes (") rather than curly quotes.
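Because curly quotes and unbalanced parentheses are easy to introduce when copying rules from a web page, a quick pre-check can help before running validateConcept. The hypothetical `lint_rule` function below is a plain Python sketch, not a SAS tool; it does not understand LITI syntax, and it would miscount parentheses that appear inside quoted strings.

```python
SMART_QUOTES = "\u201c\u201d\u2018\u2019"  # curly double and single quotes


def lint_rule(line: str) -> list[str]:
    """Flag common copy-paste problems in one LITI config line (hypothetical pre-check)."""
    problems = []
    curly = sorted({ch for ch in line if ch in SMART_QUOTES})
    if curly:
        problems.append("curly quotes: " + "".join(curly))
    depth = 0
    for ch in line:  # naive: also counts parentheses inside quoted strings
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" with no matching "("
                break
    if depth != 0:
        problems.append("unbalanced parentheses")
    if line.count('"') % 2:
        problems.append("odd number of straight double quotes")
    return problems


if __name__ == "__main__":
    print(lint_rule('CONCEPT_RULE:X:(SENT,"free"'))  # ['unbalanced parentheses']
```

An empty list only means these particular slips are absent; the validateConcept action is still the authority on rule syntax.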
The main parts of that code are:
ValidateConcept.sas
Use validateConcept to check that the syntax of the concept rules is correct. Note the correct syntax for the LITI rules CLASSIFIER, CONCEPT_RULE, and PREDICATE_RULE.
The output of this program is the table ERROR. If it contains no rows (Number of Rows = 0), the syntax of the rules is correct. Once there are no errors, you can continue with the next program.
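If you maintain many concepts, generating the config lines can reduce typing errors. The Python sketch below emits lines in the same layout as the Appendix (ENABLE, FULLPATH, PRIORITY, CASE_INSENSITIVE_MATCH, then the rules); the function `concept_config` and its parameters are hypothetical. The Appendix also defines a Top node (ENABLE:Top and so on), which this sketch does not produce.

```python
def concept_config(name, classifiers=(), rules=(), predicates=(), priority=10):
    """Return LITI config lines for one concept, in the Appendix layout.

    predicates is a sequence of (arguments, body) pairs, e.g.
    ("nlpPlace", '(DIST_4,...)') -> PREDICATE_RULE:NAME(nlpPlace):(DIST_4,...).
    """
    lines = [
        f"ENABLE:{name}",
        f"FULLPATH:{name}:Top/{name}",
        f"PRIORITY:{name}:{priority}",
        f"CASE_INSENSITIVE_MATCH:{name}",
    ]
    # The Appendix writes a space after the second colon of CLASSIFIER lines.
    lines += [f"CLASSIFIER:{name}: {term}" for term in classifiers]
    lines += [f"CONCEPT_RULE:{name}:{body}" for body in rules]
    lines += [f"PREDICATE_RULE:{name}({args}):{body}" for args, body in predicates]
    return lines


if __name__ == "__main__":
    rules = ['(SENT,(OR,"_c{internet}","_c{wifi}","_c{wi-fi}"),"free","lobby")']
    lines = concept_config("HOTEL_AMENITIES", ["Restaurant", "Bar"], rules)
    print("\n".join(lines))
```

You could paste the joined output into the datalines section of the concept_rules DATA step, then run it through validateConcept as usual.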
CustomConcept.sas
The output of the compileConcept action is the binary LI file. The output of applyConcept is the second table in the photo below.
The results of applyConcept are two tables: OUT_CONCEPT and OUT_FACT.
The concept matches are shown in the OUT_CONCEPT table:
The fact matches are shown in the OUT_FACT table:
SAS Visual Text Analytics facilitates the development and implementation of Custom Concepts in both its visual and programming interfaces. There are slight differences in the syntax for LITI rules in the visual and the programming interfaces.
SAS Visual Analytics 8.2: Programming Guide
Thanks to Seung Lee for verifying the syntax of the LITI rule CONCEPT_RULE.
ValidateConcept.sas
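/***************************************************************************/
/* ValidateConcept.sas: load LITI concept rules in config format, compile  */
/* them, and score a few test sentences to check the rules.                */
/***************************************************************************/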
cas mysess sessopts=(caslib=casuser timeout=1800 locale="en_US" metrics=true);
caslib _all_ assign;
data casuser.concept_rules; /* one LITI config line per row */
length config $300 ;
infile datalines delimiter='|' missover;
input config$;
datalines;
ENABLE:COMPANY
FULLPATH:COMPANY:Top/COMPANY
PRIORITY:COMPANY:10
CASE_INSENSITIVE_MATCH:COMPANY
CLASSIFIER:COMPANY: Microsoft
CLASSIFIER:COMPANY: Amazon
CLASSIFIER:COMPANY: Google
ENABLE:HOTEL_AMENITIES
FULLPATH:HOTEL_AMENITIES:Top/HOTEL_AMENITIES
PRIORITY:HOTEL_AMENITIES:10
CASE_INSENSITIVE_MATCH:HOTEL_AMENITIES
CLASSIFIER:HOTEL_AMENITIES: Complimentary breakfast
CLASSIFIER:HOTEL_AMENITIES: Restaurant
CLASSIFIER:HOTEL_AMENITIES: Swimming pool
CLASSIFIER:HOTEL_AMENITIES: Bar
CONCEPT_RULE:HOTEL_AMENITIES:(SENT,(OR,"_c{internet}","_c{wifi}","_c{wi-fi}"),"free","lobby")
ENABLE:TEST_FACT
FULLPATH:TEST_FACT:Top/TEST_FACT
PRIORITY:TEST_FACT:15
CASE_INSENSITIVE_MATCH:TEST_FACT
PREDICATE_RULE:TEST_FACT(nlpPlace):(DIST_4,"_nlpPlace{Beacon Hill}",(OR,"Downtown","subway","clean"))
ENABLE:Top
FULLPATH:Top:Top
PRIORITY:Top:10
CASE_INSENSITIVE_MATCH:Top
;
run;
data casuser.apply_concept_text; /* test documents to score */
length text $300 ;
infile datalines delimiter='|' missover;
input docid text$;
datalines;
1| I just bought an amazon fire tablet
2| microsoft Windows in an operating system
3| In beacon hill location clean studio with easy keypad access
4| a safe and free Swimming pool in the backyard
5| quiet area in Beacon Hill. Very clean
6| free wifi in lobby
;
run;
proc cas;
builtins.loadActionSet /
actionSet="textRuleDevelop";
builtins.loadActionSet /
actionSet="textRuleScore";
/* Compile the rules in the config column into an LI binary (table outli) */
textRuleDevelop.compileConcept /
casOut={name="outli", replace=TRUE}
config="config"
table={name="concept_rules"};
run;
/* Score the test documents: concept matches to out_concept, facts to out_fact */
textRuleScore.applyConcept /
casOut={name="out_concept", replace=TRUE}
docId="docid"
factOut={name="out_fact", replace=TRUE}
model={name="outli"}
table={name="apply_concept_text"}
text="text";
run;
table.fetch /
table={name="out_concept"};
run;
table.fetch /
table={name="out_fact"};
run;
quit;
CustomConcept.sas
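/***************************************************************************/
/* CustomConcept.sas: compile the validated concept rules into an LI      */
/* binary and apply it to new documents.                                   */
/***************************************************************************/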
cas mysess sessopts=(caslib=casuser timeout=1800 locale="en_US" metrics=true);
caslib _all_ assign;
data casuser.concept_rules; /* one LITI config line per row */
length config $120 ;
infile datalines delimiter='|' missover;
input config$;
datalines;
ENABLE:HOTEL_AMENITIES
FULLPATH:HOTEL_AMENITIES:Top/HOTEL_AMENITIES
PRIORITY:HOTEL_AMENITIES:10
CASE_INSENSITIVE_MATCH:HOTEL_AMENITIES
CLASSIFIER:HOTEL_AMENITIES: Complimentary breakfast
CLASSIFIER:HOTEL_AMENITIES: Restaurant
CLASSIFIER:HOTEL_AMENITIES: Swimming pool
CLASSIFIER:HOTEL_AMENITIES: Bar
CONCEPT_RULE:HOTEL_AMENITIES:(SENT,(OR,"_c{internet}","_c{wifi}","_c{wi-fi}"),"free","lobby")
ENABLE:CONVENIENT_COST
FULLPATH:CONVENIENT_COST:Top/CONVENIENT_COST
PRIORITY:CONVENIENT_COST:10
CASE_INSENSITIVE_MATCH:CONVENIENT_COST
CONCEPT_RULE:CONVENIENT_COST:(SENT,(DIST_6,"_c{nlpMoney}",(OR,"cost","reasonable","convenient")))
ENABLE:TEST_FACT
FULLPATH:TEST_FACT:Top/TEST_FACT
PRIORITY:TEST_FACT:10
CASE_INSENSITIVE_MATCH:TEST_FACT
PREDICATE_RULE:TEST_FACT(nlpPlace):(DIST_4,"_nlpPlace{Beacon Hill}",(OR,"Downtown","subway","clean"))
;
run;
data casuser.apply_concept_text;
length text $200 ;
infile datalines delimiter='|' missover;
input docid text$;
datalines;
1| a safe and free parking at the backyard
2| cost was under $30 including tip and the drivers were great. Everyone in Boston was friendly and helpful
3| Taxi costs between 6-10 dollars for a trip downtown. The home was inviting.
4| Perfect location for Beacon Hill. Clean, well appointed and convenient
5| Also, his 1 bedroom apartment in Beacon Hill is clean, charming, and in the ideal location
6| Safe quiet area in Beacon Hill. Very clean.
;
run;
proc cas;
builtins.loadActionSet /
actionSet="textRuleDevelop";
builtins.loadActionSet /
actionSet="textRuleScore";
/* Compile the rules in the config column into an LI binary (table outli) */
textRuleDevelop.compileConcept /
casOut={name="outli", replace=TRUE}
config="config"
table={name="concept_rules"};
run;
/* Score the new documents: concept matches to out_concept, facts to out_fact */
textRuleScore.applyConcept /
casOut={name="out_concept", replace=TRUE}
docId="docid"
factOut={name="out_fact", replace=TRUE}
model={name="outli"}
table={name="apply_concept_text"}
text="text";
run;
table.fetch /
table={name="out_concept"};
run;
table.fetch /
table={name="out_fact"};
run;
quit;