SAS Visual Text Analytics: Creating concept rules programmatically

Started 01-04-2024 by
Modified 01-04-2024 by

Hello, and welcome to my post! This post offers practical insights for working with text document collections. It answers questions that students have asked recently about SAS® Visual Text Analytics, which you may also find helpful.

In this installment, we look at case sensitivity in custom concept rules and at how to create concept rules programmatically.

Excellent resources on writing custom concept rules already exist, including the product documentation and SAS Support Communities articles. They are great starting points if you want to learn the basics of writing concept and category rules.

Let’s start with a quick tip that you may not be aware of. Did you know that custom concept rules are not case sensitive by default? The following example concept rule identifies several types of desserts.

01_saspchChef-150x150.png

The rule includes some capitalized words, but it matches text regardless of the case used in the documents. With the default rule behavior, it returns four matches.

02_saspchRuleA.png

Now suppose you want a rule to match terms only when they appear in the exact case used in the rule. With the Concepts node open, right-click the name of the rule that you want to make case sensitive and select “Match Case”. An Aa icon then appears next to the rule name, as shown below, which makes case-sensitive rules easy to spot. With case sensitivity enabled, the rule returns fewer matches in the text document: only two this time.

03_saspchRuleC.png

Using action sets

The next example, taken from the product documentation, programmatically creates three concept rules that work together as a unit.

The first section of code uses a DATA step to create the three rules: COMPANY, PRODUCT, and TEST_PRED. The first two rules, defined with CLASSIFIER entries, are used in the third rule, which is a PREDICATE_RULE type, also referred to as a “fact”. Notice the CASE_INSENSITIVE_MATCH property in each rule block: it reflects the default behavior that concept rules are not case sensitive. Remove that line from a rule block if you want that rule to be case sensitive.

Each rule block, including the block for the Top parent concept, starts with an ENABLE line:

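/* Connect to the CAS server (replace host and port with your own) */
options casport=5570 cashost="cloud.example.com";
cas casauto;          /* start a CAS session named casauto */
libname mycas cas;    /* libref that points to the active caslib */

/* Each data line becomes one row of the rule table:               */
/* config holds the rule text, and ruleid numbers the rows.        */
data mycas.concept_rules_long;
   length config varchar(*);
   infile datalines delimiter='|' missover;
   input config$;
   ruleid=monotonic();
   datalines;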

      ENABLE:COMPANY
      FULLPATH:COMPANY:Top/COMPANY
      PRIORITY:COMPANY:10
      CASE_INSENSITIVE_MATCH:COMPANY
      CLASSIFIER:COMPANY: Microsoft
      CLASSIFIER:COMPANY: Amazon
      CLASSIFIER:COMPANY: Google

      ENABLE:PRODUCT
      FULLPATH:PRODUCT:Top/PRODUCT
      PRIORITY:PRODUCT:10
      CASE_INSENSITIVE_MATCH:PRODUCT
      CLASSIFIER:PRODUCT: xbox
      CLASSIFIER:PRODUCT: windows
      CLASSIFIER:PRODUCT: chrome
      CLASSIFIER:PRODUCT: fire

      ENABLE:TEST_PRED
      FULLPATH:TEST_PRED:Top/TEST_PRED
      PRIORITY:TEST_PRED:10
      CASE_INSENSITIVE_MATCH:TEST_PRED
      PREDICATE_RULE:TEST_PRED(company, product):(DIST_1, "_company{COMPANY}","_product{PRODUCT}")
      ENABLE:Top
      FULLPATH:Top:Top
      PRIORITY:Top:10
      CASE_INSENSITIVE_MATCH:Top
   ;
run;
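
For comparison, here is a minimal sketch of a case-sensitive version of the COMPANY rule, following the note above about removing CASE_INSENSITIVE_MATCH. It is the documentation's COMPANY block with that line omitted, so only “Microsoft”, “Amazon”, and “Google” written in exactly that case should match. The table name concept_rules_cs is a hypothetical choice; to try it, pass it to compileConcept in place of concept_rules_long.

```sas
/* Hypothetical table concept_rules_cs: same COMPANY block as the   */
/* documentation example, but WITHOUT CASE_INSENSITIVE_MATCH:COMPANY */
/* so the CLASSIFIER terms are matched case sensitively.            */
/* The Top block is copied unchanged from the documentation example. */
data mycas.concept_rules_cs;
   length config varchar(*);
   infile datalines delimiter='|' missover;
   input config$;
   ruleid=monotonic();
   datalines;
ENABLE:COMPANY
FULLPATH:COMPANY:Top/COMPANY
PRIORITY:COMPANY:10
CLASSIFIER:COMPANY: Microsoft
CLASSIFIER:COMPANY: Amazon
CLASSIFIER:COMPANY: Google
ENABLE:Top
FULLPATH:Top:Top
PRIORITY:Top:10
CASE_INSENSITIVE_MATCH:Top
;
run;
```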

These rules must be compiled before the scoring action can use them; the compileConcept action in the code below does this. The section that starts with textRuleScore.applyConcept, about halfway down, then applies the compiled rules to the documents in the table named apply_concept_text and writes the matches to output tables. This approach is useful for quickly applying concept rules to documents on an ad hoc basis, or for processing large document collections in batch mode.
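
The example assumes that a CAS table named apply_concept_text already exists, with a document ID column docid and a text column text (the names passed to docId= and text= in the applyConcept call). If you want to try the code without your own data, a minimal sketch like the following creates such a table. The three sample sentences are made up for illustration only.

```sas
/* Hypothetical sample documents for trying the rules above.     */
/* Column names docid and text match the docId= and text=        */
/* parameters of textRuleScore.applyConcept.                     */
data mycas.apply_concept_text;
   length text varchar(*);
   infile datalines delimiter='|' missover;
   input docid text$;
   datalines;
1|I bought a Microsoft xbox for my son.
2|The new Google chrome update is slow.
3|Our Amazon fire tablet stopped charging.
;
run;
```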

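proc cas;
   /* Load the action sets for developing and scoring text rules */
   builtins.loadActionSet /
      actionSet="textRuleDevelop";

   builtins.loadActionSet /
      actionSet="textRuleScore";

   /* Compile the rule table into a concept model table (outli) */
   textRuleDevelop.compileConcept /
      casOut={name="outli", replace=TRUE}
      ruleid="ruleid"
      config="config"
      table={name="concept_rules_long"};
   run;

   /* Score the documents in apply_concept_text with the compiled   */
   /* model; concept matches, fact (predicate rule) matches, and    */
   /* rule matches are written to three separate output tables.     */
   textRuleScore.applyConcept /
      casOut={name="out_concept", replace=TRUE}
      docId="docid"
      factOut={name="out_fact", replace=TRUE}
      model={name="outli"}
      ruleMatchOut={name="out_rule_match", replace=TRUE}
      table={name="apply_concept_text"}
      text="text";
   run;

   /* Display each output table (fetch returns 20 rows by default) */
   table.fetch /
      table={name="out_concept"};
   run;

   table.fetch /
      table={name="out_fact"};
   run;

   table.fetch /
      table={name="out_rule_match"};
   run;

quit;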

The table below lists the terms in the scored documents that the three rules matched, along with the start and end positions of each match. The _fact_argument_ column identifies the individual supporting rule that found each match used by the main TEST_PRED predicate rule.

04_saspchMatches.png
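
The output tables are session-scoped CAS tables, so they are dropped when the CAS session ends. If you want to browse or keep them with other SAS procedures, one option is to copy them through the mycas libref defined earlier, as in this sketch. The WORK destination and the 20-row PROC PRINT limit are illustrative choices, not part of the documentation example.

```sas
/* Copy the fact output from CAS to the WORK library so it stays   */
/* available to SAS procedures after the CAS session is ended.     */
data work.out_fact;
   set mycas.out_fact;
run;

/* Print the first 20 rows of the copied fact output. */
proc print data=work.out_fact(obs=20);
run;

/* End the CAS session when you are done. */
cas casauto terminate;
```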

For more details on working with concept rules, be sure to sign up for our e-learning or live web classes on SAS Visual Text Analytics. The product documentation mentioned earlier contains much more information, some of which may be spotlighted in an upcoming post.

Thanks for reading and keep texting!

Find more articles from SAS Global Enablement and Learning here.