About TeresaJade

TeresaJade · 06-01-2023

Hi Giorgio, I would be happy to help you with your question. There are a couple of different directions you can go in, depending on your goals and the version of the software you are using.

First, although you mention category rules, it looks from your example that you are using concept (information extraction) rules. It is possible to feed concept rules up into category models if you want to. I won't go into that here, but feel free to ask more questions if you want to explore that option.

I also want to mention the book published by SAS Press on writing concept rules. If you are planning to use this approach extensively for models, that book will be a great tool for you. It goes beyond the basic documentation, answers lots of questions, and contains many types of examples. It is on Amazon (https://www.amazon.com/SAS-Text-Analytics-Business-Applications-ebook/dp/B07QC3S58F/ref=sr_1_1?keywords=Teresa+Jade&qid=1555095723&s=books&sr=1-1), and it is also available on O'Reilly if you have a membership. You can also talk with your SAS sales representative about ways to acquire a copy.

Since you are using a PREDICATE_RULE rule type in your example, I am going to assume that you want to extract the text from both your concepts into a single fact match, rather than making a CONCEPT_RULE that would extract one or the other of the arguments using the _c{} modifier. However, you could use my advice here to do that as well.

Your base question was how to generalize your ";" to other sorts of punctuation. The answer I will provide is also illustrated in section 7.5 of the book mentioned above. You can use a reference to a concept after your UNLESS operator as long as the concept contains only REGEX or CLASSIFIER rules. So you can make a concept called something like blockingPunctuation with rules like this:

CLASSIFIER::
CLASSIFIER:-
CLASSIFIER:;

If you have any problems matching a punctuation mark this way, you can use a REGEX rule instead. It is generally good practice to put REGEX rules in their own concept, so if you use this approach, you might want to use two separate PREDICATE_RULEs. Here is your basic rule, plus an optional one if you need it:

PREDICATE_RULE: (vehicles, type):(UNLESS, "blockingPunctuation", (SENT, "_vehicles{VEHICLES}","_type{TYPES}"))
PREDICATE_RULE: (vehicles, type):(UNLESS, "blockingPuncRegex", (SENT, "_vehicles{VEHICLES}","_type{TYPES}"))

Note: I changed two things about your original rule. One was just the spelling of your first argument; it was identical in both uses, so it would have worked as you had it. The second was the use of _sport, which I changed to your declared argument "type". Referencing an undeclared argument is a syntax error and could have blocked your rule from working correctly.

The above should work for you. There are a few additional tips I can offer as well, in case they come in handy as you progress.

1) If you are using a recent version of Viya, there is a new operator available in LITI (concepts) called CLAUS_n. It may help you restrict your matches without needing the UNLESS operator in cases where the content you are looking for is within the same clause or the same set of related clauses. It looks like your use case may match this functionality.

2) If you are trying to restrict your matches by clauses but are using an earlier version, you might need to add certain types of words, such as the conjunctions "and" and "or", to your list of punctuation. This would help avoid a match on a sentence like this: I like to play sport and I put many miles on my car to play matches every weekend. If you do this, you might want to rename your concept to something like clauseBoundary.

3) If you are really just trying to find modifiers of a noun that are in a specific order and not far from each other, you might want to use an ORDDIST_n operator inside your SENT operator to restrict your matches. For example, if you expect "sport" to modify "car", perhaps along with other modifiers, this may be the safest, easiest option:

PREDICATE_RULE: (vehicles, type):(SENT, (ORDDIST_5, "_vehicles{VEHICLES}","_type{TYPES}"))

4) Because concept names that look like real words could confuse your matches, I recommend naming your concepts with camel-case combinations of words that will not appear in your text. In other words, naming a concept TYPES and then referencing TYPES in a rule will match both the TYPES concept and the literal string TYPES in your text. It is better to name your concept something like vehicleType to match each type of vehicle, such as car, truck, lorry, etc.
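
To make the blocking idea behind the UNLESS rule more concrete, here is a rough Python sketch of the same logic outside of LITI. It is not how LITI evaluates rules. The word lists, the BLOCKING set, and the function name find_pairs are all made up for illustration, standing in for the VEHICLES, TYPES, and blockingPunctuation concepts. It pairs a vehicle word with a type word from one sentence only when no blocking token sits between them.

```python
import re

# Hypothetical stand-ins for the VEHICLES and TYPES concepts.
VEHICLES = {"car", "truck", "lorry"}
TYPES = {"sport", "family"}
# Hypothetical stand-in for the blockingPunctuation concept.
BLOCKING = {":", "-", ";"}


def find_pairs(sentence):
    """Return (vehicle, type) pairs that have no blocking token between them."""
    # Split into word tokens and single punctuation tokens.
    tokens = re.findall(r"\w+|[^\w\s]", sentence.lower())
    pairs = []
    for i, tok_i in enumerate(tokens):
        if tok_i not in VEHICLES:
            continue
        for j, tok_j in enumerate(tokens):
            if tok_j not in TYPES:
                continue
            lo, hi = sorted((i, j))
            between = tokens[lo + 1:hi]
            # Keep the pair only if nothing in between is a blocking token.
            if not any(t in BLOCKING for t in between):
                pairs.append((tok_i, tok_j))
    return pairs


print(find_pairs("I drive a sport car every day."))  # [('car', 'sport')]
print(find_pairs("I like sport; my car is old."))    # []
```

In this toy version, adding words such as "and" and "or" to BLOCKING plays the same role as extending the punctuation concept into a clause-boundary concept.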

TeresaJade · 11-11-2022

Hi @PharmlyDoc, Yes, your solution will work to redact person names from text. When I asked my colleagues about your use case, they offered a few suggestions:

1) You can use the output of applyConcepts with predefined = true instead of proc textMine, if you want to leverage the text offsets (the positions of the pieces of text you are targeting). This helps avoid conflicts when a name also appears inside a non-name in your data, for example Martin Luther King vs. Martin Luther King Highway. This approach pinpoints the names in the text accurately and redacts only those items, rather than getting confused by things like addresses.

2) If you find your code is not as efficient as you would like, you could try using the terms (and offsets) as a hash table within a data step.

3) If you add lowcase to your text line, the comparison will ignore casing. Note that the returned text is lowercased as well:

text = tranwrd(lowcase(text),lowcase(strip(names[i])),'[NAME REDACTED]');

4) This is a great example of text redaction, and it could be made into a macro to redact other types of PII, such as social security numbers as '###-##-####'.

Let us know how it goes!
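
As a rough illustration of the offset-based idea in option 1, here is a small sketch in Python rather than SAS. The spans list is made-up data standing in for the offsets that a concept-scoring step would return. The function name redact_by_offsets and the (start, end) convention with an exclusive end are assumptions, so check how your own output defines its offsets before adapting it.

```python
def redact_by_offsets(text, spans, label="[NAME REDACTED]"):
    """Replace each (start, end) span of text with label; end is exclusive."""
    # Work from the last span to the first so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + label + text[end:]
    return text


text = "Martin Luther King spoke near Martin Luther King Highway."
# Hypothetical offsets: only the first mention was tagged as a person.
spans = [(0, 18)]
print(redact_by_offsets(text, spans))
# [NAME REDACTED] spoke near Martin Luther King Highway.
```

Because the replacement is driven by positions rather than by string matching, the highway name is left alone and the casing of the rest of the text is preserved.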

TeresaJade · 11-01-2022

Thank you for your question! Yes, it appears that with the strategy you are using, you will get the results you are looking for only by separating your NFL list into the different sections you are interested in (probably by team, based on my understanding of your use case).

Note: even though the product you are using is discontinued, the LITI syntax is alive and well at SAS. We use it for information extraction (concept and fact extraction), and it still integrates with the categorization syntax. SAS Press has published a book on the LITI syntax that may be useful to you. Here is a link on Amazon (it is also available on O'Reilly and through your SAS representative): https://www.amazon.com/SAS-Text-Analytics-Business-Applications-ebook/dp/B07QC3S58F/ref=sr_1_1?keywords=Teresa+Jade&qid=1555095723&s=books&sr=1-1

TeresaJade · 02-11-2022

Hi! So glad you asked about the predefined concepts! Let me start by saying that no NLP approach or model can be 100% accurate, especially across all types of data. The predefined concepts are intended as a good starting point for your analysis, and the LITI language lets you expand and grow that model to fit your needs.

You can add rules that result in more matches by using CLASSIFIER, CONCEPT, C_CONCEPT, or CONCEPT_RULE rules in the model. You can place them directly into the nlpOrganization concept, if you like, to expand that concept. Or you can create your own myOrganization concept and add your rules there, along with the rule CONCEPT:nlpOrganization.

You can also remove matches from the out-of-the-box model by using the REMOVE_ITEM rule type. These rules are a bit harder to use, but here is a starter rule:

REMOVE_ITEM:(ALIGNED, "_c{nlpOrganization}", "bad match")

This rule removes any nlpOrganization match that aligns exactly with the string given after the comma ("bad match" here is a placeholder for the text you want to exclude). If you copy and paste the rule, be sure the quotes are not "smart quotes".

You can get more tips and tricks for using LITI rules and working with predefined concepts in the SAS Press book: https://support.sas.com/en/books/authors/teresa-jade.html. You can discuss ways to get this book with your SAS representative.

