
If somethin’s broke, I can fix it. If it ain’t broke, I’ll make it better.

Started 08-26-2024; modified 08-26-2024

Imagine that you have an enormous collection of transcribed phone calls from bank customers as your data source. If you were to ask someone to identify calls related to ‘accounts’ in a bank, what do you think they would say?

 

Sure, some may simply tell you they’re not interested in whatever you’re selling. Others may indulge your question and provide a thoughtful response, possibly suggesting that you:

 

Look for savings, checking, account balance, and interest rates.

Search for overdraft, insufficient funds, closed account.

Don’t forget the terms: fraud, check, or any word containing ‘cash’.

 

With a bit of work, we can come up with convincing search arguments that seem to do a pretty good job of identifying the concept we are looking for. Even with well-crafted arguments, it is possible that some relevant documents remain unidentified.
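To make the manual approach concrete, here is a minimal Python sketch of the kind of keyword search described above. It uses plain regular expressions rather than SAS Visual Text Analytics, and the sample transcripts and the function name `find_account_calls` are hypothetical, invented only for illustration.

```python
import re

# Search terms suggested above, matched as whole words or phrases.
TERMS = [
    "savings", "checking", "account balance", "interest rates",
    "overdraft", "insufficient funds", "closed account",
    "fraud", "check",
]
# The listed terms, plus any word containing "cash".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in TERMS) + r")\b|\w*cash\w*",
    re.IGNORECASE,
)

def find_account_calls(transcripts):
    """Return (index, matched terms) for each transcript with at least one hit."""
    hits = []
    for i, text in enumerate(transcripts):
        found = sorted({m.group(0).lower() for m in PATTERN.finditer(text)})
        if found:
            hits.append((i, found))
    return hits

# Hypothetical transcripts, invented for illustration only.
calls = [
    "I was charged an overdraft fee on my checking account.",
    "What time does the branch open on Saturday?",
    "The cashier would not accept my check.",
]
print(find_account_calls(calls))
```

A list like this only finds what we thought to ask for, which is why relevant calls can still slip through.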

 

Is there anything else that can improve our information retrieval searches when we have run out of ideas based on typical human thinking patterns?

 

[Image: 01_PC_saspch081.png]


 

Refer to my previous post where I show how to use SAS Visual Text Analytics to automatically generate concept rules that can build on our research related to ‘accounts’.

 

In this post, I want to highlight some less-obvious available syntax for constructing various types of concept rules. This screen capture from my previous post shows a handful of generated rules. Think of these as a source of concept rule ideas you can experiment with and use in rules that you have already written.

 

[Screen capture: automatically generated concept rules (02_PC_saspchj8.png)]

 

Let’s consider how we can use these for inspiration to create specialized rules that can be useful in many different scenarios.


What do you think of this rule for starters? We can use it to determine what actions have been taken regarding user accounts.

 

C_CONCEPT:_c{:V} my account

 

The C_CONCEPT rule returns only the text matched inside the _c{} braces, and only when the complete rule matches in a document. This rule looks for a verb (the part-of-speech tag :V) immediately followed by the term ‘my account’, so it returns the action applied to a user’s account. The action will likely be a verb such as opened or closed, but we might discover some unusual or unexpected actions that provide new insight into our customer data.

 

This information can then be used to identify trends in the number of specific actions taken on user accounts over time. (A SAS Visual Analytics dashboard using the scored concepts is a good way to report on this information.)
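As a rough illustration of that kind of trend reporting, here is a small Python sketch that counts returned actions by month. The `scored_matches` data and the `actions_by_month` function are hypothetical; real scored output from SAS Visual Text Analytics will have its own layout.

```python
from collections import Counter

# Hypothetical scored output: (call month, text returned by the _c{:V} portion).
scored_matches = [
    ("2024-01", "closed"),
    ("2024-01", "opened"),
    ("2024-02", "closed"),
    ("2024-02", "closed"),
    ("2024-02", "froze"),
]

def actions_by_month(matches):
    """Count how often each returned action appears in each month."""
    return Counter((month, action.lower()) for month, action in matches)

counts = actions_by_month(scored_matches)
for (month, action), n in sorted(counts.items()):
    print(month, action, n)
```

A table of counts like this is the sort of input a dashboard could chart to show, for example, a rise in closed accounts.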

 

Different parts of speech can be substituted in this type of rule depending on the information you are looking for. Nouns, verbs, adjectives, prepositions, and determiners are examples of parts of speech that can be identified by the natural language processing capabilities in SAS Visual Text Analytics. This table lists the available parts of speech for all supported languages.


Here are a few examples of additional automatically generated rules to ponder.

 

C_CONCEPT:_c{:N} from :PN

 

This rule highlights a noun (:N) followed by the word ‘from’ and then a proper noun (:PN), which is typically a capitalized name. This rule can identify interactions between parties that may be of interest in a business context. Here are some matches for this rule from my data collection of customer complaints.

 

Matches:

 

retroactive spousal payments from Department of Veterans Affairs

 

contacted so many attorneys from New Jersey to Georgia to pursue a case

 

After hearing from IRS


The next example looks for the term fraud in the same sentence as any monetary amount found by the predefined nlpMoney concept. The elements in this rule can appear in any order as long as they are in the same sentence. Combining predefined concepts with specific information you are looking for can be very powerful.

 

CONCEPT_RULE:(SENT, "_c{Fraud}", "nlpMoney")

 

Here are three returned matches from this rule.

 

Bank 's Fraud Department regarding an online purchase of over {$1500.00} in which I stated…

 

…was victim on fraud in the amount of {$18000.00} on XX/XX/2022…

 

Overall, I lost over {$13000.00} to this fraud and I really want my money back.
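To show the idea behind same-sentence matching, here is a minimal Python sketch. It is only an approximation: the sentence splitter is naive, and the dollar-amount regular expression is a crude stand-in for the predefined nlpMoney concept, which recognizes much more. The function name `fraud_with_money` and the extra sample sentences are hypothetical.

```python
import re

# Rough stand-ins: a naive sentence splitter and a dollar-amount pattern.
# The predefined nlpMoney concept recognizes far more than this regex does.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
MONEY = re.compile(r"\$\s?\d[\d,]*(?:\.\d{2})?")
FRAUD = re.compile(r"\bfraud\b", re.IGNORECASE)

def fraud_with_money(text):
    """Return sentences that mention 'fraud' and a dollar amount, in any order."""
    matches = []
    for sentence in SENTENCE_SPLIT.split(text):
        if FRAUD.search(sentence) and MONEY.search(sentence):
            matches.append(sentence)
    return matches

sample = (
    "Overall, I lost over $13000.00 to this fraud and I really want my money back. "
    "I reported the fraud yesterday. "
    "The fee was $35.00."
)
print(fraud_with_money(sample))
```

Only the first sentence qualifies, because the other two each contain just one of the two required elements.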


Next is a rule that requires items to appear in a specific order for a match to occur. It looks for any word form of either fraud or scam followed, within five terms, by a proper noun.

 

CONCEPT_RULE:(ORDDIST_5, (OR, "fraud@", "scam@"), "_c{:PN}")

 

Matches:

 

I was scammed. The FBI is now involved..

I filed a fraud report with local XXXX Sherrif 's police department…
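The ordered-distance idea can be sketched in Python as follows. Everything here is an approximation under stated assumptions: the `PROPER_NOUNS` set is a tiny hypothetical stand-in for real part-of-speech tagging, the stem-plus-letters pattern is a crude substitute for the @ expansion, and the way the five-token window is counted may differ from how SAS measures distance.

```python
import re

# Stand-in for the :PN part-of-speech tag: a tiny, hypothetical lookup set.
PROPER_NOUNS = {"FBI", "IRS", "Georgia"}
# Crude stand-in for "fraud@" / "scam@": the stem plus any letters after it.
TRIGGER = re.compile(r"^(?:fraud|scam)\w*$", re.IGNORECASE)

def ordered_within(text, max_gap=5):
    """Return proper nouns that follow a fraud/scam word within max_gap tokens."""
    tokens = re.findall(r"\w+", text)
    found = []
    for i, token in enumerate(tokens):
        if not TRIGGER.match(token):
            continue
        # Only look forward: the order of the two elements matters.
        for later in tokens[i + 1 : i + 1 + max_gap]:
            if later in PROPER_NOUNS:
                found.append(later)
    return found

# Trigger word first, proper noun shortly after.
print(ordered_within("I was scammed. The FBI is now involved.."))
# Same words in the opposite order.
print(ordered_within("The FBI called before I was scammed."))
```

The first call finds FBI even though it sits in the next sentence, because this kind of rule constrains order and distance, not sentence boundaries. The second call finds nothing, because the proper noun comes before the trigger word.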


Here is a generated rule looking for items that occur in any order in a sentence. This rule combines a predefined concept with a morphological expansion (the @ symbol syntax) of the term advance, which matches any form of the term ‘advance’.

 

CONCEPT_RULE:(SENT, "_c{advance@}", "nlpMeasure")

 

Matches:

 

It was not until two days later ( XX/XX/18 ) that I found out that they transferred money from my own credit card as a cash advance to make me believe it was money from somewhere else.

 

mailing in my payments 1 month in advance

 

We will send you an advance notice approximately seven days before we reapply the negative balance.

 

There were 75 cash advances


If you are looking for any mention of an action, this rule identifies what’s going on in your documents by looking for a verb followed by ‘that’ and then a noun; the _c{} braces return the noun. Here are some examples.

 

C_CONCEPT::V that _c{:N}

 

Matches:

 

lose that amount of money.

 

stop that retaliation

 

I sent that money

 

protect that data or delete that data

 

 

I would not have thought of these rules were it not for the automatic generation capability. Consider incorporating some of these ideas in your concept rules to see if they improve your information retrieval success. Who knows, if your current rule ain't broke, you may make it better!

 

Thanks for reading and I wish you much success with your text analysis adventures.

 

 
