Credit cards are a convenient tool that lets you acquire goods and services without carrying lots of cash or writing checks. The lure of wanting something and getting it instantly, without considering how to pay for it, is also a reason that so many credit cards exist.
People can easily get into trouble, incurring credit card debt at exorbitant interest rates and building a mountain of financial burden that can be extremely difficult to climb out of. Would you put your head in the open mouth of a fierce lion? When it comes to credit, many people have done just that without thinking through the consequences. With credit card agreements often exceeding 15 pages, it's little wonder they are not at the top of your reading list.
Still, by reading the fine print of a credit card agreement before signing up, you can save money when you do have to carry charges over several billing periods.
For this post I downloaded hundreds of credit card agreements from dozens of financial institutions. The data is generally available from the Consumer Financial Protection Bureau.
The documents are PDF files, so to get them into SAS Viya I first made a connection to the main download folder, which contained a subfolder of PDFs for each financial institution. A single institution can have multiple credit card agreements; in this example, over 500 agreements were processed. Once the folder connection was made in Manage Data in SAS Viya, I was able to import all of the credit card agreements into one SAS data set, with each agreement in its own row.
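If you prefer code to the Manage Data interface, a document import like this can also be scripted. The lines below are only a minimal sketch, not the workflow used for this post: the caslib name, folder path, and subfolder are made up, and the document import options (filetype="document") are from my memory of the CASUTIL document import and may differ by Viya release, so check the documentation for your environment.

/* Minimal sketch only -- caslib name, path, and import options are assumptions. */
cas mysess;
caslib agreements datasource=(srctype="path") path="/data/credit_card_agreements";
proc casutil incaslib="agreements" outcaslib="agreements";
   /* Convert the PDFs to text and load them as rows of one CAS table.          */
   /* casdata names a hypothetical subfolder; your release may expect a file or */
   /* a different option for loading a whole directory of documents.            */
   load casdata="BankOfExample" importoptions=(filetype="document")
        casout="cc_agreements" replace;
quit;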
I created a Visual Text Analytics project in SAS Model Studio using the credit card agreement as the text field. To start, I ran a generic pipeline to see what kind of initial results I would get, enabling the standard concepts check box so that predefined NLP (Natural Language Processing) concepts would be tagged in the documents. Browsing through the documents for things to follow up on, I noticed that the APR rates varied quite a bit, and I thought it might be interesting to see what APR rates were available for different types of credit cards. While browsing the documents in the Concepts node, I also noticed that the data contained a mix of banks and credit unions. To limit unintentional bias in my testing, I wanted to see if I could separate the banks from the credit unions in the project.
I ended up accumulating several custom concepts while fine-tuning the APR information.
I started by searching the documents for “APR” and “rate”. I took one of the predefined concepts, nlpPercent, and wrote a CONCEPT_RULE to match any percentage within six tokens of the terms "APR" or "rate". I then noticed that the documents also contained percentages related to the terms "margin" and "accrue". Because these terms also affect interest rate charges, I ended up creating the following concept, which I called _RATES_. The element wrapped in _c{} is the value returned upon a rule match, giving me a list of the desired percentages in each document.
_RATES_
CONCEPT_RULE:(DIST_6,"_c{nlpPercent}",(OR,"APR@", "rate@"))
CONCEPT_RULE:(DIST_6,"_c{nlpPercent}",(OR,"margin@", "accrue@"))
I could use the score code from the Concepts node to create a table of matching rates for each document (that is, each credit card agreement). In a post-processing step, this output table could then be used to create a custom report of the frequency or ranges of rates in the document collection.
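As a sketch of that post-processing step, the PROC FREQ step below tallies how often each extracted rate appears. The table and column names (concept_out, _concept_, _match_text_) are placeholders, not the actual names from this project; substitute whatever your Concepts node output table contains.

/* Placeholder names -- point this at your actual Concepts node output table. */
proc freq data=casuser.concept_out order=freq;
   where upcase(_concept_) = "_RATES_";   /* keep only the _RATES_ matches */
   tables _match_text_ / nocum;           /* frequency of each extracted percentage */
run;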
Here is a word cloud of keywords, created in SAS Visual Analytics from the Concepts node output table.
I thought of a way to separate the credit union documents from the bank documents in Visual Text Analytics without having to pre-process the documents.
To do this, I started with the basic CLASSIFIER rule type to identify documents containing the string APR. I then created an empty _CREDIT_UNION_DOCS_ concept to use as a landing place. By cleverly using the export modifier with the string “credit union” (all inside the square brackets), any document matching APR that also contained “credit union” would magically appear under the previously empty _CREDIT_UNION_DOCS_ concept!
_CREDITUNION_
CLASSIFIER:[export=_CREDIT_UNION_DOCS_: credit union]:APR
The results below show 72 matched documents. Notice that the _CREDIT_UNION_DOCS_ concept doesn’t contain any rule code, since its only purpose is to receive the documents exported from the _CREDITUNION_ rule.
Having this information in a custom concept lets me do further evaluation against other rule matches for this subset of financial institutions.
In my next rule I wanted to figure out what the APR rates were just for credit unions. Using the CONCEPT_RULE type, I combined the previous _CREDIT_UNION_DOCS_ “landing pad” concept with the _RATES_ concept to obtain the APR rates for credit unions only. The Boolean AND operator lets the rule reference both concepts, and the _c{} label determines which concept's match is returned; in this case, the credit union documents.
_RATES_FOR_CUDOCS_
CONCEPT_RULE:(AND,"_c{_CREDIT_UNION_DOCS_}","_RATES_")
Of the 72 credit union documents, it turns out that 38 matched the rate percentages generated by the _RATES_ concept. Instead of selecting the _CREDIT_UNION_DOCS_ match (the string "credit union") to be highlighted, I could have written the rule to highlight the actual rates by wrapping the _RATES_ portion in the _c{} label instead. That would, perhaps, have provided more useful results in my rule match.
In the next example I will reconstruct the rule to see what the results look like. I would expect to still have 38 matches.
This is what the rule looks like after changing the context being returned.
CONCEPT_RULE:(AND,"_c{_RATES_}","_CREDIT_UNION_DOCS_")
I like these results better than the previous rule, although some of the APR rates seem to be surprisingly high.
This graph from the results of the Concepts node shows the number of documents matched per concept:
The additional _BINDING_ rule below can be used as a starting point to further explore this document collection on your own. Remember that you can use the concept rule output table in SAS Visual Analytics to create custom reports, or score new documents with the rules you build in this node (a rough sketch of scoring appears after the _BINDING_ example).
I added this final rule to identify language reflecting liability for charges and found some interesting information. For this example, I wanted to capture a larger section of text within a match.
First, the PREDICATE_RULE syntax in the first line below selects all text between the matched terms, so if ‘binding’ and ‘heirs’ appear anywhere in the same document, everything between the two terms is highlighted. This entire span will be returned if you score new documents using this rule.
Second, the CONCEPT_RULE syntax below returns the term ‘notice’ if it falls within six tokens of a variation of the word demand (demands, demanded, demanding) or the word pay (pays, paid, paying).
_BINDING_
PREDICATE_RULE:(start, end):(AND, "_start{binding}", "_end{heirs}")
CONCEPT_RULE:(DIST_6, "_c{notice}", (OR, "demand@", "pay@"))
Some matched text from this rule may reflect surprising and unexpected contract terms, pointing out the importance of reading and understanding agreements before signing.
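As promised, here is a rough sketch of what scoring new documents outside the pipeline might look like. The score code exported from the Concepts node is built around the textRuleScore.applyConcept CAS action; the table, column, and compiled model names below are placeholders, and the exact action parameters come from the score code you export, so treat this only as an outline rather than the post's actual code.

/* Rough outline only -- every name here (li_model, new_agreements, doc_id, text_var) is a placeholder. */
/* The score code exported from the Concepts node supplies the real model table and parameter values.   */
proc cas;
   textRuleScore.applyConcept /
      model={name="li_model"}              /* compiled concept (LITI) model */
      table={name="new_agreements"}        /* new documents to score        */
      docId="doc_id"
      text="text_var"
      casOut={name="concept_matches", replace=TRUE};
quit;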
Technology is changing the way society operates. How many times have you just clicked ‘accept’ and continued on to whatever app you were adding? Hopefully this post encourages you to be a little more curious about the ‘fine print’ you are actually agreeing to, to ‘opt out’ of agreements you find disagreeable, and to experiment with and gain insight into the powerful capabilities of concept rules. Who knows, it may limit some unnecessary hardships down the road. This technique can be applied to all kinds of documents.
Thanks for reading and notice that you didn’t have to accept anything to do so. 😊
Find more articles from SAS Global Enablement and Learning here.