Network visualization has become a staple of fraud detection and investigation, but there is much more we can extract from the complex graph structure behind it. In this post we will go through the basics of network analytics to understand how to use it effectively, and explore examples of fraud typologies it can potentially detect, thereby improving detection rates.
In mathematics, networks are structures used to analyse the relationships between entities. They can be visualized as nodes (the entities) and links (the relationships between them).
Network analytics, therefore, is the use of statistical measures to extract the information contained in these complex graphs of nodes and links.
There are many calculations and aggregations we can perform on network information. Simply counting nodes and links, or filtering them by specific attributes, can bring a lot of value to detection and investigation. For example, just checking the number of previous frauds in a network can reveal a potential new fraud linked to it.
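A minimal sketch of these basic measures, assuming a small undirected network stored as a plain list of links and a set of previously confirmed frauds. All node names and the `known_fraud` set are hypothetical. The code collects the network a new claim belongs to, then counts its nodes, links and previous frauds.

```python
from collections import deque

# Hypothetical network: each link connects two entities
# (claims, people, phone numbers, addresses). Names are made up.
links = [
    ("claim_new", "person_A"),
    ("person_A", "phone_1"),
    ("phone_1", "claim_old_1"),
    ("claim_old_1", "address_X"),
    ("address_X", "claim_old_2"),
    ("claim_unrelated", "person_B"),
]

# Hypothetical set of claims previously confirmed as fraud.
known_fraud = {"claim_old_2"}


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def network_of(node, adj):
    """Return every node reachable from `node` (its connected network)."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in adj.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def basic_network_stats(node, edges, fraud_labels):
    """Count nodes, links and known frauds in the network containing `node`."""
    adj = build_adjacency(edges)
    nodes = network_of(node, adj)
    # Both ends of a link lie in the same connected network, so checking one end is enough.
    n_links = sum(1 for a, _ in edges if a in nodes)
    n_fraud = len(nodes & fraud_labels)
    return {"nodes": len(nodes), "links": n_links, "previous_fraud": n_fraud}


print(basic_network_stats("claim_new", links, known_fraud))
```

Here the new claim is three links away from a confirmed fraud, so even these simple counts would flag its network for a closer look.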
These basic measures alone can enrich our data considerably, but more advanced methods, such as the ones covered below, can reveal even more:
Centrality measures are statistics that evaluate the importance of nodes and links. There are many of them, because importance can mean different things depending on the use case and what we want to consider.
These measures can be automated and used during both the detection and the investigation phases, and they are also very useful for enhancing scoring and prioritization. They are especially helpful for understanding very large and complex networks.
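As an illustration of one of the simplest centrality measures, here is a minimal sketch of degree centrality: the number of links a node has, divided by n - 1, the maximum possible in a simple undirected network of n nodes. The network below (a hypothetical repair shop linked to several claims) is made up for the example; other centrality measures such as betweenness or closeness would rank nodes differently.

```python
def degree_centrality(edges):
    """Degree centrality: links per node, normalised by (n - 1)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(neigh) / (n - 1) for node, neigh in adj.items()}


# Hypothetical network: one repair shop linked to several claims.
links = [
    ("repair_shop_1", "claim_1"),
    ("repair_shop_1", "claim_2"),
    ("repair_shop_1", "claim_3"),
    ("repair_shop_1", "claim_4"),
    ("claim_4", "person_A"),
]

scores = degree_centrality(links)
# Rank nodes from most to least central, e.g. to prioritise review.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores[ranking[0]])
```

The repair shop comes out on top because it touches four of the five other nodes, which is the kind of hub an investigator might want to review first.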
Some examples are:
Pattern matching involves finding one or more subgraphs within a larger network that match a specific structure or characteristic. The pattern we are looking for is defined beforehand, so it works similarly to rules in a detection engine, with the advantage of the greater complexity that network data makes possible.
Pattern matching can be part of the automated alert generation process and, just like rules and supervised learning models, it uses the insights investigators have learned to improve detection. One big advantage over other types of analytics is how interpretable and easy to explain it is, both to investigators and to regulators. Despite this simplicity, it can detect very complex fraud schemes.
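A minimal sketch of pattern matching, assuming a hypothetical predefined pattern: two claims that share both a phone number and a bank account. The typed network is simplified to a mapping from each claim to the entities it uses, with entity types encoded as prefixes (`phone:`, `account:`); both the data and the encoding are assumptions made for the example.

```python
from itertools import combinations

# Hypothetical typed network: each claim mapped to the entities it is linked to.
claim_links = {
    "claim_1": {"phone:555-0101", "account:ES01", "address:Main St 1"},
    "claim_2": {"phone:555-0101", "account:ES01"},
    "claim_3": {"phone:555-0101", "account:FR77"},
    "claim_4": {"phone:555-0199", "account:DE42"},
}


def entities_of_type(entities, entity_type):
    """Keep only the entities whose prefix matches `entity_type`."""
    return {e for e in entities if e.startswith(entity_type + ":")}


def match_shared_phone_and_account(links):
    """Find pairs of claims sharing at least one phone AND one account.

    The predefined pattern is claim-phone-claim plus claim-account-claim.
    """
    matches = []
    for c1, c2 in combinations(sorted(links), 2):
        shared = links[c1] & links[c2]
        phones = entities_of_type(shared, "phone")
        accounts = entities_of_type(shared, "account")
        if phones and accounts:
            matches.append((c1, c2, sorted(phones | accounts)))
    return matches


for match in match_shared_phone_and_account(claim_links):
    print(match)
```

Sharing a phone alone (claim_1 and claim_3) does not trigger the pattern; only the pair that shares both entities is returned, which keeps the alert as easy to explain as a rule.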
Some patterns that can be relevant for claims fraud:
A cycle is a sequence of nodes that ends by connecting back to its starting point. A cycle detection algorithm finds these circular patterns in our network. As with pattern matching, it can be part of the alert generation system: cycles are easy to detect and understand, and they can easily be combined with other methods (network-based or not) to build complex scenarios. In the fraud context, cycles can be very suspicious in monetary transactions (money that leaves one source, passes through several other accounts and eventually returns to its origin), but also among applications connected in a cycle, which can appear in organized fraud schemes.
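A minimal sketch of cycle detection on a directed network of money transfers, using depth-first search with three node states (unvisited, on the current path, finished). Reaching a node that is still on the current path closes a cycle. The account names and transfers are hypothetical.

```python
def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if there is none."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {node: UNVISITED for node in graph}
    path = []

    def dfs(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ON_PATH:
                # Cycle found: slice the path from where `nxt` first appears.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNVISITED:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


# Hypothetical transfers: money leaves acct_A and eventually returns to it.
transfers = [
    ("acct_A", "acct_B"),
    ("acct_B", "acct_C"),
    ("acct_B", "acct_D"),
    ("acct_D", "acct_A"),
]
print(find_cycle(transfers))  # ['acct_A', 'acct_B', 'acct_D', 'acct_A']
```

The recursive search is fine for small examples; very large transaction graphs would need an iterative version to avoid Python's recursion limit.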
Shortest path algorithms are among the most well known and widely used across all industries. GPS devices use them to find the shortest route between two locations, but they also have broad applications in fraud. Given a network, we can calculate the shortest path between one node and every other node in it, which helps us understand the risk of a transaction that is connected, at some point, to a known past fraud. For example, if the shortest path between a transaction and a fraud is very short, the risk is very high; if it is very long, we might reconsider and lower the risk score. One of the key benefits of this method is its low computational cost compared with the previous examples. It is also valuable during the investigation process, can be used to enhance scoring and prioritization, and is easy to configure and implement.
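A minimal sketch, assuming unweighted links: a multi-source breadth-first search starts from every known fraud at distance 0, so each node is reached first through its closest fraud. Weighted links would need an algorithm such as Dijkstra's instead. The `risk_from_distance` rule (1 / (1 + d)) is a hypothetical scoring choice for illustration only, and the network is made up.

```python
from collections import deque


def distance_to_fraud(edges, fraud_nodes):
    """Shortest number of links from every node to its nearest known fraud.

    Nodes with no path to any fraud are left out of the result.
    """
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    dist = {f: 0 for f in fraud_nodes if f in adj}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def risk_from_distance(d):
    """Hypothetical scoring rule: the closer to fraud, the higher the risk."""
    return 0.0 if d is None else 1.0 / (1 + d)


links = [
    ("txn_1", "card_9"),
    ("card_9", "txn_fraud"),
    ("txn_2", "device_3"),
    ("device_3", "merchant_7"),
    ("merchant_7", "card_9"),
    ("txn_3", "device_8"),
]
dist = distance_to_fraud(links, {"txn_fraud"})
for txn in ("txn_1", "txn_2", "txn_3"):
    d = dist.get(txn)
    print(txn, d, round(risk_from_distance(d), 2))
```

One search covers every node at once, which reflects the low computational effort mentioned above: txn_1 (two links from the fraud) scores higher than txn_2 (four links), and txn_3 (no path) scores zero.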
Other powerful methods include core decomposition, which can detect connections between many seemingly unrelated fraudulent claims, and articulation points, which can help reveal the structure of organized criminal groups.
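A minimal sketch of both ideas on a hypothetical network of two tightly knit groups joined through a single "broker" node. Core numbers are computed by repeatedly peeling off the node with the lowest remaining degree; articulation points (nodes whose removal splits the network) use the classic depth-first search with discovery and low-link times. Node names and structure are assumptions for the example.

```python
from itertools import combinations


def build_adjacency(edges):
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def core_numbers(adj):
    """k-core decomposition: peel the lowest-degree node until none remain."""
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining, core, k = set(adj), {}, 0
    while remaining:
        node = min(remaining, key=lambda n: degree[n])
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        for nb in adj[node]:
            if nb in remaining:
                degree[nb] -= 1
    return core


def articulation_points(adj):
    """Nodes whose removal disconnects part of the network (DFS low-link)."""
    disc, low, points, timer = {}, {}, set(), [0]

    def dfs(node, parent):
        disc[node] = low[node] = timer[0]
        timer[0] += 1
        children = 0
        for nb in adj[node]:
            if nb not in disc:
                children += 1
                dfs(nb, node)
                low[node] = min(low[node], low[nb])
                if parent is not None and low[nb] >= disc[node]:
                    points.add(node)
            elif nb != parent:
                low[node] = min(low[node], disc[nb])
        if parent is None and children > 1:
            points.add(node)

    for node in adj:
        if node not in disc:
            dfs(node, None)
    return points


# Hypothetical network: two fully connected rings linked only through "broker".
edges = list(combinations(["a1", "a2", "a3", "a4"], 2))
edges += list(combinations(["b1", "b2", "b3", "b4"], 2))
edges += [("broker", "a1"), ("broker", "b1"), ("b4", "x")]
adj = build_adjacency(edges)

cores = core_numbers(adj)
print(sorted(n for n, c in cores.items() if c == max(cores.values())))
print(sorted(articulation_points(adj)))
```

The densest core groups the two rings, while the articulation points highlight the broker and the ring members that connect each group to the rest of the network.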
Connected components is another key method, for example for simulating how networks looked in the past, while community detection can segment networks and make visualization and further investigation more efficient.
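A minimal sketch of connected components with a union-find structure. It assumes, as an interpretation, that simulating a network "in the past" means rebuilding its components using only the links that existed on or before a cutoff date; the link dates and entity names are hypothetical.

```python
from datetime import date


def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def components_as_of(links, cutoff):
    """Components built only from links created on or before `cutoff`."""
    active = [(a, b) for a, b, created in links if created <= cutoff]
    nodes = {n for edge in active for n in edge}
    return connected_components(nodes, active)


# Hypothetical dated links between claims and shared entities.
links = [
    ("claim_1", "phone_1", date(2023, 1, 10)),
    ("claim_2", "phone_1", date(2023, 3, 5)),
    ("claim_3", "address_9", date(2023, 2, 1)),
    ("claim_2", "address_9", date(2023, 6, 20)),
]
print(len(components_as_of(links, date(2023, 4, 1))))
print(len(components_as_of(links, date(2023, 12, 31))))
```

In April the two groups of claims were still separate; the June link merges them into a single network, showing how the picture an investigator sees depends on the date considered.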
There are many other ways to benefit from network analytics not mentioned in this post, yet to be explored and tested for how they can improve the performance of fraud detection systems.
Network visualization is already a well-established part of fraud detection and adds essential capabilities for the investigation phase.
On top of that, we have just seen how to take the next step: network analytics. Once again, investigation will benefit from the powerful insights these methods can calculate, but more importantly, we can use them to discover new fraud patterns, generate alerts and improve rules.
Networks have proven indispensable for discovering complex fraud schemes and suspicious behaviors that involve multiple entities and their connections, and they give great results when used to enhance risk scoring in detection systems, thereby improving alert prioritization.
Are you ready to start using these methods?
Hi,
Great article, Celia. I'd also add another feature of networks: repeated events. It's probably best explained by an example. Let's say we have 5 linked motor insurance claims on a network and all 5 occur at night. From our past records, only 25% of claims occur at night. If we assume the events are independent, then the probability of all 5 claims being at night = (0.25)^5 ≈ 0.001 (around 1 in 1000). You can use the binomial distribution for 0, 1, 2, 3 and 4 claims at night.
The independence assumption is quite strong. It is probably OK in this case (though you may be able to think of exceptions, e.g. all the claimants being night-shift staff), but in other cases it may not hold as well.
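The calculation in the comment can be sketched with the binomial distribution, under the same independence assumption: with n = 5 linked claims and a historical night-time rate p = 0.25, the probability of exactly k night-time claims is C(n, k) p^k (1 - p)^(n - k).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k of n independent claims occur at night)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 5, 0.25  # 5 linked claims; 25% historical night-time rate
for k in range(n + 1):
    print(k, round(binomial_pmf(k, n, p), 4))

# Probability of at least 4 of the 5 claims occurring at night.
print(round(sum(binomial_pmf(k, n, p) for k in (4, 5)), 4))
```

For k = 5 this reproduces (0.25)^5 ≈ 0.001; summing the upper tail gives a threshold-style measure of how unusual a network's pattern of events is.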
Good luck with your analysis.
Colin