How can we use Network Analytics for Application Fraud Detection?

Network visualization has become a staple for fraud detection and investigation, but there is a lot more we can obtain from the complex graph structure that supports it. In this post we will go through the basics of network analytics to understand how we can use it effectively and explore examples of fraud typologies we can potentially detect using it, thus improving detection rates.

What is Network Analytics?

In mathematics, networks (graphs) are structures used to analyse the relations between entities. The entities are represented as nodes and the relationships between them as links.

Network analytics is therefore the use of statistical measures to extract the information contained in these complex graphs of nodes and links.

Network Analytics Methods

There are multiple calculations and aggregations we can do based on network information. Simply counting nodes and links, or filtering them by specific attributes, can bring a lot of value for detection and investigation. For example, checking the number of previous fraud cases in a network can reveal a potential new fraud linked to it.
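As a minimal sketch of these basic measures, the snippet below builds a small network from claim-to-entity links and, for a given claim, counts the nodes, the links, and the previously known fraud cases it is connected to. The link data, the `known_fraud` labels, and the function names are hypothetical and only serve as illustration.

```python
from collections import defaultdict, deque

# Hypothetical links: each pair connects a claim to an entity it uses.
links = [
    ("claim_1", "phone_A"),
    ("claim_2", "phone_A"),
    ("claim_2", "email_B"),
    ("claim_3", "email_B"),
    ("claim_4", "phone_C"),
]
known_fraud = {"claim_1"}  # assumed labels from past investigations

# Undirected adjacency list.
adjacency = defaultdict(set)
for a, b in links:
    adjacency[a].add(b)
    adjacency[b].add(a)


def network_of(start):
    """Return every node reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def network_summary(claim):
    """Basic counts for the network that contains `claim`."""
    nodes = network_of(claim)
    return {
        "nodes": len(nodes),
        # Every link is stored twice (once per endpoint), hence the // 2.
        "links": sum(len(adjacency[n]) for n in nodes) // 2,
        "previous_fraud": len((nodes & known_fraud) - {claim}),
    }


print(network_summary("claim_3"))
print(network_summary("claim_4"))
```

Here `claim_3` has no fraud label itself, but its network contains one previously confirmed fraud, which is exactly the kind of signal a simple count can surface.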

Once we have seen how much these basic measures can enrich our data, we can move on to more complex methods such as the ones covered below:

Centrality Measures

Centrality measures are statistics that evaluate the importance of nodes and links. There are several of them, since importance can mean different things depending on our use case or considerations.

These measures can be automated and used during both the detection and investigation phases, and they can also be very useful to enhance scoring and prioritization. They are especially helpful for understanding very large and complex networks.

Some examples are:

Betweenness Centrality: calculated using the shortest paths between the nodes in a network. It gives importance to the nodes that act as bridges between other nodes. In claims fraud, it can uncover nodes that are connected to other claims through suspicious entities (e.g. phone number, email, etc.).

Degree Centrality: the number of links that a node has. We can detect anomalous claims that have too many or too few links compared to what is expected (which depends on how the network is generated and what we consider usual behavior).

PageRank: the algorithm made famous by Google Search. It uses a more elaborate definition of importance: a node is important if it has many inbound links (as in other measures) but also if it is linked from other very important nodes. With PageRank you can identify the critical claims to target in order to dismantle fraudulent groups, and it can improve the triage of alerts.
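To make the three measures concrete, here is a small pure-Python sketch on a toy network. It is an illustration under assumptions, not a production implementation: the link data is made up, the network is treated as undirected and unweighted, betweenness follows Brandes' approach on shortest paths, and PageRank is a plain power iteration with an assumed damping factor of 0.85.

```python
from collections import defaultdict, deque

# Hypothetical claim-to-entity links (undirected).
links = [
    ("claim_1", "phone_A"), ("claim_2", "phone_A"), ("claim_3", "phone_A"),
    ("claim_3", "email_B"), ("claim_4", "email_B"),
]
adjacency = defaultdict(set)
for a, b in links:
    adjacency[a].add(b)
    adjacency[b].add(a)
adjacency = dict(adjacency)


def degree_centrality(adj):
    """Number of links attached to each node."""
    return {node: len(neighbours) for node, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness: shortest paths passing through each node."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        order, preds = [], {v: [] for v in adj}
        paths = {v: 0 for v in adj}   # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:                  # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate from the farthest nodes back
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # In an undirected graph every pair is counted from both ends.
    return {v: s / 2 for v, s in score.items()}


def pagerank(adj, damping=0.85, iterations=50):
    """Power iteration; assumes every node has at least one link."""
    n = len(adj)
    rank = {v: 1 / n for v in adj}
    for _ in range(iterations):
        new_rank = {v: (1 - damping) / n for v in adj}
        for v, neighbours in adj.items():
            share = damping * rank[v] / len(neighbours)
            for w in neighbours:
                new_rank[w] += share
        rank = new_rank
    return rank


for name, measure in [("degree", degree_centrality),
                      ("betweenness", betweenness_centrality),
                      ("pagerank", pagerank)]:
    top = max(measure(adjacency).items(), key=lambda item: item[1])
    print(name, "->", top[0])
```

In this toy network the shared phone number bridges most claims, so it ranks highest on each measure. On real data the measures often disagree, which is why it helps to look at more than one.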

Pattern Matching

Pattern matching involves finding one or more subgraphs, within a larger network, that match a specific structure or characteristic. The pattern we want to find is defined beforehand, so it works similarly to rules in the detection engine, with the added advantage of the complexity that network data allows.

Pattern Matching can be part of the automated alert generation process and, just like rules and supervised learning models, it uses investigators' learned insights to improve detection. One big advantage compared to other types of analytics is how interpretable and easy to explain it is, both for investigators and regulators. Even so, it can detect very complex fraud schemes.

Some patterns that can be relevant for claims fraud:

Overlinked claims: claims that share too much information with a supposedly unrelated claim (for example, three pieces of personal information such as email, address, and IP address, when the name and surname are not the same).

Roles reversed: a guarantor that is also a guarantee in a different claim, which in turn has a guarantor that is also a guarantee in another claim. You can chain this pattern as many times as needed.
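Both patterns can be sketched in a few lines. The example below is only an illustration under assumptions: the claim records, the field names, the threshold of three shared attributes, and the `(claim, guarantor, guarantee)` triples are all hypothetical, and a real system would run such patterns against the network store rather than in-memory dictionaries.

```python
from itertools import combinations

# Hypothetical claim records.
claims = {
    "claim_1": {"name": "Ann Lee", "email": "x@mail.test",
                "address": "1 High St", "ip": "10.0.0.1"},
    "claim_2": {"name": "Bob Roy", "email": "x@mail.test",
                "address": "1 High St", "ip": "10.0.0.1"},
    "claim_3": {"name": "Ann Lee", "email": "x@mail.test",
                "address": "1 High St", "ip": "10.0.0.1"},
    "claim_4": {"name": "Cy Dunn", "email": "c@mail.test",
                "address": "1 High St", "ip": "10.0.0.9"},
}
SHARED_FIELDS = ("email", "address", "ip")


def overlinked_pairs(claims, min_shared=3):
    """Pairs of claims with different names that share many attributes."""
    hits = []
    for (id_a, a), (id_b, b) in combinations(sorted(claims.items()), 2):
        if a["name"] == b["name"]:
            continue  # same person: sharing details is expected
        shared = [field for field in SHARED_FIELDS if a[field] == b[field]]
        if len(shared) >= min_shared:
            hits.append((id_a, id_b, shared))
    return hits


# Hypothetical (claim, guarantor, guarantee) triples.
guarantees = [
    ("claim_5", "P1", "P2"),
    ("claim_6", "P2", "P3"),
    ("claim_7", "P3", "P4"),
]


def people_in_both_roles(guarantees):
    """People who act as guarantor in one claim and guarantee in another."""
    guarantors = {guarantor for _, guarantor, _ in guarantees}
    guaranteed = {guarantee for _, _, guarantee in guarantees}
    return sorted(guarantors & guaranteed)


for id_a, id_b, shared in overlinked_pairs(claims):
    print(id_a, id_b, "share", shared)
print(people_in_both_roles(guarantees))
```

Because the pattern is written down explicitly, each hit comes with its own explanation (which fields matched, which person appears in both roles), which is what makes this approach easy to justify to investigators.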

Cycle Detection

A cycle is a sequence of nodes that ends up connecting back to its starting point. A cycle detection algorithm finds these closed loops in our network. As with pattern matching, this can be part of our alert generation system: cycles are easy to detect and easy to understand, and they can be combined with other methods (network-based or not) to create complex scenarios. In the fraud context, cycles can be very suspicious when dealing with monetary transactions (money that leaves one source, passes through many other accounts and eventually returns to its origin), but also in applications that are connected in a cycle, which can appear in organized fraud schemes.
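A minimal sketch of cycle detection on directed money transfers is shown below, using a depth-first search that tracks the nodes on the current path. The account names and the `find_cycle` helper are hypothetical. The function returns the first cycle it encounters rather than all of them, and the recursive form is only suitable for small networks.

```python
from collections import defaultdict

# Hypothetical directed transfers: (sender, receiver).
transfers = [
    ("acc_A", "acc_B"), ("acc_A", "acc_C"),
    ("acc_B", "acc_D"), ("acc_C", "acc_D"),
    ("acc_D", "acc_A"),   # money returns to its origin
    ("acc_E", "acc_A"),
]


def find_cycle(transfers):
    """Return one cycle as a list of nodes (first == last), or None."""
    graph = defaultdict(list)
    for sender, receiver in transfers:
        graph[sender].append(receiver)

    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = defaultdict(int)   # every node starts as UNVISITED
    path = []                  # nodes on the current search path

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for nxt in graph.get(node, []):
            if state[nxt] == ON_PATH:
                # Reached a node already on the current path: a cycle.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNVISITED:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for start in list(graph):
        if state[start] == UNVISITED:
            found = visit(start)
            if found:
                return found
    return None


print(find_cycle(transfers))
print(find_cycle([("acc_X", "acc_Y"), ("acc_Y", "acc_Z")]))
```

The returned list can be attached directly to an alert, so the investigator sees the exact route the money took.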

Shortest Path

The shortest path is one of the best-known and most widely used network algorithms across industries. It is used in GPS devices to find the shortest route between two locations, but it also has broad applications in fraud. Given a network, we can calculate the shortest path between one node and all the other nodes in that network, which helps us understand the risk of a transaction that at some point became connected to a known past fraud. For example, if the shortest path between a transaction and a fraud is very short, the risk is very high, but if the shortest path is very long, we might reconsider and lower the risk score. One of the key benefits of this method is its low computational effort compared with the previous examples. It is also valuable during the investigation process, can be used to enhance scoring and prioritization, and is easy to configure and implement.
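The idea can be sketched with a breadth-first search on an unweighted network. As an assumption for illustration, the search below starts from all known fraud nodes at once, so a single pass gives every node its distance to the nearest fraud. The link data, the node names, and the `1 / (1 + distance)` risk score are hypothetical, not a recommended scoring formula.

```python
from collections import defaultdict, deque

# Hypothetical links between transactions and the entities they use.
links = [
    ("tx_fraud", "card_A"),
    ("card_A", "tx_1"),
    ("tx_1", "device_B"),
    ("device_B", "tx_2"),
    ("tx_3", "card_Z"),    # not connected to any known fraud
]
known_fraud = {"tx_fraud"}


def distance_to_nearest_fraud(links, fraud_nodes):
    """Number of links between each reachable node and its nearest fraud."""
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    distance = {node: 0 for node in fraud_nodes}
    queue = deque(sorted(fraud_nodes))
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def risk_score(node, distance):
    """Illustrative score: closer to fraud means higher risk, 0 if unlinked."""
    if node not in distance:
        return 0.0
    return 1 / (1 + distance[node])


distance = distance_to_nearest_fraud(links, known_fraud)
for tx in ("tx_1", "tx_2", "tx_3"):
    print(tx, distance.get(tx), round(risk_score(tx, distance), 2))
```

Each node and link is visited at most once, which is why this kind of measure is comparatively cheap to compute.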

Others

Other powerful methods can also be used: core decomposition can detect unions between many seemingly unrelated fraudulent claims, and articulation points can help reveal the structure of organized criminal groups.

Connected components is another key method, for example for simulating how networks looked in the past, and community detection can segment networks and improve the efficiency of visualization and further investigation.

There are many other ways to benefit from network analytics not mentioned in this post, which are yet to be explored and tested to see how they can improve the performance of fraud detection systems.
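Two of these methods are simple enough to sketch directly. The example below finds connected components with a depth-first search and then finds articulation points by brute force: a node is an articulation point if removing it splits its network into more pieces. The data and function names are hypothetical, and the brute-force approach is chosen for readability rather than speed.

```python
from collections import defaultdict

# Hypothetical claim-to-entity links.
links = [
    ("claim_1", "phone_A"), ("claim_2", "phone_A"),
    ("claim_2", "email_B"), ("claim_3", "email_B"),
    ("claim_4", "phone_C"), ("claim_5", "phone_C"),
]
adjacency = defaultdict(set)
for a, b in links:
    adjacency[a].add(b)
    adjacency[b].add(a)
adjacency = dict(adjacency)


def connected_components(adjacency, ignore=None):
    """Groups of mutually reachable nodes, optionally skipping one node."""
    seen = set() if ignore is None else {ignore}
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component, stack = set(), [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def articulation_points(adjacency):
    """Nodes whose removal increases the number of components."""
    baseline = len(connected_components(adjacency))
    return sorted(
        node for node in adjacency
        if len(connected_components(adjacency, ignore=node)) > baseline
    )


print(len(connected_components(adjacency)), "separate networks")
print("articulation points:", articulation_points(adjacency))
```

In this toy data the shared phone numbers and email address are the nodes holding each network together, which is the kind of structural role an investigator would want highlighted.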

Benefits of using Network Analytics

Network visualization is already a well-established part of fraud detection and adds essential capabilities for the investigation phase.

On top of that, we have just seen how to add a next step: network analytics. Again, the investigation will benefit from the powerful insights that can be calculated with it but, more importantly, we can use it to discover new fraud patterns, generate alerts and improve rules.

Networks have proven to be indispensable for discovering complex fraud schemes and suspicious behaviors that involve multiple entities and their connections. They provide great results when used to enhance risk scoring in detection systems, and therefore improve the prioritization of alerts.

Are you ready to start using these methods?

colingray83 · 12-07-2023

Hi,

Great article Celia. I'd also add another feature of networks: repeated events. It's probably best explained with an example. Let's say we have 5 linked motor insurance claims on a network and all 5 occur at night. From our past records only 25% of claims occur at night. If we assume independence of events, then the probability of all 5 claims being at night = (0.25)^5 ≈ 0.001 (or around 1 in 1000). You can use the binomial distribution for 0, 1, 2, 3 and 4 claims at night.

The independence assumption is quite strong and probably OK in this case [though you may be able to think of exceptions, e.g. maybe all the claimants are night-shift staff], but in other cases it may not work as well.
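As a quick sketch of that calculation, the snippet below evaluates the binomial probabilities for 5 linked claims with a 25% night-time rate, under the same independence assumption. The function name and its defaults are just illustrative.

```python
from math import comb


def night_claim_probability(k, n=5, p=0.25):
    """P(exactly k of n independent claims occur at night)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


for k in range(6):
    print(k, "night claims:", round(night_claim_probability(k), 4))

# Probability of 4 or more night claims out of 5.
at_least_four = night_claim_probability(4) + night_claim_probability(5)
print("4 or more:", at_least_four)
```

Looking at the tail (for example 4 or more out of 5) rather than only the all-5 case gives a threshold that still flags networks where almost every claim is at night.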

Good luck with your analysis.

Colin