Team Name | The Dark Knight Of DC |
Track | Banking & Insurance |
Use Case | Anti-fraud applications in the graph data |
Technology | Machine Learning; Graph Algorithm; Graph Embedding; Neural Network, etc. |
Region | APAC |
Team lead | @xupw |
Team members | @Liujql @hailunwang |
Here are more details about the data we use:
The original dataset is extracted from AMiner (https://www.aminer.cn/), using articles as nodes, the word embeddings of their titles as node features, and the citation relationships as the adjacency structure. The dataset contains 659,574 nodes and 2,878,577 links, and each node has a 100-dimensional feature vector. The labels of the first 609,574 nodes (indexed 0..609,573) are released for training, while the labels of the remaining 50,000 nodes are held out for evaluation.
We change the application background of this data. We assume that the data comes from a bank: the nodes are treated as bank accounts, the personal information of each account (age, region, income, AUM, etc.) is treated as the node features, and the node labels are treated as customer labels. In this way, we use public data to simulate a real financial scenario.
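The train/evaluation split and the citation adjacency described above can be sketched as follows. This is a minimal illustration, not the team's actual loading code: the function names are hypothetical, and treating citation links as undirected is an assumption.

```python
# Sizes taken from the dataset description.
NUM_NODES = 659_574
NUM_TRAIN = 609_574


def split_indices(num_nodes, num_train):
    """Return index ranges for labelled (training) and held-out (evaluation) nodes."""
    train = range(0, num_train)               # 0 .. num_train - 1
    evaluation = range(num_train, num_nodes)  # the remaining nodes
    return train, evaluation


def build_adjacency(num_nodes, edges):
    """Build an adjacency list from (source, target) pairs.

    Assumption: each citation link is stored in both directions,
    so the graph is treated as undirected.
    """
    adjacency = [[] for _ in range(num_nodes)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return adjacency


train_idx, eval_idx = split_indices(NUM_NODES, NUM_TRAIN)

# Toy edge list standing in for the 2,878,577 real links.
toy_adjacency = build_adjacency(4, [(0, 1), (1, 2), (2, 3)])
```

With these sizes the training range ends at index 609,573 and the evaluation range holds exactly 50,000 nodes.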
Each article in the AMiner dataset serves as a node, and the nodes fall into 18 categories, each representing the research field of an article. We build GCN and GAT models to perform node classification on this data. In this multi-class task, GCN trains significantly more efficiently than GAT. The GCN model reaches a classification accuracy of 43.76% after 500 training iterations; the relatively low accuracy may be because an article can belong to several research fields.
In a real anti-fraud scenario, we can transfer this modeling method and process: take each customer as a node in the graph, label each node as a fraudulent or normal customer according to the customer's fraud attributes, and classify each node with graph neural networks such as GCN and GAT. Because the anti-fraud task is a binary classification task, and fraudulent and normal customers are distinct from each other, we expect the accuracy of the final anti-fraud model to improve considerably over the existing model results. In fact, graph neural networks are already being used in various areas of anti-fraud, including application anti-fraud, transaction anti-fraud, anti-money laundering, and financial risk control.
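To make the GCN node classification step concrete, here is a minimal sketch of a two-layer GCN forward pass using the standard propagation rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an illustration only, not the team's implementation: the toy graph, the weights, and all function names are assumptions, training is omitted, and the toy uses 2 output classes and 2 input features instead of the 18 classes and 100 features of the real data.

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense matrix (undirected edges assumed)."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loops (the "+ I" term)
    for u, v in edges:
        a[u][v] = 1.0
        a[v][u] = 1.0
    degree = [sum(row) for row in a]
    return [
        [a[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def matmul(x, y):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(a_norm, h, w, activation=True):
    """One graph convolution: aggregate neighbours, project, optionally apply ReLU."""
    z = matmul(matmul(a_norm, h), w)
    if activation:
        return [[max(0.0, value) for value in row] for row in z]
    return z


def softmax(row):
    """Numerically stable softmax over one row of logits."""
    top = max(row)
    exps = [math.exp(value - top) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


# Toy graph: 4 nodes in a chain, 2 features per node (illustrative values).
edges = [(0, 1), (1, 2), (2, 3)]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]     # 2 features -> 3 hidden units
w2 = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.6]]   # 3 hidden units -> 2 classes

a_norm = normalized_adjacency(4, edges)
hidden = gcn_layer(a_norm, features, w1)
logits = gcn_layer(a_norm, hidden, w2, activation=False)
probs = [softmax(row) for row in logits]
predicted = [row.index(max(row)) for row in probs]
```

Each GCN layer uses one fixed, normalized adjacency matrix, whereas GAT computes attention coefficients for every edge, which is one plausible reason for the difference in training efficiency noted above.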
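For the binary fraud/normal setting, the multi-class output can be replaced by a single logit per node passed through a sigmoid. The sketch below shows that output head together with binary cross-entropy and precision/recall. The logits and labels are made-up illustrative values, the function names are hypothetical, and reporting precision and recall alongside accuracy is our suggestion on the assumption that fraudulent customers are a small minority.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy; label 1 = fraudulent, 0 = normal."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def precision_recall(probs, labels, threshold=0.5):
    """Precision and recall for the fraudulent class at a given threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical per-node logits from a GNN and their true labels.
node_logits = [2.0, -1.0, 0.5, -3.0]
node_labels = [1, 0, 0, 0]

fraud_probs = [sigmoid(z) for z in node_logits]
loss = binary_cross_entropy(fraud_probs, node_labels)
precision, recall = precision_recall(fraud_probs, node_labels)
```

In this toy example the one fraudulent node is caught (recall 1.0), but one normal node is also flagged (precision 0.5), which accuracy alone would not reveal.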
Can you share a video of what you did in the SAS system as part of the hackathon? Your videos above introduce the concept, but I am not able to see how this was realised in SAS or open source models.
Due to the rapid development of graph data and graph representation, many new GNN techniques and applications have been proposed. We describe the main GCN and GAT methods and code in the presentation video; in fact, we use SAS modules in the code.
Love the creativity in your videos!