mentos05
SAS Employee

Social media platforms such as Facebook, WhatsApp and WeChat are wonderful ways to communicate with people all over the world.
Unfortunately, these platforms are also used by criminals, which is why crime fighters want to investigate the content on them.

These platforms store a lot of metadata, which is more or less available (especially to government organizations), but the most interesting content is of course the content created by the users. This content usually consists of unstructured data such as text, speech, videos and images. In this article I focus on images, but the approach itself can be used for all kinds of unstructured data.

 

Our task: Enabling efficient ways of investigating image data

Now, imagine being part of a police investigation group that is responsible for fighting weapons crime or right-wing extremist and terrorist organizations.
You have scraped data from various channels, e.g. Twitter, Facebook, WeChat & Co., and now you’re sitting on a huge pile of images, not knowing where to start looking.
Manually investigating these data sources is impossible due to their size and variety.

cat.jpeg

Our goal is to provide an efficient way of filtering this data, allowing investigation officers to focus on the relevant images and their sources. This not only saves manual effort but also helps officials enforce the law more effectively.

 

The idea: Image classification to create additional metadata

The idea is simple. We will train an image classification model that is able to classify our images into various categories. In other words, we generate additional metadata for our images.

This classification is then combined with other data we already have, such as source, date, location, etc.

Finally, we want to create an intuitive dashboard that enables investigation officers to quickly identify potentially interesting content and skip past all the nice and funny images of cats. 🙂

Once the trained model is available, the workflow is as follows:

  1. Metadata from new images is fed directly into the analysis environment, such as SAS Viya.
  2. Images are scored using our computer vision model.
  3. Classifications and metadata are combined.

The combined data from step 3 is then fed into a visual report allowing crime investigators to perform in-depth analysis.
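
Step 3 of the workflow above can be sketched in a few lines of plain Python. This is only an illustration, not the code used in the project: the field names (`image_id`, `source`, `predicted_class`, `confidence`), the sample records and the helper functions `combine` and `filter_by_class` are all assumptions made up for this example. In practice the join would more likely happen inside the analysis environment itself.

```python
from collections import Counter

# Hypothetical metadata records as they might arrive from scraping (step 1).
metadata = [
    {"image_id": "img_001", "source": "Twitter", "date": "2020-03-01", "location": "Berlin"},
    {"image_id": "img_002", "source": "Facebook", "date": "2020-03-02", "location": "Hamburg"},
    {"image_id": "img_003", "source": "WeChat", "date": "2020-03-02", "location": "Munich"},
]

# Hypothetical model output (step 2): one predicted class and score per image.
predictions = {
    "img_001": {"predicted_class": "cat", "confidence": 0.97},
    "img_002": {"predicted_class": "handgun", "confidence": 0.91},
}


def combine(metadata, predictions):
    """Step 3: attach the predicted class to every metadata record (left join)."""
    combined = []
    for record in metadata:
        pred = predictions.get(record["image_id"], {})
        combined.append({
            **record,
            # Images without a prediction are kept and marked as unscored.
            "predicted_class": pred.get("predicted_class", "unscored"),
            "confidence": pred.get("confidence"),
        })
    return combined


def filter_by_class(rows, wanted):
    """Keep only the rows whose predicted class is in the wanted set."""
    return [row for row in rows if row["predicted_class"] in wanted]


combined = combine(metadata, predictions)
relevant = filter_by_class(combined, {"handgun"})

print([row["image_id"] for row in relevant])
print(Counter(row["predicted_class"] for row in combined))
```

The left join keeps images that have not been scored yet, so nothing silently disappears from the report. The class filter plays the same role as a category filter in a dashboard.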

prcs.jpeg

Getting our hands dirty: Training the image classification model

I am using the deep learning capabilities of SAS Viya to develop the image classification model. Of course, you can develop the model using the SAS programming language, but I am more comfortable using Python and Jupyter Notebooks. Luckily, there are powerful Python APIs that allow you to interact with your SAS environment. I am using the following APIs:

  1. SAS SWAT (communicate with SAS Viya)
  2. SAS DLPy (special deep learning API for SAS)

In this Jupyter Notebook you can see how I use transfer learning to train a ResNet-50 image classification model that can afterwards detect different weapons. The model was pretrained on the ImageNet dataset, which contains images from 1000 classes, and can be downloaded here.

There is also a second example where I apply the same process to images of hate symbols such as ISIS flags or swastikas.

The data used for training comes from two different sources:

  1. Kaggle Weapon Dataset
  2. Hate symbols that I scraped from Google Image search by hand

Here is an example of what my training data looks like for weapon detection:

data.png

The Jupyter Notebook comes with a lot of comments that explain in depth what I am doing. To keep this article as short as possible, I am not going into the details here.

After training the model, I evaluated its performance using a confusion matrix. As you can see, the predictions are quite accurate, with an accuracy of 99%. This seems a little too high to me and might be due to the training data being really nice to us: a lot of the images have a clean white background, which you cannot expect from real social media images. 🙂

confmatr.png
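
For readers who have not worked with a confusion matrix before, here is a minimal sketch of how it relates to accuracy. It uses plain Python and a handful of made-up labels. The class names and the helper functions `confusion_matrix`, `accuracy` and `per_class_recall` are assumptions for illustration only, not the results or the code from the notebook.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def accuracy(matrix):
    """Share of all images where the predicted label equals the actual one."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each actual class, the share of its images that were found."""
    totals = Counter()
    hits = Counter()
    for (a, p), n in matrix.items():
        totals[a] += n
        if a == p:
            hits[a] += n
    return {label: hits[label] / totals[label] for label in totals}


# Made-up labels for illustration only; not the results from this article.
actual = ["knife", "knife", "handgun", "handgun", "rifle", "rifle"]
predicted = ["knife", "handgun", "handgun", "handgun", "rifle", "rifle"]

matrix = confusion_matrix(actual, predicted)
print(accuracy(matrix))
print(per_class_recall(matrix))
```

Looking at per-class values next to the overall accuracy is useful because a single high number can hide one class that the model handles poorly.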

Finished? No! A nice model is useless for our investigation officers.

A lot of articles about computer vision end once the model has been trained and has successfully scored some new data. However, if you want your efforts to be useful, you also have to think about how to fit your models into the bigger picture. While SAS has advanced technology for this, e.g. SAS Model Manager, I am not going into the details of production-ready applications here.

Instead, I’d like to demonstrate what a simple dashboard that uses our model’s predictions could look like. The dashboard was built with standard functionality of SAS Visual Analytics.

Example 1: Investigate Images of Weapons coming from Social Media

Example 2: Detect Hate Symbols on Images coming from Social Media

The only interesting part is the “Data-Driven Content” object, which is responsible for displaying the selected images that you can see on the right side of the reports. This object dynamically receives the data selection from the other report elements. Behind the scenes, it uses very simple JavaScript code to retrieve and display the selected images.

As you can see in the videos, our model acts as a supplier of additional metadata. It allows us to use the filter in the top-left corner to select the image categories we are interested in.

 

This use case applies not only to crime investigation but is much more general.

Even though this article focused on identifying crime, there are of course many other interesting use cases. For example, you might want to identify your brand’s logo or products in social media posts. This way you could promote authentic user posts that highlight the advantages of your products, or offer additional support when a customer is complaining.

 

Do you have other use case ideas for this approach? I’d like to hear about them in the comments!

 

Michael Gorkow | Data Scientist @ SAS Germany & CV-Enthusiast

LinkedIn | GitHub | Medium.com 
