Deploy analytics using computer vision, model training and streaming data: image recognition example

OK, today you’re a SAS user. But imagine that you also owned a retail store. Wouldn't it be great to know what percentage of customers in your store looked happy as they browsed your shelves, or whether underage customers were trying to buy cigarettes? Applications of machine learning, computer vision (CV) and streaming analytics abound, covering use cases from enhancing the real-time in-store customer experience to grid surveillance and security, predictive maintenance, claims fraud management and improved manufacturing quality.

I recently met with an executive from a security camera company, and our discussion centered on how easy it is to stream video data into the SAS platform. This got me thinking: once the data is in the SAS platform, what can we do with it? After some thought, I decided that gathering and displaying demographic data would be a perfect use case, and that SAS Analytics for IoT was the perfect solution for this or any computer vision problem. This comprehensive AI-embedded solution provides so many capabilities because it integrates a number of key SAS products (see the figure below).

Gathering and displaying demographic data: a retail example

Using two products that are embedded in the SAS Analytics for IoT solution, SAS Visual Data Mining and Machine Learning (VDMML) and SAS Event Stream Processing (ESP), I’m going to show you how to train and deploy various demographic analytic models to create a single solution that connects directly to a USB-connected security camera. With this connection, I can process incoming images using pre-trained analytical models and produce demographic statistics. This application provides:

Face Detection - Process incoming images and detect human faces in the video stream
Age Classification - Determine approximate age grouped by category
Gender Classification - Determine male or female
Emotion Classification - Group by these classes:
Happy, Neutral, Sad, Fear, Surprise, Angry

A sample user interface might look like this:

On the left, I’m showing the demographics for the face currently detected in the security camera video feed, in real time. On the right, I have an example of the total gender statistics for that day. Now let’s talk about the steps you’ll need to build this type of solution yourself.

First off, let's talk computer vision

Computer vision (CV) techniques provide the ability to acquire, process and analyze incoming images. This analysis produces numerical results in the form of predictions based on the classes we define. In this example, we need to create four separate CV models. First, we need an object detection model which will not only give us the probability there is a human face in the incoming image, but it will also give us the coordinates of that face in the image, in the form of a bounding box. Using the box coordinates we can then crop just the face from the incoming image and send that data to the next analytical model. Consider this example:

Here I’m using a YOLO (You Only Look Once) object detection model to find all the human faces in the incoming video stream.
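To make the cropping step concrete, here is a minimal sketch in plain Python. It assumes the detector reports each box as a normalized center point plus width and height (a common YOLO-style convention, not confirmed by this article), and it represents the image as a list of pixel rows. The names `box_to_pixels` and `crop_face` are hypothetical helpers for illustration only; in the actual project this work was done with Python and OpenCV.

```python
from typing import List, Tuple

Image = List[List[int]]  # rows of pixel values (grayscale, for brevity)


def box_to_pixels(cx: float, cy: float, w: float, h: float,
                  img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    """Convert a normalized, center-based box to pixel corners.

    Assumption: cx, cy, w and h are fractions of the image size.
    The result is clamped so it never reaches outside the image.
    """
    left = int(round((cx - w / 2) * img_w))
    top = int(round((cy - h / 2) * img_h))
    right = int(round((cx + w / 2) * img_w))
    bottom = int(round((cy + h / 2) * img_h))
    left, right = max(0, left), min(img_w, right)
    top, bottom = max(0, top), min(img_h, bottom)
    return left, top, right, bottom


def crop_face(image: Image, cx: float, cy: float,
              w: float, h: float) -> Image:
    """Return only the pixels inside the detected bounding box."""
    img_h = len(image)
    img_w = len(image[0]) if img_h else 0
    left, top, right, bottom = box_to_pixels(cx, cy, w, h, img_w, img_h)
    return [row[left:right] for row in image[top:bottom]]
```

The cropped region is what would be passed on to the age, gender and emotion models. Clamping matters because a detector can report a box that extends slightly past the frame edge.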

Next, we train our models

Before we can build an application that uses analytical models to predict outcomes, we need to train those models. The training process for CV involves labeling images and separating them into datasets that can then be fed into a deep learning architecture such as ResNet50, VGG16 or Darknet. This stage of the process is completed using the SAS Deep Learning Python (DLPy) package, which provides high-level Python APIs to the deep learning methods in SAS Visual Data Mining and Machine Learning (VDMML).

As the previous diagram illustrates, four separate datasets were created to support model training. In cases where a dataset supports many types of image classes, multitask learning may be applied. Here, I tuned each dataset to get the best possible output, which led me to keep the training separate. Therefore, each image dataset supports the training of one model. Face detection was trained using a YOLOv2 architecture, while age, gender and emotion were trained using a ResNet50 architecture. For example, when training the gender model, image data is loaded into SAS Viya and VDMML using DLPy. Deep learning algorithms are then invoked as each image is processed, and the result is a portable analytics file called an ASTORE file. VDMML is GPU-enabled, which greatly reduces training times. A typical training exercise contains these steps:

Setup libraries and launch CAS
Load and explore the training data
Prepare the data for modeling
Specify the model architecture, configure model parameters and import pre-trained weights
Fit the image detection and classification model
Evaluate the newly created image classification model
Visualize model results
Save model as ASTORE for deployment
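As an illustration of the "prepare the data for modeling" step, the sketch below splits labeled images into training and test sets while keeping each class represented in both. It is plain Python rather than DLPy, and `stratified_split`, the file names and the 80/20 ratio are all assumptions made for the example, not details taken from the gender training notebook.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[str, str]  # (image path, class label)


def stratified_split(samples: List[Sample],
                     test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (train, test), class by class.

    Each label contributes roughly `test_fraction` of its images to
    the test set, so a rare class is not left out by chance.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    by_label: Dict[str, List[Sample]] = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    train: List[Sample] = []
    test: List[Sample] = []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = int(round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

Splitting per class rather than over the whole pool is a design choice that helps when classes are unbalanced, for example when one gender or emotion has far fewer images than the others.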

Please see the "Sources and more information" section for an example of the gender training model.

Now use streaming analytics

Streaming analytics is defined as the ability to continuously calculate statistics on an incoming stream of data. In our case, that stream of data is the images coming from the camera. SAS Event Stream Processing (ESP), which is part of the SAS Analytics for IoT solution, provides the ability to deploy our newly trained analytical models, in the form of ASTORE files, at the edge. With ESP you can ingest, filter and transform your data in stream. I like to think of it as applying business rules to gain extra business value. For example, let's say your company wanted to track how many happy women between the ages of 10 and 30 walked by the frozen food section on a Tuesday when you were running a sale, versus a normal day when the sale is not running. ESP gives you the capability to make that happen. There is a saying here at SAS: "Without deployment, analytics is only a science experiment."
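The frozen food rule above can be sketched as a simple filter over scored events. This is plain Python, not ESP syntax, and everything in it is an assumption for illustration: the `FaceEvent` fields, the zone name, and the idea that each event carries a numeric age estimate (the models described here classify age into categories).

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from typing import Iterable


@dataclass(frozen=True)
class FaceEvent:
    """One scored face (hypothetical schema)."""
    timestamp: datetime
    zone: str      # where the camera is, e.g. "frozen_food"
    gender: str    # "female" or "male"
    age: int       # assumed numeric estimate
    emotion: str   # "happy", "neutral", "sad", ...


def is_target(event: FaceEvent, zone: str) -> bool:
    """Business rule: happy women aged 10 to 30 in the given zone."""
    return (event.zone == zone
            and event.gender == "female"
            and 10 <= event.age <= 30
            and event.emotion == "happy")


def count_by_day(events: Iterable[FaceEvent], zone: str) -> Counter:
    """Count matching events per calendar day as they stream in."""
    counts: Counter = Counter()
    for event in events:
        if is_target(event, zone):
            counts[event.timestamp.date()] += 1
    return counts
```

Comparing the count for the sale Tuesday against the count for a normal day then gives the figure the business asked for.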

This diagram illustrates an overview of this project deployment architecture.

Here we can see the flow of information through the system and highlight some key points:

ESP is built for speed. Although there are many methods of ingesting data into ESP (REST, MQTT, MQ), to make this superfast I used a UVC connector, which allows me to connect ESP directly to the incoming video stream from the camera. I also took advantage of ESP's multithreaded capability by scoring age, gender and emotion simultaneously, each in its own thread.
ESP integrates with open source. You can easily call Python scripts in stream from the ESP model flow. This allows further integration with other open source packages such as OpenCV. Using Python and OpenCV, the images were cropped, resized and reshaped, and their colors were manipulated. Anything is possible.
Retention is amazing. Retention defines a group of events over a time span or an event count. Instead of doing analytics on each image individually, you can now take a group of images and create new event data, such as the number of men over the last hour or the total number of kids today. It is very powerful.
ESP includes a powerful graphical development platform. Using ESP Studio, models such as these may be created without any coding. For example, publishing my results to MQTT is as easy as dragging a window onto the canvas and filling out an MQTT specifications form.
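The retention idea from the list above, such as "the number of men over the last hour", can be sketched as a time-based sliding window. This is a plain Python illustration of the concept, not ESP's implementation or configuration; `SlidingWindowCounter` is a hypothetical class, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Tuple


class SlidingWindowCounter:
    """Keep per-label counts for events within the last N seconds.

    Assumption: add() is called with non-decreasing timestamps.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # oldest first
        self.counts: Counter = Counter()

    def add(self, timestamp: float, label: str) -> None:
        """Record a new event, then drop events that have aged out."""
        self.events.append((timestamp, label))
        self.counts[label] += 1
        self._expire(timestamp)

    def _expire(self, now: float) -> None:
        # An event is retained while it is newer than now - window.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_label = self.events.popleft()
            self.counts[old_label] -= 1

    def count(self, label: str) -> int:
        """Number of retained events with this label."""
        return self.counts[label]
```

Because old events are removed from the front of the queue as new ones arrive, each update costs only as much as the number of events that expire, so the running totals stay cheap to maintain even on a busy video feed.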

Let's see analytics in action

Now that we have the background, let's take a look at this application in action. In this demo, you'll see how streaming image data is transformed and enhanced in real time to deliver demographic data.

As you can see, SAS Analytics for IoT provides you with all the tools you’ll need to quickly go from concept to production. Although this example consumed image data, you can use any type of data. This comprehensive solution also provides tools to maintain and govern model development, as well as everything you need to visualize and cleanse new data sources. Let’s see where your imagination takes you! I’d love to hear how you put SAS Analytics for IoT to work for your company.

Download the files and try it yourself!

>> Resources on GitHub <<

Sources and more information

turniy · 02-23-2020

Great article! Thanks for sharing this.
