Part 2: Building Human Action Recognition Model Using Python and SAS CAS Actions

In the first post of this series, we started analyzing human actions in order to identify and classify them. In this post we follow similar steps toward the same goal: correctly and accurately identifying the human actions in our dataset. Note that the dataset names change during the early steps of the post, but once the image dimensions have been finalized, the dataset names (test_out, train_out) stay consistent for the remainder of the post.

Introduction

Working with Human Action Recognition (HAR) models requires deep learning techniques. In the first post of this series, “Building a Human Action Recognition Model”, we discussed these techniques, how to view the data, and how to access the data in our virtual environment. Remember, we recommend an environment with a GPU, or software capable of processing large volumes of image data, to speed up processing time.

Establish CAS Connection & Load Data

First, to run our Python program against the data, we must connect to the SAS Cloud Analytic Services (CAS) server and load the data into CAS. For this post, I connect directly using my credentials, but your process may be different. You must provide a few key parameters to establish your connection: a host name and a port on which the CAS controller is listening. Once connected, we can load the action sets we’ll need for this demonstration.

Connect to CAS Server

To access CAS actions from Python, we must first establish a connection to SAS Cloud Analytic Services (CAS) and load the packages needed for this post. We connect using SWAT, a Python package that lets you connect to CAS. If a server is listening on the specified host name and port, and you authenticate, then the swat.CAS class makes a connection to the server, starts a session on the same hosts as the server, and returns the connection object.
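As a minimal sketch of the connection step, the snippet below gathers the connection parameters and passes them to swat.CAS. The environment variable names (CAS_HOST, CAS_PORT, CAS_USER, CAS_PASSWORD), the helper names, and the list of action sets are my own assumptions for illustration, and 5570 is only a placeholder default; your site may use a different port, protocol, or authentication method.

```python
import os


def cas_connection_params(env=None):
    """Collect the arguments for swat.CAS from environment variables.

    The variable names are hypothetical; adapt them to your own setup.
    """
    env = os.environ if env is None else env
    return {
        "hostname": env.get("CAS_HOST", "localhost"),
        "port": int(env.get("CAS_PORT", "5570")),
        "username": env.get("CAS_USER"),
        "password": env.get("CAS_PASSWORD"),
    }


def connect_to_cas(params):
    """Open a CAS session and load the action sets assumed for this post."""
    import swat  # imported here so the helper above works without SWAT installed

    conn = swat.CAS(**params)
    # Assumed action sets: image handling, deep learning, and sampling.
    for actionset in ("image", "deepLearn", "sampling"):
        conn.loadactionset(actionset)
    return conn
```

With SWAT installed and a reachable server, `conn = connect_to_cas(cas_connection_params())` would return the connection object used in the rest of the post.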

Our data was initially placed in the folder "DL_Model", and we will load the images into memory from there. A caslib is an in-memory space that holds tables, access control lists, and information about our data. As the figure shows, we have successfully created the “mydl” caslib, which is connected to our folder path (/opt/userdata/DL_Model/).
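The caslib and image-loading steps might look roughly like the sketch below. The caslib name and path come from this post; the helper names, the subfolder argument, and the output table name are hypothetical, and the option names reflect the table.addCaslib and image.loadImages actions as I understand them, so please verify them against the documentation for your Viya release.

```python
def caslib_params(name="mydl", path="/opt/userdata/DL_Model/"):
    """Arguments for table.addCaslib; name and path are taken from the post."""
    return {
        "name": name,
        "path": path,
        "dataSource": {"srcType": "path"},
        "subDirectories": True,  # let the caslib see folders below the path
    }


def load_images(conn, subfolder, out_name, caslib="mydl"):
    """Load one folder of images into an in-memory CAS table.

    `subfolder` and `out_name` are placeholders for your own folder layout.
    """
    return conn.image.loadImages(
        path=subfolder,
        caslib=caslib,
        recurse=True,  # also read images in nested folders
        casOut={"name": out_name, "replace": True},
    )
```

A typical call sequence would be `conn.table.addCaslib(**caslib_params())` followed by one `load_images(...)` call each for the training and test folders.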

Display Images

In this section, we display a 3 by 3 grid of images, each with its label attached, to give a visual sense of what the images look like for a few selected labels.

Above, we see a set of 9 images in which a person is sitting, sleeping, hugging, using a laptop, or drinking. These are some of the labels in our training image dataset. The goal is to create a prediction model that correctly matches each image to its label. We will use our test image dataset, with its attached labels, to see how well the model performs. Below, we provide a histogram of the label classes to show that each class contains an equal number of images.
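The class-balance check behind the histogram can also be done numerically. The sketch below assumes the labels have already been pulled into a plain Python list (for example, from the label column of the image table); the helper names are hypothetical.

```python
from collections import Counter


def label_counts(labels):
    """Count how many images carry each label."""
    return Counter(labels)


def is_balanced(counts, tolerance=0):
    """True when the largest and smallest class differ by at most `tolerance`."""
    sizes = list(counts.values())
    if not sizes:
        return True  # nothing to compare
    return max(sizes) - min(sizes) <= tolerance
```

For example, `is_balanced(label_counts(labels))` returns True only if every class has exactly the same number of images; a small `tolerance` allows minor differences.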

Data Preprocessing & Cleaning

Check Image Dimensions

Now that the images are loaded into CAS, we need to inspect the data for any abnormalities. We want to make sure all the images (test and train) have the proper dimensions for our deep learning model. For our deep learning model built with the CAS action set, we want the images in a 32 x 32 frame to reduce processing time. We could create a model that processes images of any dimension, but in this post I want to show how to resize all of the images to the same dimensions.

In the above image, we displayed the summary details of our image dataset. The summary provides the number of images processed, the minimum and maximum height and width of the images, and the average pixel intensity values.

Now that the images are resized, we check that the dimensions are correct for all of the images.
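A minimal sketch of the resize-and-verify step follows. The resize helper builds arguments for the image.processImages action, and the check reads one row of an image.summarizeImages result. The option names (imageFunctions, functionOptions, functionType) and the summary column names (minWidth, maxWidth, minHeight, maxHeight) are assumptions based on my reading of the image action set, and they have changed between releases, so treat them as placeholders to confirm in your environment. The input table name "train" is hypothetical; "train_out" is the name used in this post.

```python
def resize_params(table, out_name, size=32):
    """Arguments for image.processImages that resize every image to size x size."""
    return {
        "table": {"name": table},
        "casOut": {"name": out_name, "replace": True},
        "imageFunctions": [
            {"functionOptions": {"functionType": "RESIZE",
                                 "width": size, "height": size}}
        ],
    }


def is_uniform(summary_row, size=32):
    """Check a summary row: min and max width and height must all equal `size`."""
    keys = ("minWidth", "maxWidth", "minHeight", "maxHeight")
    return all(int(summary_row[key]) == size for key in keys)
```

In use, `conn.image.processImages(**resize_params("train", "train_out"))` would create the resized table, and the first row of the summarizeImages result for "train_out" could then be passed to `is_uniform`.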

Building Deep Learning Model

In this section, we start by using the CAS action “shuffle” to randomly shuffle the data, and then we partition the data to get ready to train our deep learning model. It's common practice to use a 70/30 split, where 70% of the data is our training set and 30% is our validation set.

In the above figure, you can see that we shuffled our “train_out” data so that images are selected at random for the deep learning model. After the “train_out” table is shuffled, it is stored back in CAS.
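To make the shuffle and the 70/30 partition concrete, here is a small sketch. The first function shows the idea in plain Python; the second shows how the same two steps might be issued as CAS actions. The helper names, the seed, and the output table name are hypothetical, and the sampling.srs options (samppct, samppct2, partInd) are my assumption about how the Samp1 and Samp2 partitions are produced.

```python
import random


def shuffle_and_split(rows, train_fraction=0.7, seed=12345):
    """Shuffle rows reproducibly, then split them into training and validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def cas_shuffle_and_partition(conn, out_name, table="train_out", seed=12345):
    """The same two steps with CAS actions (a sketch; verify the options)."""
    # Shuffle the table in place.
    conn.table.shuffle(table=table, casOut={"name": table, "replace": True})
    # Simple random sampling with a partition indicator column.
    return conn.sampling.srs(
        table=table,
        samppct=70,    # Samp1: training
        samppct2=30,   # Samp2: validation
        partInd=True,
        seed=seed,
        output={"casOut": {"name": out_name, "replace": True},
                "copyVars": "ALL"},
    )
```

Fixing the seed keeps the split reproducible between runs, which makes it easier to compare models trained on the same partition.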

Next, as the partition figure shows, we split the “train_out” data into two subsets: 70% for our training data (Samp1) and 30% for our validation data (Samp2). Below we can see the frequency table, which provides the total number of observations and the number of observations in the training and validation sets.

Now we can start to build our image classification deep learning model. We first need to specify a few parameters, beginning with how many layers to add to the model. Because of time constraints we use a very basic model with only a few hidden layers, but in many real-world applications a CNN model for object detection may include 100+ layers.

conn.deepLearn.buildModel(
model = dict(name='cnn',replace=True),
type = 'CNN'
)

conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='input', nchannels=3, width=32, height=32, scale=0.004, std='STD'),
replace=True,
name = 'data'
)

# Adding 2 convolution layers to the DL model, both fed by the input layer
conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='convolution', act='relu', nFilters=10, width=5, height=5, stride=1, init='xavier'),
srcLayers = 'data',
replace=True,
name = 'cnn1'
)

conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='convolution', act='relu', nFilters=10, width=5, height=5, stride=1, init='xavier'),
srcLayers = 'data',
replace=True,
name='cnn2'
)

conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='concat'),
srcLayers = ['cnn1','cnn2'],
replace=True,
name='concatlayer1'
)

# Adding 2 pooling layers to the DL model, followed by 2 fully connected layers
conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='pooling', width=2, height=2, stride=2, pool='max'),
srcLayers = 'cnn1',
replace=True,
name = 'pool1'
)

conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='pooling', width=2, height=2, stride=2, pool='max'),
srcLayers = 'cnn2',
replace=True,
name = 'pool2'
)

conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='fullconnect', n=100, act='relu', init='xavier', dropout = 0.2),
srcLayers = 'pool1',
replace=True,
name = 'fc1'
)
conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='fullconnect', n=100, act='relu', init='xavier', dropout = 0.2),
srcLayers = 'pool1',
replace=True,
name = 'fc2'
)

conn.deepLearn.addLayer(
model = 'cnn',
layer = dict(type='output', act='softmax', init='xavier'),
srcLayers = 'fc1',
replace=True,
name = 'output'
)

conn.deepLearn.modelInfo(
model='cnn'
)
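To sanity-check the layer sizes that modelInfo reports, the arithmetic can be worked by hand. The sketch below uses the standard output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, and assumes the 5x5 convolution is padded by 2 pixels so that the 32x32 size is preserved. That padding is my assumption, not something stated in this post, so the modelInfo output remains the authoritative source.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_parameters(kernel_w, kernel_h, in_channels, n_filters):
    """Number of weights plus one bias per filter."""
    return kernel_w * kernel_h * in_channels * n_filters + n_filters


# Assumption: the 5x5 convolution pads by 2, keeping the 32x32 size.
after_conv = conv_output_size(32, kernel=5, stride=1, padding=2)
# 2x2 max pooling with stride 2 halves each spatial dimension.
after_pool = conv_output_size(after_conv, kernel=2, stride=2)
# Ten filters feed the fully connected layer.
fc_inputs = after_pool * after_pool * 10
conv_weights = conv_parameters(5, 5, in_channels=3, n_filters=10)
print(after_conv, after_pool, fc_inputs, conv_weights)  # 32 16 2560 760
```

Under these assumptions, each fully connected layer receives 2,560 inputs from its pooling layer, and each convolution layer holds 760 trainable parameters.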

Accuracy of the Model

After the model has run successfully, we want to assess its accuracy. A model with good accuracy would score above 80%.
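As a small illustration of the accuracy measure, the sketch below compares predicted labels with true labels and applies the 80% threshold mentioned above. The helper names are hypothetical, and the sample labels in the usage note are made up for illustration rather than taken from the model's output.

```python
def accuracy(predicted, actual):
    """Fraction of images whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def meets_target(acc, target=0.80):
    """True when the accuracy exceeds the target threshold."""
    return acc > target
```

For instance, if three of four predictions such as ["sitting", "hugging", "sleeping", "drinking"] match the true labels, the accuracy is 0.75, which falls short of the 80% target.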

The plot's x-axis shows the epoch, while the y-axis shows the error. As the plot shows, Fit Error drops rapidly in the first epochs, suggesting that the model is learning from the training data. Initially, the validation error also declines, indicating that the model is generalizing successfully. Fit Error keeps declining, but Validation Error reaches a plateau after a certain number of epochs. The model performs well on the training set but not on unseen data (the validation set), which implies that it may be beginning to overfit the training data.

Loss of the Model

This plot illustrates the change in loss, a measure of error, during the training of a model, across epochs. Both the training loss ("Loss") and the validation loss ("Validation Loss") are shown. Initially, both losses are relatively high, but they decrease as the model learns. The training loss decreases more consistently and reaches a much lower level by the end of training. However, the validation loss, after an initial decrease, begins to increase again towards the later epochs. This pattern suggests that the model is starting to overfit. It's worth noting that this is a demonstration post on how to use these techniques for image classification. To obtain the minimum validation loss, we could have trained the final model for only 25 epochs instead of 200, which would limit the overfitting seen on the validation data.
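Choosing the number of epochs from the validation loss can be automated. The sketch below assumes the per-epoch validation losses are available as a plain Python list; the helper names and the patience rule are hypothetical illustrations of early stopping, not the settings used for the model in this post.

```python
def best_epoch(validation_losses):
    """Return the 1-based epoch with the lowest validation loss."""
    if not validation_losses:
        raise ValueError("at least one loss value is required")
    index = min(range(len(validation_losses)), key=validation_losses.__getitem__)
    return index + 1


def early_stop_epoch(validation_losses, patience=5):
    """Return the epoch at which training would stop.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs to the last epoch.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(validation_losses)
```

Applied to a validation loss curve like the one described above, `best_epoch` would point to the epoch where the curve bottoms out, and `early_stop_epoch` would end training shortly after that point instead of running all 200 epochs.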

Conclusion

The image classification deep learning model underperformed, reaching an accuracy of only 74%. Several factors could improve the performance, such as a model with more hidden layers, but that would likely require a more powerful GPU. Also, for this post we used only approximately 12,500 images; normally you would want more images to build a better model, but the downside is increased processing time, which could range from an hour to possibly a whole weekend. The goal of this post was not to build the most accurate model, but to introduce the concept of building a deep learning model and to show how working with real-world data can be challenging. We covered working with images of different dimensions and how we must prepare our image data properly to get meaningful results. In a future post, we will look to improve the model by adding layers, adjusting the learning rate, and changing the number of filters. Those steps could yield some significant improvement. For more information on this topic, please check out the links listed below.

For more information:

Find more articles from SAS Global Enablement and Learning here.