
Building a Human Action Recognition Model


Human action recognition models analyze visual data to identify and classify specific human actions. These models can be used to develop smart surveillance systems that detect suspicious activities, helping to improve public safety. They can also aid in creating personalized fitness programs by analyzing exercise form and providing feedback, promoting healthier lifestyles. In this post, we discuss understanding human behavior and predicting the action label associated with an image, using a data source provided by Kaggle. We started in a Jupyter notebook on SAS Viya Workbench, but ran into issues when processing the image dataset because our SAS Viya Workbench environment did not have GPU capabilities. So, the decision was made to move to a virtual image environment that provides access to a GPU.

 

First, we transferred our large image dataset to our Viya environment using WinSCP. Image processing and classification, particularly with deep learning, demands significant parallel computation, so a GPU is essential for efficient analysis: its architecture drastically accelerates these calculations compared to a CPU. To establish a connection between our local device and the virtual image, we provide the virtual image's IP address and our login credentials.

 

01_DMcK_WinSCP_1.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

Once the files have been placed onto our server, we name the folder "Human_Action_Recognition". Inside that folder are the test and train photo folders, along with the test and train CSV files that provide the photo labels. The test dataset contains 5,410 photos, and the train dataset contains 12,600 photos spread across 15 different category labels.

 

02_DMcK_Load_Data.png

 

Now that the data is on the server, we start by providing the path to the location of our datasets and loading them. Once the datasets are loaded, we print out the categories attached to the "train" CSV file. We are using only Python code in a Jupyter notebook for our image classification model; we plan to write a sequel post that uses CAS actions and autotuning to try to improve on the model's performance. In the illustration above (and in the minimal loading sketch that follows the list), we see an index of:

 

  • Calling
  • Clapping
  • Cycling
  • Dancing
  • Drinking
  • Eating
  • Fighting
  • Hugging
  • Laughing
  • Listening_to_music
  • Running
  • Sitting
  • Sleeping
  • Texting
  • Using_laptop
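
For reference, here is a minimal sketch of the setup and loading step. The base path and the CSV file names (Training_set.csv and Testing_set.csv) are assumptions based on the Kaggle download and may need adjusting for your environment; the imports shown here are the ones used by the code throughout the rest of this post.

# Setup and data loading sketch -- paths and CSV file names are assumptions; adjust for your environment
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

# Assumed folder layout after the WinSCP transfer
base_path = "/path/to/Human_Action_Recognition"
train_path = os.path.join(base_path, "train")
test_path = os.path.join(base_path, "test")

# Label files shipped with the Kaggle download (names may differ)
train = pd.read_csv(os.path.join(base_path, "Training_set.csv"))
test = pd.read_csv(os.path.join(base_path, "Testing_set.csv"))

print(len(train), len(test))                   # 12,600 and 5,410 rows, per the counts above
categories = sorted(train['label'].unique())   # the 15 action categories
print(categories)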

 

Now that we have loaded our data and have an understanding of how it is shaped, we illustrate the category distribution for our training dataset.

 

# Plot the class distribution with matplotlib
label_counts = train['label'].value_counts()

plt.figure(figsize=(8, 6))  # Adjust figure size if needed
plt.bar(label_counts.index, label_counts.values)
plt.title('Distribution of Classes in Training Set')
plt.xlabel('Variable Labels')
plt.ylabel('Count')
plt.xticks(rotation=45, ha='right')  # Rotate x-axis labels for better readability
plt.tight_layout()  # Adjust layout to prevent labels from overlapping
plt.show()

 

03_DMcK_Distribution_Classes.png

 

In the illustration above, we can see the class distribution for all the labels mentioned in the previous section; the counts are evenly distributed across the labels. Next, we want to display sample images with their class labels attached. We start by defining a small helper function in Python and use the training dataset for this illustration.

 

# Images from each class
def display_multiple_images_per_class(train, train_path, images_per_class=3, rows=3, cols=5):
    num_classes = train['label'].nunique()
    total_images = min(rows * cols, num_classes * images_per_class)  # Limit total images
    fig, axes = plt.subplots(rows, cols, figsize=(10, 5))
    axes = axes.flatten()
    image_index = 0
    for class_name in train['label'].unique():
        class_images = train[train['label'] == class_name]['filename'].values
        num_images_to_show = min(images_per_class, len(class_images))  # Show up to images_per_class
        for i in range(num_images_to_show):  # Inner loop for images within a class
            if image_index < total_images:  # Check if we have filled the grid
                img_path = os.path.join(train_path, class_images[i])
                try:
                    img = plt.imread(img_path)
                    axes[image_index].imshow(img)
                    axes[image_index].set_title(f"{class_name} ({i+1})")  # Indicate image number
                    axes[image_index].axis('off')
                    image_index += 1
                except Exception as e:
                    print(f"Error: {e}")
                    axes[image_index].set_title(f"Error: {class_name}")
                    axes[image_index].axis('off')
                    image_index += 1
            else:
                break
        if image_index >= total_images:  # Check if we have filled the grid
            break
    plt.tight_layout()
    plt.show()

# Example usage:
display_multiple_images_per_class(train, train_path, images_per_class=2, rows=3, cols=3)  # Show 2 images per class

 

04_DMcK_Plot_3_3.png

 

In the illustration above, we see a 3-by-3 grid of 9 images, each with a label describing the action being shown. The code's inner loop steps through the images for each class and attaches the class label as the title. Next, we start preprocessing the training dataset. We also split our data, with 80% used for the training set and 20% for the validation set.

 

# Data Preprocessing for Train Dataset
train['label'] = train['label'].astype('category')
train['label'] = train['label'].cat.codes
train['filepath'] = train['filename'].apply(lambda x: os.path.join(train_path, x))

# Split into training (80%) and validation (20%) sets
train_set, val_set = train_test_split(train, test_size=0.2, stratify=train['label'], random_state=42)

def load_image(filepath, label):
    image = tf.io.read_file(filepath)
    image = tf.image.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, [128, 128])
    image = image / 255.0
    return image, label

# Create TensorFlow datasets
train_dataset = tf.data.Dataset.from_tensor_slices((train_set['filepath'].values, train_set['label'].values))
train_dataset = train_dataset.map(load_image).batch(32).shuffle(buffer_size=len(train_set))

val_dataset = tf.data.Dataset.from_tensor_slices((val_set['filepath'].values, val_set['label'].values))
val_dataset = val_dataset.map(load_image).batch(32)

# Model Building
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(128, 128, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation='relu'),
    Dropout(0.5),
    Dense(15, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

 

Now that the preprocessing is complete, we can train the model. We train for 30 epochs; an epoch is a single pass through the training data, so each sample in the training set is used once per epoch, and iterating 30 times gives the model repeated opportunities to improve. After training completed, we saw an accuracy of approximately 87% on the training set and approximately 30% on the validation set, which could suggest overfitting.
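
The training call itself is not shown above; a minimal sketch of that step, assuming the model, train_dataset, and val_dataset defined in the preprocessing code, could look like this. It produces the history object used in the plotting code that follows.

# Model training sketch -- assumes the model, train_dataset, and val_dataset defined above
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=30
)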

 

# Plot Training and Validation Accuracy
epochs = np.arange(len(history.history['accuracy']))  # Get epoch numbers

plt.figure(figsize=(10, 5))  # Adjust figure size if needed
plt.plot(epochs, history.history['accuracy'], label='Training Accuracy')
plt.plot(epochs, history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()  # Show the legend
plt.grid(True)  # Add a grid for better readability (optional)
plt.tight_layout()  # Adjust layout to prevent labels from overlapping
plt.show()

 

05_DMcK_Model_Accuracy_Plot.png

 

In the illustration above, the line plot displays training and validation accuracy, with the epoch on the x-axis and accuracy on the y-axis. The training accuracy started close to 10% and climbed gradually between epochs 5 and 15. The training set drastically outperformed the validation set, which could be expected. The validation accuracy was approximately 32%, which suggests the model did not perform well. Next, we look at the classification report.
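
The report (and the confusion matrix later in this post) is built from predictions on the validation set. That step is not shown in the original code, so here is a minimal sketch, assuming the val_dataset, model, and categories defined earlier; val_labels and val_predictions are the names the confusion matrix code below expects.

# Sketch: collect validation labels and predictions (assumed step, not shown above)
val_labels = np.concatenate([labels.numpy() for _, labels in val_dataset])
val_probs = model.predict(val_dataset)
val_predictions = np.argmax(val_probs, axis=1)

print(classification_report(val_labels, val_predictions, target_names=categories))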

 

06_DMcK_Class_Report.png

 

The classification report reveals the performance of the model across the 15 activities. While the model achieves an overall accuracy of 32%, a closer look at the precision and recall scores for individual activities reveals a more nuanced picture. For instance, "cycling" boasts relatively high precision and recall, indicating the model's strong ability to correctly identify this activity. "Eating" also demonstrates reasonable performance, although with a noticeable drop in recall compared to precision, suggesting potential confusion with other activities. Conversely, activities like "calling," "drinking," "hugging," "listening to music," and "texting" struggle with both low precision and recall, implying significant challenges for the model in accurately classifying these actions. Next, we create a confusion matrix to better understand the relationship between the validation predictions and the true labels.

 

conf_matrix = confusion_matrix(val_labels, val_predictions)

plt.figure(figsize=(8, 8))  # Adjust figure size as needed
plt.imshow(conf_matrix, interpolation='nearest', cmap=plt.cm.Blues)  # Use a colormap
plt.title('Confusion Matrix for Image Classification')
plt.colorbar()  # Add a colorbar

tick_marks = np.arange(len(categories))  # Assuming 'categories' is your list of class names
plt.xticks(tick_marks, categories, rotation=45, ha='right')  # Rotate labels if needed
plt.yticks(tick_marks, categories)

# Add text annotations inside the confusion matrix cells
thresh = conf_matrix.max() / 2.
for i, j in np.ndindex(conf_matrix.shape):
    plt.text(j, i, format(conf_matrix[i, j], 'd'),
             horizontalalignment="center",
             color="white" if conf_matrix[i, j] > thresh else "black")

plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()  # Adjust layout to prevent labels from overlapping
plt.show()

 

07_DMcK_Confusion_Matrix.png

 

A confusion matrix provides a comprehensive breakdown of a model's performance on a classification task. In this image classification scenario, we see the model's predictions across the 15 activities, each of which has an equal number of images. The matrix, visually represented with a gradient from light to dark blue, reveals both correct classifications (along the diagonal) and confusions between activities (off-diagonal elements). Each row represents the actual label, while each column indicates the predicted label. For instance, we can observe that the model frequently confuses "calling" with "laughing" and "using_laptop," as shown by the relatively high numbers in the corresponding cells. Notably, the diagonal shows a varying degree of accuracy across activities, with some, like "sleeping," achieving higher correct classification rates than others. Analyzing this matrix allows us to understand the model's strengths and weaknesses in discerning between different activities, guiding future improvements and highlighting potential areas of concern.
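
The final illustration below shows the model's predictions on a few test images. The code for that step is not included in the article; a minimal sketch, assuming the trained model, the test DataFrame with a filename column, test_path, and the categories list from earlier, might look like the following (the 3-by-3 grid size is an illustrative choice).

# Sketch: predict and display a few test images (assumed step; grid size is illustrative)
sample = test['filename'].values[:9]

plt.figure(figsize=(10, 10))
for idx, filename in enumerate(sample):
    img_path = os.path.join(test_path, filename)
    img, _ = load_image(img_path, 0)                     # reuse the preprocessing helper
    probs = model.predict(tf.expand_dims(img, axis=0))   # add a batch dimension
    pred_label = categories[int(np.argmax(probs))]
    plt.subplot(3, 3, idx + 1)
    plt.imshow(img)
    plt.title(f"Predicted: {pred_label}")
    plt.axis('off')
plt.tight_layout()
plt.show()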

 

08_DMcK_Pred_img_1-2-1024x453.png

 

This illustration showcases the current state of our human action recognition model, highlighting both its successes and areas needing improvement. While some activities, like "cycling" and "laughing," are accurately classified, confusions arise with similar actions, such as misclassifying "listening to music" as "texting" and "fighting" as "running." These discrepancies underscore the need for model refinement, particularly in distinguishing between fine-grained activities and recognizing complex actions. Building upon this foundation, a sequel post will delve into leveraging SAS Viya CAS actions. We will explore how SAS Viya's advanced analytics capabilities can be used to identify patterns in misclassifications, pinpoint specific areas of weakness in the model, and ultimately guide iterative model improvements to enhance overall accuracy. Furthermore, we will demonstrate how SAS Viya can facilitate the development of enhanced models, potentially incorporating additional features or utilizing different algorithms, to address the identified challenges and improve the robustness of our human action recognition system.

 

In conclusion, human action recognition models offer significant potential for understanding human behavior patterns. Within data science, these models can play a vital role, enabling advancements in various fields. For example, they can be used to analyze customer behavior in retail settings, optimize sports performance through detailed movement analysis, and even improve healthcare by monitoring patient activity. Furthermore, these models can contribute to creating more intuitive human-computer interfaces and developing advanced robotics that can better interact with humans. As data science continues to evolve, human action recognition models are poised to become increasingly important tools for extracting meaningful insights from visual data. This technology can unlock new possibilities in diverse sectors, driving innovation and improving our understanding of human actions.

 

For more information:

 

 

 

Find more articles from SAS Global Enablement and Learning here.
