
Introduction to Computer Vision: Image Classification Using Model Zoo


If computer vision sounds interesting to you, then there is no better place to get started than image classification. Image classification is one of the fundamental techniques in computer vision: it enables machines to identify and categorize objects within images and, in many ways, is the basis for the rest of the field. Image classification algorithms are used in many industries including manufacturing, healthcare, security, retail, and more. In this post I will introduce image classification using the Model Zoo action set. The goal of this post is to give individuals new to computer vision a basic understanding of how these algorithms work, and to show individuals already familiar with image classification how to get started with Model Zoo, a new SAS action set for deep learning.

 

Image Classification Introduction

 

Image classification is ultimately a pattern recognition task where machines learn from a dataset of labeled images to predict the class (or label) of unseen images. Broadly speaking, image classification algorithms are a subset of machine learning algorithms, which means we need a training dataset to build the model. The training dataset consists of images, and the label associated with each image is the target. In deployment, an image classification model that sees an image of a dog (assuming it was trained on a dataset that includes dogs) will generate the label "dog" for that image.

 

Working with Image Data

 

The data required to create an image classification model are images and the labels associated with them. There are a variety of ways SAS can read the images and their corresponding labels. One way is to place all images belonging to a given class inside a folder whose name is the label for the images it contains; SAS can then look inside the folder and create a table containing the image data. Another way is to provide SAS with a CSV containing both the image locations and their labels; during model training and validation, Model Zoo takes care of the rest. Both methods are very similar, and which one you use is a matter of convenience.

 

01_JC_validation_partition_head.png

 

Figure 1. CSV Containing Image Paths and Labels.


 

We now know how SAS can access our image data, but how are images processed, and where is the data contained within them? In traditional supervised machine learning we generally work with structured data, which usually comes in the form of rows and columns. With images, the data we work with are the pixel values that, combined, generate the image. If you consider a row of data from a table as 1-dimensional (with n inputs), then you can probably start to imagine that images contain information in at least two dimensions, since they are a collection of pixel values arranged in a specific way. Regardless of the type of image we work with, pixel values range from 0 to 255. In a grayscale (black and white) image, a pixel value of 0 represents black and a value of 255 represents white; everything in between is some shade of gray. Color images are more complex because several color components must be combined to produce the desired color, generally by mixing red, green, and blue pixel values. Whenever we work with images, we call each two-dimensional array corresponding to a specific color a channel.

 

02_JC_frog_rgb.png

Figure 2. Grayscale and RGB Channels Example.

 

Both grayscale and RGB images have pixel values ranging from 0 to 255, where a value of 0 corresponds to the absence of that color and a value of 255 corresponds to its highest intensity. In color images, stacking the RGB channels on top of each other generates more complex colors, since the red, green, and blue channels are combined to varying degrees. Now that we know what information images contain, let's move forward and talk about how we can extract it.
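If you want to see this for yourself, here is a quick sketch (outside of SAS) of how an image looks once loaded into Python. It assumes the Pillow and NumPy packages are available, and "frog.png" is just a placeholder file name:

import numpy as np
from PIL import Image

# Load an image and convert it to three RGB channels
img = np.array(Image.open("frog.png").convert("RGB"))

print(img.shape)    # (height, width, 3): one 2-D array (channel) per color
print(img.dtype)    # uint8: every pixel value falls between 0 and 255
print(img[..., 0].min(), img[..., 0].max())   # intensity range of the red channel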

 

Fully Connected Neural Networks

 

At the heart of image classification lies the concept of feature extraction. This process involves capturing distinctive characteristics from images that differentiate one class from another. These include things like the edges of objects, textures, colors, etc. In a sense, this is similar to how we process visual information. We can see the shape of objects, silhouettes, color, lighting, and interpret what we’re seeing.

 

Although other algorithms have worked for image classification in the past, modern computer vision applications use neural networks as their backbone. It should also be noted that there are different types of neural networks. Traditional neural networks, like the ones used for classical machine learning tasks, are generally not the type of network used for computer vision. In these traditional networks, often called fully connected networks, all of the inputs are passed to every single hidden unit (also called a neuron), where each hidden unit applies a nonlinear function to the inputs, allowing the network to learn complex nonlinear relationships.

 

03_JC_mlp_logit_eqn_simplified.png

 

Figure 3. Single Layer Fully Connected Network with 3 Hidden Units and Logit Link Function (Left). Decision Boundary (Right).

 

Fully connected neural networks can have multiple fully connected layers, ultimately resulting in either a shallow or a deep neural network. After the final fully connected layer, the outputs of the neurons are combined to generate an output that matches the distribution of the target.
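As a rough sketch of what a single fully connected layer computes (the sizes and the tanh activation here are illustrative choices, not anything specific to SAS):

import numpy as np

rng = np.random.default_rng(0)
x = rng.random(4)            # one observation with 4 inputs (one row of a table)
W = rng.random((3, 4))       # 3 hidden units, each with one learnable weight per input
b = rng.random(3)            # one learnable bias per hidden unit

hidden = np.tanh(W @ x + b)  # every input reaches every hidden unit, then a nonlinearity
print(hidden.shape)          # (3,): one output per hidden unit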

 

04_JC_mlp_equation.png

 

Figure 4. Generalized Fully Connected Neural Network Diagram.

 

Although the neural networks that we use for computer vision are different from fully connected neural networks, many of the concepts are the same. There are multiple reasons why fully connected neural networks are not used for computer vision applications. When training a network, instead of an observation corresponding to a row from a table, a single observation is now an entire image. Consider a case where we have 32 x 32 pixel images (approximately the size of a thumbnail on a forum): although small, that corresponds to 1,024 pixels per image. In other words, flattening the image is like working with a row that has 1,024 columns! From the get-go, this makes fully connected neural networks very computationally expensive. Another issue is that pixel values are related to one another. Imagine an image of a dog: realistically, you could look at small regions of the image (parts of the dog such as ears, nose, eyes, tongue, or paws) and classify the image as a dog without seeing the entire thing. In a sense, pixels generally have some form of relatedness to the pixels in their surrounding regions, and fully connected neural networks are not great at extracting this type of relatedness. For these reasons and more outside the scope of this post, we use convolutional neural networks (CNNs) for image classification instead of fully connected neural networks.
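A quick back-of-the-envelope calculation makes the computational argument concrete. The 256 hidden units below are a hypothetical choice, not a value from this post:

pixels = 32 * 32                        # 1,024 inputs once a 32 x 32 image is flattened
hidden_units = 256                      # hypothetical width of one fully connected layer
dense_weights = pixels * hidden_units   # one weight per input per hidden unit
print(dense_weights)                    # 262,144 weights for that single layer

conv_filter_weights = 3 * 3             # a 3 x 3 convolutional filter has only 9 weights...
print(conv_filter_weights)              # ...shared across every region of the image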

 

Convolutional Neural Networks (CNNs)

 

CNNs, inspired by the visual processing of the human brain, have become the backbone of computer vision. These architectures are designed to learn hierarchical features from images, starting with simple features like edges and building up to more complex ones. These networks are not just multiple layers of fully connected neurons; instead, they incorporate a sequence of techniques that help the network extract information from images. Although many techniques are used in modern CNNs, the key ones that helped convolutional neural networks take a leap forward in the realm of computer vision are convolutional layers and pooling layers.

 

 

05_JC_basic_cnn_dlus.png

Figure 5. Convolutional Neural Network Layers.

 

In the image above we illustrate a very general architecture for a CNN. Because we no longer have only fully connected layers, we can create neural network architectures that look very different and work well on different tasks. Almost all convolutional neural networks contain convolutional layers and generally some type of pooling layer. Before we introduce the architecture we're going to use, we need to develop a basic understanding of convolutional and pooling layers.

 

Convolutional Layers

 

Convolutional layers are generally used after the input layer and throughout different parts of the CNN. Convolutional layers use what are called filters (also called kernels), which help capture features such as edges and contain learnable parameters that extract key pieces of information from the training dataset. In a fully connected neural network, the learnable parameters are contained within the neurons, with one learnable parameter per input; in the output layer we also have one learnable parameter for each output. In convolutional neural networks, the filters contain the learnable parameters, which are learned through multiple iterations of an optimization process using the training dataset (similar to fully connected networks). What makes these filters special is that they help address the issues fully connected neural networks have when performing image classification. The parameters in a filter are shared across multiple regions of an image, and each filter is applied to one region of the image at a time. Filters give us a computationally efficient way to process images; they capture some of the interrelatedness of pixels in a region and add invariance to the network. Each filter generates an output called a feature map by computing a product between the filter weights and a small region of pixels. This is illustrated in the figure below:

06_JC_filters.gif

Figure 6. Cross Correlation Operation.

 

The operation is formally known as the cross-correlation operation, but it is sometimes referred to as a convolution without the kernel flipping. The feature map then gets an activation function applied to it, just like the neurons in fully connected networks. Filters also have hyperparameters associated with them, such as their width, height, and stride (how many rows/columns they move at a time). Convolutional layers are just one part of the story; the other layer type that makes convolutional neural networks very powerful is the pooling layer.
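To make the operation concrete, here is a minimal NumPy sketch of cross-correlation with a stride of 1 and no padding; the 3 x 3 filter values are made up for illustration:

import numpy as np

def cross_correlate(image, kernel):
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = image[i:i + kh, j:j + kw]           # patch currently under the filter
            feature_map[i, j] = np.sum(region * kernel)  # elementwise product, then sum
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])       # a crude vertical-edge style filter
print(cross_correlate(image, kernel))    # 3 x 3 feature map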

 

Pooling Layers

 

Pooling layers help to summarize regions of incoming information, help to increase the invariance of CNNs, and make the networks more computationally efficient. Pooling layers are summary operations, with the most common pooling functions being max, average, and min pooling. There are other kinds of pooling layers, but that is outside the scope of this blog.  

07_JC_pooling_slide.png

 

Figure 7. Common Pooling Layers.

 

Because pooling operations apply a summary function to a given region, they are generally combined with a larger stride value in order to down-sample information. Pooling layers, as opposed to convolutional layers, are generally used more freely because they do not contain learnable parameters and therefore do not increase the computational requirements of a network too much. One of the theoretical motivations for pooling layers is that not every single pixel value in an image needs to be analyzed in order to get the main piece of information from a given region, similar to how you can see a portion of an image, say the ear of a dog, and correctly deduce that the full image is a dog.
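Here is an equally small sketch of 2 x 2 max pooling with a stride of 2, which keeps only the largest value in each region and halves the height and width of the feature map:

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    pooled = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            pooled[i, j] = region.max()   # summarize the whole region by its maximum
    return pooled

fmap = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(fmap))                     # 2 x 2 summary of the 4 x 4 feature map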

 

Fully Connected Layers

 

Convolutional neural networks also include fully connected layers, which tend to appear at the end of the network before the output layer. Fully connected layers work the same way as they do in fully connected neural networks. Before using them in a CNN, we generally flatten their inputs into a single vector that can be passed to every hidden unit in the fully connected layer (like a row in a table). Fully connected layers are where CNNs get very computationally expensive, which is why we tend to use pooling layers earlier in the network to reduce the dimensionality of the incoming information. Just like with fully connected neural networks, the output of the fully connected layers gets passed to an output layer where a link function is used to compute class probabilities. For multiclass classification, the link function tends to be the softmax function, which generates the probability of an observation belonging to each of the classes; whichever class has the highest probability is the class the observation is assigned to.
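The softmax step itself is simple enough to show in a few lines; the logits below are made-up values for a three-class example:

import numpy as np

logits = np.array([2.0, 0.5, -1.0])              # raw outputs for 3 hypothetical classes
probs = np.exp(logits) / np.sum(np.exp(logits))  # softmax turns them into probabilities

print(probs)             # probabilities that sum to 1
print(probs.argmax())    # the class with the highest probability wins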

 

LeNet5

 

We could spend more time talking about different techniques used in CNNs, different layers, different architectures, and so on; however, the goal of this blog is to give users a basic understanding of image classification. Not much is needed beyond convolutional and pooling layers to get a working model for image classification. To illustrate this point I will use a convolutional neural network known as LeNet-5. LeNet-5 was introduced by Yann LeCun and others in 1998 and was one of the earliest successful CNN architectures. It featured alternating convolutional and pooling layers, culminating in fully connected layers for classification. While simplistic by today's standards, LeNet-5 laid the groundwork for future advancements in CNNs.

 

08_JC_LeNet5_architecture_OG.png

 

Figure 8. LeNet-5 Architecture.

 

SAS allows us to build our own custom CNNs, and for the dataset chosen in this blog, building a CNN from scratch and getting good results is very feasible. However, I will opt to use the well-known LeNet-5 architecture because this mirrors how computer vision practitioners commonly work: a popular architecture known to perform well on a given task is used rather than constantly building new models from scratch.

 

Introduction to Model Zoo

 

Now that we have a background in computer vision and the LeNet-5 architecture, we can start to discuss Model Zoo. Model Zoo is a CAS action set for deep learning based on the PyTorch framework, available since the 2022.09 LTS release of SAS Viya. The action set has a backend that calls the PyTorch C++ API and a frontend that is responsible for initiating and terminating actions, reading and writing CAS tables, and communicating with the backend. Aside from the architectural differences, Model Zoo takes a different approach to deep learning than the traditional deepLearn CAS action set. Building a neural network that works well on a specific type of data can be a very time-consuming process that does not always yield the best results. For this reason, many users choose neural network architectures published by reputable research groups that perform very well on specific tasks. These groundbreaking architectures are generally published alongside an open-source implementation, and they often make their way into popular open-source deep learning libraries such as TensorFlow and PyTorch. For these reasons, Model Zoo currently focuses on using pre-defined model architectures rather than building your models directly in Model Zoo, which matches how many users apply deep learning in practice.

 

Model Zoo is a great tool for open-source developers who have access to a SAS Viya deployment. Model Zoo can be used from Python or R through the SWAT package, which acts as an API that lets you submit CAS actions from Python or R and integrate them into classes, functions, conditional statements, and so on. Additionally, if you're a deep learning developer familiar with PyTorch, you can continue to use it to develop your models. Models created using PyTorch can be saved as TorchScript files, wrapped in a few additional classes, and then trained using CAS. The key advantage this gives users is that these models are automatically parallelized by CAS, and if GPUs are available you can leverage those as well. There is a lot that could be said about defining custom models in PyTorch and importing them into Model Zoo, but that is beyond the scope of this blog. Model Zoo is going to be one of the key SAS tools for deep learning, so readers can expect a lot more functionality to be added to the action set over the coming years.
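For readers curious about the TorchScript piece, this is roughly what saving a PyTorch model as a torchscript file looks like; the model here is a toy example, and the additional wrapper classes that Model Zoo expects are not shown:

import torch
import torch.nn as nn

# Toy model only; a real model would match your data and task
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 10))

scripted = torch.jit.script(model)   # compile the model to TorchScript
scripted.save("my_model.pt")         # file that could later be wrapped and used with CAS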

 

Model Zoo LeNet-5 Example

 

This demonstration will be performed using Python; however, users should be aware that they can also use Model Zoo in R or CASL. The first thing we need to do, like with most Python programs, is import the necessary packages:

 

import os
import swat
import yaml
import pandas as pd

 

This may end up looking like a typical list of packages when using Model Zoo. Below is a table briefly describing the use of each package in this demonstration:

 

Package Use
os Required to save files to disk
yaml Used to ensure that the YAML files used by Model Zoo are accurate and complete.
swat Used to call the necessary CAS action sets, including the connection object and the model zoo actions.
pandas Used to create dataframes/tables that can be saved in different formats.

 

Table 1. Python Package Description.

 

Now that we have imported the packages, we can establish a connection to CAS using SWAT's CAS class and load the necessary action sets. In this demonstration I will load the dlModelzoo action set, the image action set to explore images saved in CAS, and the sampling action set to partition tables.

 

conn = swat.CAS("connection information")
conn.loadactionset("dlModelzoo")
conn.loadactionset("image")
conn.loadactionset("sampling")

 

Now that we have done some of the basic setup, we can start getting into the details of how to build an image classification model. We require either a table that contains the images and their associated labels or a CSV that contains the image paths (including image name, extension, and path relative to a CAS library) and labels. For this demonstration, the CSV shown in Figure 1 was read into memory using the read_csv method of the connection object.

 

# Read CSV into memory
conn.read_csv('mnist_fashion_labels.csv', casout=dict(name='mnist', caslib='mycl', replace=True))
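The partitioning step itself is not shown in this post. One way to create the _PartInd_ column that the training action filters on is the srs action from the sampling action set loaded earlier; the 67% split below is an assumption chosen to roughly match the 10,000 training / 5,000 validation images described later, so treat this as a sketch rather than the exact code that was used:

# Simple random sampling with a partition indicator (sketch; see note above)
conn.sampling.srs(
    table="mnist",
    samppct=67,          # sampled rows get _PartInd_ = 1 (training)
    partind=True,        # remaining rows get _PartInd_ = 0 (validation)
    seed=12345,
    output=dict(casout=dict(name="mnist", caslib="mycl", replace=True),
                copyvars="all")
)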

 

Something different between Model Zoo and the deepLearn CAS action set is that we need to create a YAML file that the actions use to specify model and/or dataset specific parameters. Although this YAML file may seem strange at first, it gives users a fixed location where they can quickly make changes to the model, such as image transformations, input size changes, or the number of target classes for the chosen algorithm. It is possible to have all the information within a single YAML file; however, for the sake of readability I generally prefer to split it into two YAML files, one for training and one for scoring. One suggestion when creating the YAML file is to use a text editor such as VS Code or Notepad++, because the YAML file is sensitive to spacing and the subsections are delimited by indentation. Below is the YAML file that was used to train the model, followed by a table explaining the options used:

 

documents = """ 
sas:
    dlx:
        train: 
            label: "lenet5_mnist"
            dataset:
                type: "UNIVARIATE"
                organization: "CoCo" 
            preProcessing:
                - modelInput: 
                    label: input_tensor1
                    imageTransformation:
                        resize:
                            type: TO_FIX_DIM
                            size: 32 32
                        imgStdType: STD 
 
            model:
                type: "TORCHNATIVE" 
                name: "SAS_TORCH_LENET" 
                classNumber: 10
                caslib: "mycl"
                inputs:
                    - label: input_tensor1
                      size:
                      - 0
                outputs:
                    - label: output_tensor1
                      size: 
                      - 0
"""

 

Train and Score Description
label Used by the train and scoring actions to call a specific section of the YAML file
dataset Specifies the dataset type the action receives and the type determines how the data is read in and processed.
dataset.type Supported dataset types are Univariate, ObjDetect, Segmentation, and Autoencoder.
preProcessing Specifies the pre-processing (augmentation) of the input data as well as the target data.
preProcessing.modelInput The action looks for the matching label in the model.inputs section to find the stream of input to apply the pre-processing to.
modelInput.label The label gives the inputs a name and indicates the inputs (data) that the pre-processing will be applied to. Used by the actions to find the stream of inputs to apply the pre-processing to.
modelInput.imageTransformation Allows users to apply transformations to the image, such as resizing, color transformations, random transformations, etc.
imageTransformation.resize Used to resize the inputs to a specific height and width
imageTransformation.imgStdType Normalizes the image pixel values such that each value falls in a range of [0, 1]

 

Table 2. Train and Score YAML Options.

 

There are options, such as dataset.organization, that exist only to give the user a little more information and are not used by the Model Zoo CAS actions. Your YAML file should start with sas, then dlx, and the subsection after dlx can be either train or score, depending on whether the file will be used to train the model or score with it. Most of the options specified under the train subsection are explained in the table above, so I will not repeat them here.

 

Dataset and Pre-Processing

 

Before we go any further, we need to discuss the dataset we're going to use in this demonstration. We're going to use a classic computer vision dataset, the MNIST Fashion dataset. The full dataset contains 60,000 images split into 50,000 for training and 10,000 for validation; for the sake of this demo we're using a smaller version containing only 10,000 images for training and 5,000 for validation. The dataset consists of 10 classes of grayscale images, where the classes are different articles of clothing such as shoes, shirts, and skirts. All the images are PNG files of the same size, 28 x 28, which makes the dataset very popular for learning image classification since the computational requirements to process data of this size aren't massive. The dataset doesn't need much pre-processing. The only transformation we apply to the inputs is a standardization of the image pixels: pixel values range from 0 to 255, so to keep the weight updates from being too drastic in scale and to help information flow through the network, standardization techniques such as this one are common. Note that many preprocessing techniques are available in Model Zoo; in this example I chose not to apply any others.

The second important subsection we need to specify is the model subsection, which is where we give Model Zoo more details about the kind of model we want to build. The table below explains some of the options that were used:

 

Model Options Descriptions
type Currently supports two types of models, TorchScript and TorchNative. TorchNative are models that are pre-written in C++ and built into the action library. TorchNative is used for pre-built models such as YOLO and UNet. TorchScript is used to import custom models written in PyTorch.
name Used only for TorchNative models to specify what type of model the user wants to use.
caslib CasLib where model weights can be loaded and saved to.
classNumber Number of classes in our targets.
inputs Can be used to apply changes and transformations to the input and target tensors.
inputs.label & outputs.label Specifies the name of the tensor to be modified.
inputs.size & outputs.size Can be used to reshape the specified tensor to a given size. A value of 0 means no reshaping; otherwise the values are specified in order as channel, height, and width.

 

Table 3. Model YAML Options.

 

Underneath the model subsection, the first thing we need to specify is what type of model we want to build: TORCHNATIVE corresponds to the pre-built models, while TORCHSCRIPT corresponds to user-defined models written in PyTorch. Afterwards, you specify the name of the model you want to build, which in this case is SAS_TORCH_LENET. The Model Zoo documentation shows which models are currently available within the action set, with more on the way. Another necessary option when building a classification model is the number of classes. The objective of the model is to distinguish between the 10 different image types, so classNumber is set to 10, one class for each article of clothing. We do not need to resize the input or output tensors, so we leave the size option as 0. That completes the YAML file, so next we can check whether it is syntactically correct by using the yaml package along with the following lines of code:

 

for data in yaml.load_all(documents, Loader=yaml.SafeLoader):
    print(data)

 

If the YAML file is syntactically correct, the output will be a JSON-like string that displays all the user-specified options. Keep in mind that this doesn't guarantee you won't have errors when the YAML file is used to train or score the model; it only ensures that the YAML file is syntactically correct, in other words that every option is delimited and spaced correctly.

 

Now that the YAML file is complete and syntactically correct, we can move forward with training the image classification model using the dlmzTrain action from the Model Zoo action set. Below is the dlmzTrain action call as well as a table describing the options used in the action:

 

train_action = conn.dlmztrain(
    loglevel="DEBUG",
    table=dict(name="mnist", where="_PartInd_ = 1"),
    validationtable=dict(name="mnist", where="_PartInd_ = 0"),
    inputs="_path_",
    targets="xlabels",
    ngpus=1,
    indexvariables="labels",
    modelOut=dict(name="LeNet5", replace=True),
    outputIndexMap=dict(name="LeNet5_outputindex", replace=True),
    dropLast=True,
    optimizer=dict(
        loss="cross_entropy",
        mode=dict(type="synchronous", syncFreq=1),
        algorithm=dict(
            learningRate=0.002,
            momentum=0.5,
            method="sgd",
            weight_decay=0.0005
        ),
        batchSize=32,
        seed=12345,
        maxEpochs=50
    ),
    options=dict(yaml=documents, label="lenet5_mnist")
)

 

dlmzTrain Action Options Description
loglevel Reporting level for progress messages sent to the client. DEBUG allows users to see more information related to the training process.
table CAS Table that is used to store the input data for the training step in training a deep learning model.
inputs The input variables for the training task. Currently, we support image data as input, the input column can either be a string of an image path or a binary of the image data.
targets The target variables for the training task.
ngpus Used together with HyperParameter Tuning, specifies the number of GPU's to use across the entire grid. The GPU's with the lowest amount of memory currently allocated will be chosen. This option is mutually exclusive with the gpu option.
validationTable The CAS table that contains the validation dataset. Used to assess model weights after each epoch.
modelOut The CAS table used to store the trained model and model weights.
checkPointBest Specifies whether to save the model weights that performed best on validation or the model produced in the final epoch.
outputIndexMap The output CAS table containing a mapping from nominal class type to numeric values. The Index table of nominal class type to numeric values built in the training process is written to the CAS table at the end of the action.
optimizer Key component of training any type of neural network. Currently supports a variety of optimization algorithms including SGD, Adagrad, ADAM, and ADAMW.
tuner Specifies settings for hyper parameter tuning
learningRateScheduler Used to specify a learning rate policy, such as a fixed learning rate or one that’s modified after n number of steps.
extraOptions Used to specify the YAML file that should be read when training or scoring model as well as which section to read by using the label option.

 

Table 4. dlmzTrain Action Options.

 

The first option I would like to highlight is the loglevel option. There are a variety of values it can take; personally, I like to set the log level to DEBUG so that I get more messages about the training process. In the table option we specify the name of the CAS table we created earlier by reading the CSV into memory; that table contains the image paths and their corresponding labels. We use the inputs option to specify the name of the column that contains the images, and the targets option to specify the name of the column that contains the labels associated with each image. If we are using a machine with GPU capabilities, we can request GPUs with the ngpus option (as in the code above) or point at a specific device with the gpu option.

 

Our model weights are going to be stored in a CAS table called LeNet5, and thanks to the checkPointBest option the stored weights are the ones from the iteration that performed best on validation. The optimizer parameter is very important because it defines the process used to optimize the model weights. The loss option specifies how the loss will be calculated; loss functions are closely tied to the distribution of the target and its number of classes, as well as the type of model you are trying to build. The algorithm option lets us specify details of the optimization algorithm, such as the learning rate, momentum, method, weight decay, and more. Other options specified within the optimizer include the batch size (the number of images used for each iteration of the optimization), the seed to make the optimization reproducible, and maxEpochs, which specifies how many passes through the data the optimization makes. The learning rate scheduler can be used to specify whether we want a fixed learning rate or one that changes throughout the optimization; in this demo, the learning rate is adjusted every 10 epochs. Lastly, we use the options parameter to specify the YAML file and which section of it should be read to train the model, by using the label option.

 

09_JC_dlmztrain_action_output.png

 

Figure 9. dlmzTrain Action Output.

 

Keep in mind that the output will differ depending on the log level; a level of DEBUG gives a lot of information regarding the training process. We can also see information about the model, such as its name, the kind and number of layers, how the loss function progresses, and the misclassification error. At the conclusion of the training process, we get a reason why the optimization stopped and a message stating that the action completed successfully. Now that the model is trained, we can move on to scoring with it. To score using our model, we once again need instructions in a YAML file. The good news is that these instructions look very similar to the ones in the training YAML file.

 

score_doc = """ 
sas:
    dlx:
        score: 
            label: "lenet5_mnist"
            dataset:
                type: "UNIVARIATE"
                organization: "CoCo" # enum[IMAGENET, OPENIMAGE, COCO]
            preProcessing:
                - modelInput: 
                    label: input_tensor1
                    imageTransformation:
                        resize:
                            type: TO_FIX_DIM
                            size: 32 32 
 
            model:
                type: "TORCHNATIVE" # enum [TORCHSCRIPT, TORCHNATIVE]
                name: "SAS_TORCH_LENET" # ignored when type is "TORCHSCRIPT"
                classNumber: 10
                caslib: "mycl"
                inputs:
                    - label: input_tensor1
                      size:
                      - 0
                outputs:
                   - label: output_tensor1
                     size: 
                     - 0
"""

 

The main difference between this YAML file and the one we used for training is that the third suboption is score instead of train; outside of this change, everything remains the same. With a complete YAML file we can now use the dlmzScore action. Below is the code as well as a table providing a brief description of the options used within the action:

 

# Scores the validation dataset
score_action = conn.dlmzscore(
    modelTable="LeNet5",
    table=dict(name="mnist", where="_PartInd_ = 0"),
    inputs="_path_",
    targets="xlabels",
    batchsize=16,
    gpu={0},
    loglevel="DEBUG",
    tableout=dict(name="lenet_output", replace=True),
    options=dict(yaml=score_doc, label="lenet5_mnist")
)

 

dlmzScore Action Options Description
loglevel Reporting level for progress messages sent to the client. DEBUG allows users to see more information related to the training process.
modelTable The CAS table containing the model and model weights. This parameter is optional. When it is specified, the table stores a binary blob containing the model and weights in Pytorch format; when it is not specified, you can specify the model table file path relative to a CAS table path in the YAML file in the extraOptions parameter.
table The input CAS table that is used to store the input data for scoring a Deep Learning model.
inputs The input variables for the scoring task. Currently, we support image data as input, the input column can either be a string of an image path or a binary of the image data.
targets The target variables for the scoring task.
batchsize Number of images scored in each batch.
GPU Specifies the GPU to use when scoring.
tableOut Output table used to store the output generated from scoring.
extraOptions Used to specify the YAML file that should be read when training or scoring model as well as which section to read by using the label option.

 

Table 5. dlmzScore Action Options.

 

Similar to the dlmzTrain action, the amount of information we get in the output depends on the log level. Since the log level is once again set to DEBUG, we get as much information as possible about the scoring process. Just like before, we see information about our model such as the number of layers, the kinds of layers, and the number of parameters. We also see how the loss and misclassification error (MCE) changed as we scored the batches of validation images. The tableout option specifies the name of the output table that will contain the predicted labels for the validation images; here the table generated by the score action is named lenet_output.

 

10_JC_dlmzscore_action_output.png

 

Figure 10. dlmzScore Action Output.

 

The output table is very simple, containing just the predicted label for each validation image; however, it can be modified to contain more information, such as the image for which the prediction was generated.
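One quick way to eyeball the predictions from Python is to fetch a few rows of the scored table; the columns you see will be whatever dlmzscore wrote out, so treat this as a sketch:

# Retrieve the first five rows of the scored output table
conn.fetch(table=dict(name="lenet_output"), to=5)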

 

jc_fig_11.png

 

Figure 11. First 5 Rows of Validation Images with True Label (left) and First 5 Predicted Labels (right).

 

Your image classification model may be part of a much larger process with other moving parts, but realistically these predictions can be used to take real-world actions. There are many things that could improve this model, such as various image augmentation techniques, shuffling the images, and changes to the optimization settings. However, the purpose of this blog was to introduce image classification to those unfamiliar with it and to introduce Model Zoo to existing deep learning practitioners.

 

Additional Resources:

 

Find more articles from SAS Global Enablement and Learning here.
