
Running Python models in SAS using Kubernetes volumes and Azure Files storage


The ability to execute analytical models written in the Python language is not new for users of SAS Viya. This capability was introduced in the 3.x version of the platform and, as you would expect, it is still available in the latest release, which is now based on the Kubernetes container orchestration platform.

 

However, the shift to a cloud-native architecture has completely changed how you set up the Viya platform, and this of course also includes all optional add-ons such as enabling the PyMAS package. PyMAS is SAS’ DS2 interface to Python, allowing you to execute Python code using the SAS Micro Analytic Service (MAS) engine. MAS is the real-time engine of SAS and is used by three different SAS products:

 

  • SAS Model Manager
  • SAS Intelligent Decisioning
  • SAS Event Stream Processing

 

This blog walks you through the steps of deploying and testing the PyMAS module on a SAS Viya 2020.1 release deployed on the Microsoft Azure cloud, using Azure’s managed storage service (Azure Files) for storing the Python runtime. We’ll also show you how to use SAS Model Manager to publish a Python model to MAS and how to validate this step after setting up the environment.

 

We recommend that you also take the time to review the PyMAS-related documentation found in the MAS Programming and Administration guide. It explains the same steps as this blog but uses an NFS share instead of Azure Files for storing the Python runtime.

 

Getting started (some background)

 

In order to enable MAS to execute a model written in the Python language, a Python runtime has to be made available to the engine. On the Viya 3.x platform this simply meant that you had to install a Python environment next to MAS. Obviously this cannot be done in the same manner on the latest release, which is fully containerized – installing software into a running container is a true anti-pattern, since instances should be treated as ephemeral, and it would probably also introduce a lot of security concerns if that option existed. On the other hand, it wouldn’t make much sense for SAS to pre-install a Python runtime in the shipped container images – which version should it be, and which Python packages should be made available?

 

The solution to this problem is as simple as it is clever: you don’t “install” the Python runtime, you simply attach a volume to the MAS container instance which holds the Python environment(s) you want to use. The volume is mounted into the instance’s filesystem like a network drive, and SAS can use the Python runtime from there. This brings a lot of advantages: not only can you customize a Python runtime that is tailored to your needs, but you can also easily swap different versions in and out when you apply updates.

 

The following picture explains the idea:

 

Figure 1: Using Azure Files for SAS on Kubernetes

 

The 3 pods we’re interested in are MAS, CAS and Compute (SPRE). They are deployed in a separate namespace (“viya4”) on a Kubernetes cluster, which in our case has been created in the Azure cloud using the managed Kubernetes service, AKS. “Outside” the Kubernetes cluster, but still part of your Azure subscription, we’re using the Azure Files service, a standard offering in the Azure cloud for providing shared storage to your virtual machines (and Kubernetes pods). Using the Files service we have prepared a fileshare which contains the Python runtime.

 

Basically 2 things are needed to make the magic happen. First, the pods need a way to attach the shared storage. This is done through the standard Kubernetes Container Storage Interface (CSI, see here for more information). Second, the SAS/CAS/MAS sessions need to know where to look for the Python binaries. This is done by setting environment variables which are provided via a Kubernetes ConfigMap object loaded into the pods.

 

Understanding Kubernetes storage concepts (on Azure)

 

We are not trying to make you an expert in the Kubernetes storage architecture, but we still need to cover a few basics. One of the main characteristics of the Kubernetes design is its use of interfaces to shield applications from lower-level details. This allows Kubernetes workloads to run on any public (and private) cloud without changes, because they do not need to know anything about the underlying infrastructure. Kubernetes “translates” any resource requests (for CPU, memory, storage, …) to the underlying layer.

 

What does that mean in terms of storage? Look at this picture:

 

Figure 2: Kubernetes storage concepts

 

Pods use external volumes by creating a claim (PVC) for a persistent volume (PV), which is the Kubernetes representation of a disk (or fileshare) on the underlying infrastructure. This can happen either dynamically or by referencing a disk or fileshare that already exists – and the latter is exactly what we want.

 

One final word concerning the Azure Files service. This is an easy way to provide shared storage for virtual machines and Kubernetes pods. It comes in different flavours and you can choose the one that matches your requirements in terms of cost and performance. For our purpose you might want to consider the faster premium tier with provisioned capacity (using SSD drives under the hood) over the regular standard tier in order to reduce latency (see here for more information).

 

Now let’s walk through a sample deployment step by step.

 

Provisioning Python and customizing the Viya deployment

 

Please note: all commands and scripts have been shortened to improve readability. You can download the resource ZIP file attached to this blog, which contains all resources in full length.

 

Here’s a high-level overview of the steps we need to take:

 

  • Create a storage account and a fileshare in Azure Files
  • Provision the Python runtime to the fileshare
  • Prepare the kustomize patches needed for the Viya deployment
  • Deploy and perform post-config steps

 

As said before, we do not rely on dynamic provisioning of storage, so we have to create the fileshare manually. It’s a one-time task and you can do it either in the Azure Portal or with the Azure CLI (a short sketch follows below; check the attached ZIP file for a full example). Make sure that you retrieve the access key and store it as a Kubernetes secret – the pods will need it later when accessing the fileshare.
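A minimal Azure CLI sketch could look like this. The resource group, account and share names are placeholders, and the premium tier is just one option (see above); the attached ZIP file contains a complete example:

# placeholders - replace with your own names
AZ_RESOURCE_GROUP=<resource-group>
AZ_STORAGE_ACCOUNT=<storage-account>
AZ_FILESHARE=<fileshare>

# create the storage account (premium file storage in this example)
az storage account create --name $AZ_STORAGE_ACCOUNT \
  --resource-group $AZ_RESOURCE_GROUP \
  --sku Premium_LRS --kind FileStorage

# retrieve the access key - it will be stored as a Kubernetes secret below
AZ_STORAGE_KEY=$(az storage account keys list \
  --resource-group $AZ_RESOURCE_GROUP \
  --account-name $AZ_STORAGE_ACCOUNT \
  --query "[0].value" -o tsv)

# create the fileshare that will hold the Python runtime
az storage share create --name $AZ_FILESHARE \
  --account-name $AZ_STORAGE_ACCOUNT \
  --account-key $AZ_STORAGE_KEY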

 

kubectl create secret generic <storage-account-secret> -n default \
  --from-literal=azurestorageaccountname=$AZ_STORAGE_ACCOUNT \
  --from-literal=azurestorageaccountkey=$AZ_STORAGE_KEY

 

Before we can install the Python environment on this fileshare, it has to be made available to Kubernetes (again a one-time task). We need a couple of Kubernetes objects for this: a StorageClass, a PersistentVolume and a PersistentVolumeClaim. All three objects can be specified in a single YAML file, basically like this:

 

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: python-sc                           # referenced by the PV and PVC below
provisioner: kubernetes.io/azure-file
parameters:
  storageAccount: <storage-account>
---
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-python
spec:
  capacity:
    storage: 10Gi                           # required field, choose a size matching your share
  accessModes:
    - ReadWriteMany
  storageClassName: python-sc               # links the PV to the StorageClass
  claimRef:
    name: pvc-python                        # pre-binds the PV to the PVC below
  azureFile:
    secretName: <storage-account-secret>    # the secret created earlier
    shareName: <fileshare>
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-python
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: python-sc               # links the PVC to the StorageClass

 

Note how these objects are connected to each other by their names. Pods can now refer to a PVC (“pvc-python”) in order to access the fileshare on Azure Files.
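Assuming the manifest above has been saved as python-storage.yaml (the filename is just an example), applying and verifying it could look like this:

kubectl apply -f python-storage.yaml

# the PVC should show a "Bound" status once it has been matched with the PV
kubectl get pv pv-python
kubectl get pvc pvc-python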

 

As a next step, we need to actually install the Python runtime on this fileshare. There are many options for doing this; in our case we’ll simply use a Kubernetes job which runs a shell script for us (this saves us from creating an additional virtual machine). Let’s assume we have a Linux script for installing Python 3.8.6 and a requirements.txt for adding some additional Python packages (check the attached ZIP file for an example). To make both files available to the Kubernetes job, we store them in a ConfigMap object:

 

kubectl create configmap python-builder-script-3.8.6 \
    --from-file=install-python-3.8.6.sh \
    --from-file=requirements.txt
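For reference, the install script itself could look roughly like the sketch below. It builds CPython from source into the target folder given as its first argument and then installs the packages listed in requirements.txt; the package selection and build options here are assumptions, the full script is in the attached ZIP file:

#!/bin/sh
# install-python-3.8.6.sh <target-folder> - minimal sketch, see the attached ZIP for the full version
set -e
TARGET=$1                                   # e.g. /python/python-3.8.6

# build dependencies (the job below runs this in a centos:centos7 container)
yum install -y gcc make wget zlib-devel libffi-devel openssl-devel bzip2-devel

# download, build and install CPython into the target folder on the fileshare
wget https://www.python.org/ftp/python/3.8.6/Python-3.8.6.tgz
tar xzf Python-3.8.6.tgz && cd Python-3.8.6
./configure --prefix="$TARGET" --enable-shared LDFLAGS="-Wl,-rpath,$TARGET/lib"
make -j"$(nproc)" && make altinstall

# add the extra packages listed in requirements.txt (mounted alongside this script)
"$TARGET/bin/pip3.8" install --upgrade pip
"$TARGET/bin/pip3.8" install -r /scripts/requirements.txt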

 

And then refer to this map in the job definition:

 

apiVersion: batch/v1
kind: Job
metadata:
  name: python-builder-job
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - image: centos:centos7
        name: python-builder-job
        # command to be executed (the script takes the target folder
        # for the install as its argument)
        command: ["/bin/sh", "-c"]
        args:
          - /scripts/install-python-3.8.6.sh /python/python-3.8.6
        volumeMounts:
        - name: host-volume
          mountPath: /python               # mountpoint of the Python volume
        - name: install-script
          mountPath: /scripts              # mountpoint of the scripts ConfigMap
      volumes:
      - name: host-volume
        persistentVolumeClaim:
          claimName: pvc-python            # the claim we created earlier
      - name: install-script               # the install script and requirements.txt
        configMap:
          name: python-builder-script-3.8.6
          defaultMode: 0755
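
Submitting the job and waiting for it to finish could look like this (assuming the job definition above has been saved as python-builder-job.yaml – again, the filename is just an example):

kubectl apply -f python-builder-job.yaml

# building CPython from source can take a while; check the logs once the job completes
kubectl wait --for=condition=complete --timeout=45m job/python-builder-job
kubectl logs job/python-builder-job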

 

Once this job has completed its work, we should be able to see our Python runtime in the Azure portal:

 

Figure 3: Checking fileshare contents in the Azure portal

 

That’s all that is needed to prepare the Python runtime for Viya.

 

Moving on to the Viya deployment, you now have to follow the steps explained in the README.md located in the $deploy/sas-bases/examples/sas-open-source-config/python/ folder. To quickly summarize:

 

  • Make a copy of these files in your site-config folder
  • Edit the variables in site-config/sas-open-source-config/python/kustomization.yaml, e.g.

 

configMapGenerator:
- name: sas-open-source-config-python
  literals:
  # /python is the default mount point of the fileshare
  - MAS_PYPATH=/python/python-3.8.6/bin/python3.8
  - DM_PYTHONHOME=/python/python-3.8.6/bin
  - DM_PYPATH=/python/python-3.8.6/bin/python3.8

 

  • Add references to your Python volume claim to all relevant sections in site-config/sas-open-source-config/python/python-transformer.yaml. There are multiple sections in this file which you need to update, but the replacement is always identical. To give you an example:

 

  # Add python volume
  - op: add
    path: /spec/template/spec/volumes/-
    value:
      name: python-volume
      persistentVolumeClaim:
        claimName: pvc-python              # the claim we created earlier

 

  • Edit your main kustomization.yaml to include your patches to the build:

 

resources:
...
- site-config/sas-open-source-config/python 

transformers:
...
- site-config/sas-open-source-config/python/python-transformer.yaml

 

  • We won’t cover this here, but you should also make some persistent storage available for ASTORE files. Luckily that’s a fairly simple step. It is explained in $deploy/sas-bases/examples/sas-microanalytic-score/astores/README.md; also check the “Configure Model access” section in the Viya Operations Guide for instructions on how to do this.

 

Now go ahead, build the main YAML file and deploy Viya as usual. Once your shiny new environment is up and running, don’t forget the final post-configuration step to enable publishing models to MAS: to run Python, user authentication has to be configured on the CAS server, for example by setting the CASALLHOSTACCOUNTS environment variable. For more information, see env.CASALLHOSTACCOUNTS in SAS Viya: SAS Cloud Analytic Services.

 

Validating the deployment in SAS Model Manager

 

Now that we have completed the deployment steps, we’ll want to make sure that everything works as expected. To keep it simple, we’ll walk you through the steps of uploading a Python model to SAS Model Manager, publishing it to the MAS destination and then validating this step using Model Manager’s built-in test functionality. A Python model and a sample dataset are provided in the attached ZIP file.

 

First you have to create a new SAS Model Manager project. A project can include different models and different versions of a model. We have collected all files that belong to the model in a ZIP file: metadata about the input and output variables, model property information, the train and score code, model fit statistics and the pickle file used to apply the model to new data.

 

To upload this model to SAS Model Manager, click “Add models”, then “Import”, and pass the model ZIP file:

 

Figure 4: Import a Model

 

Clicking on “policy_claim_XGBOOST” lists all corresponding files of that model. There are separate sections for SCORE CODE, SCORE RESOURCES, VARIABLES and PROPERTIES AND METADATA:


Figure 5: Model Files

 

To execute the Python code through the MAS engine, DS2 wrapper code is needed. Depending on the use case you can either publish your model to MAS (real-time or transactional scoring) or to CAS (batch scoring). In both cases the PyMAS package is used to execute the Python code, but different wrappers are needed:

 

  1. The DS2 Embedded Process Code Wrapper is used for CAS execution
  2. The DS2 Package Code Wrapper is used for MAS execution
 

Figure 6: DS2 Embedded Process Code Wrappers to execute Python from the CAS and MAS engines

 

After you have published your model to the MAS or CAS publishing destination, a validation test is generated for you automatically. You can find this validation test on the Scoring/Publishing Validation tab.

 

Figure 7: Publishing Validation

 

You need some sample data in order to run this test. We’ve provided a small dataset in the attached resource ZIP file for this blog. Upload this dataset to CAS, assign it to the test and then run it. It should return a green status, meaning that the validation test was successful for the Python model. Your model is now ready to “go live”.

 

Conclusion

 

This blog has walked you through the steps needed to provide a Python runtime for the PyMAS package used by the CAS and MAS engines, when your Viya deployment is located in the Microsoft Azure cloud and you plan to use Azure’s standard Files service for managed shared storage. Remember that instructions are available in the MAS Programming and Administration guide if you want to use an NFS server instead.

The idea of attaching the Python runtime as a Kubernetes volume is as simple as it is clever. It gives you the flexibility you need to keep your Python environment up to date and tailored to your needs. Using the Azure Files service, on the other hand, provides you with a storage solution with practically zero management overhead.

 
