A few months ago, I was contacted by a colleague in the Iberia region with a technical question, and the next day a SAS Cloud engineer came to our team with a very similar request for another customer based in the US. The questions were about architecture and deployment considerations around the utilization of GPUs with CAS.
With Viya 4, you will typically have a CAS node pool grouping the CAS nodes that host the CAS pods (controller and workers).
Most CAS analytics run on standard CPUs, but how can I configure CAS in Viya 4 if I want to leverage GPU (Graphics Processing Unit) acceleration for deep learning models?
This type of question is really interesting to me as a technical architect who comes from an analytical background. With this in mind, I set up some infrastructure in Google Cloud Platform to create a Viya test environment so that I could share my observations in this post.
One of the questions was: “If a customer wants two CAS node pools (one with GPU and one without GPU), is there a way to schedule work to one vs. the other without creating two CAS servers?”
The answer is no. Remember that a CAS server is a single processing unit: you submit your analytics action to the CAS controller, and it decides how to break the work down across the CAS workers.
However, what you can do is have two CAS servers (each with a controller and one or more workers) inside the same Viya environment. Each CAS server can be either SMP or MPP.
So, an interesting setup could be, for example, one CAS MPP server for standard analytics (reporting, statistics, forecasting, etc.) and one CAS SMP server running on GPU-equipped nodes where you could train your deep learning models.
The diagram below represents the topology of this scenario.
As you can see on the far right, there is an extra “CASGPU” node pool in addition to the standard CAS node pool, and that is where we want the CAS pod of our secondary CAS SMP server to run.
To illustrate this use case, I wrote and tested a specific scenario.
The starting point is that we have already:
Then, we want to add a GPU node pool, perform a standard Viya deployment with the Deployment Operator, and finally run through the steps to add, configure, and run a secondary CAS SMP server whose CAS pods will be able to consume the GPU device for deep learning processing.
So, let’s see what the steps are to meet that goal.
1. Use the create-cas-server.sh script to generate the manifest to create a secondary CAS SMP server.
2. Change the Target section of the cas-manage-workers.yaml (and potentially the cas-manage-backup.yaml) file so it uses labelSelector instead of names to apply the transformations. See the SAS documentation for more details on that. (A sketch of this change is shown right after this list.)
3. Configure the node affinity and tolerations of the secondary CASDeployment so its pods run on our GPU-enabled node pool. (We will look at some of the details in the next section of the post.)
4. Add the GPU resource request patch (the cas-gpu-patch.yaml file) as explained in the official SAS documentation and the associated README example.
5. Set the PATH and LD_LIBRARY_PATH environment variables in the CASDeployment CRD. (We will look at some of the details in a later section of the post.)
6. Reference the new patch files in the kustomization.yaml file.
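As a rough sketch of the Target change in step 2, the transformer's target would move from a fixed name to a label selector. The label key and value below are assumptions for illustration; use a label that is actually present on the CASDeployment you want to target.

target:
  kind: CASDeployment
  # Before: the transformation was applied to one CAS server by name, for example:
  # name: default
  # After: select the CAS server(s) through a label instead (illustrative label):
  labelSelector: "sas.com/cas-server=default"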
As I’d like to keep this post to a reasonable length, I won’t cover the details of each of the steps above. 😊
(Most of them are already covered either in the official SAS documentation or in the Google documentation.)
However, let’s detail a little more the steps that are specific to our CAS GPU setup.
If you are using Terraform as part of the IaC project, the cool thing is that you can easily modify part of your infrastructure without having to rebuild it completely.
Terraform can identify the delta between what is currently deployed (as reflected in the local terraform state file) and what you are changing in the terraform variables file.
For example, here are the steps to “update” our infrastructure with a new node pool:
In this example, our node pool will always have a single node and use the “n1-highmem-8” instance type with an NVIDIA Tesla P100 GPU (16 GB of GPU memory).
You can also notice that we assign a “casgpu” label and a “casgpu” taint to our new CAS GPU node pool.
If you run:
terraform plan -var-file=./gel-vars.tfvars -state=terraform.tfstate
You should see something like:
The initial Terraform plan creates 42 resources, so the message with “3 to add, 1 to change, 2 to destroy” is a good indication that our node pool change has been taken into account and will be applied in “delta” mode.
Run the command below to build the new Terraform plan and apply it.
# Build the plan and keep it in a file
terraform plan -input=false \
  -var-file=./gel-vars.tfvars -out ./addingcasgpupool.plan

# Apply the saved plan
terraform apply ./addingcasgpupool.plan
You should see this line at the end:
We can then check with the kubectl get nodes command whether our new GPU node is present.
The purpose of this change is to make sure that the CASDeployment instance that corresponds to our secondary CAS server (shared-casgpu) will only start its CAS pod(s) on the node(s) carrying the “casgpu” label.
We start from the default node-affinity.yaml PatchTransformer file generated by the create-cas-server.sh script, and we make some changes as illustrated below.
As you can see in the screenshots, we removed the preferredDuringScheduling… nodeAffinity specification and modified the requiredDuringScheduling… one to associate the "casgpu" workload class. This change ensures that the CAS pod(s) from our secondary CAS server will always land on a node labeled with the "casgpu" workload class. We also get rid of the "Not In system" required affinity since it is no longer needed.
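As an illustration, the resulting required nodeAffinity could look roughly like this; only the affinity fragment is shown here, and the surrounding PatchTransformer and exact paths come from the node-affinity.yaml file generated in your own environment.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        # require nodes labeled with the "casgpu" workload class
        - key: workload.sas.com/class
          operator: In
          values:
          - casgpu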
Then, in the CASDeployment CRD of our secondary CAS server, we need to add a toleration so the instantiated CAS pod(s) can be accepted on a node with the workload.sas.com/class=casgpu:NoSchedule taint.
In addition, since Google has automatically tainted our GPU node pool nodes with nvidia.com/gpu=present:NoSchedule, we also want to add a toleration for it. Once again, we use a PatchTransformer to do it:
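A minimal sketch of such a PatchTransformer is shown below, assuming the secondary CAS server's CASDeployment is named shared-casgpu and that the tolerations are appended to the controller pod template; the target name and patch paths are assumptions to adapt to the manifests generated in your environment.

apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-gpu-tolerations
patch: |-
  # tolerate the taint we assigned to the CAS GPU node pool
  - op: add
    path: /spec/controllerTemplate/spec/tolerations/-
    value:
      key: workload.sas.com/class
      operator: Equal
      value: casgpu
      effect: NoSchedule
  # tolerate the taint that GKE automatically adds to GPU nodes
  - op: add
    path: /spec/controllerTemplate/spec/tolerations/-
    value:
      key: nvidia.com/gpu
      operator: Equal
      value: present
      effect: NoSchedule
target:
  kind: CASDeployment
  name: shared-casgpu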
With these two additional "tolerations", our CAS pod can run on a node with the following taints:
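For reference, those two taints would appear roughly like this in the GPU node's specification:

taints:
- key: workload.sas.com/class
  value: casgpu
  effect: NoSchedule
- key: nvidia.com/gpu
  value: present
  effect: NoSchedule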
Note: In theory, we should not have to explicitly add a toleration, since adding a special nvidia.com/gpu resource request for the pod that needs to consume the GPU should be enough in GCP, according to the Google documentation. However, since we use the CAS auto-resources mode, the cas-probe could not be started without this additional toleration.
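For context, such a GPU resource request in a container specification takes the generic Kubernetes form below; for CAS, the actual patch should follow the cas-gpu-patch.yaml example referenced earlier.

resources:
  requests:
    nvidia.com/gpu: 1   # request one GPU device for the container
  limits:
    nvidia.com/gpu: 1   # for extended resources, requests and limits must match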
While this next step (making the NVIDIA paths visible to CAS inside the pod) is not officially documented yet, it is required for our CAS pod to be able to utilize the GPU for CAS processing.
After applying all the previously documented configuration steps, I tested a program using GPU processing in my Google Cloud environment, and it failed with the following error:
After some troubleshooting and investigation, it appeared that, while the NVIDIA binaries and libraries were available inside the CAS pod on the GPU node, CAS was not able to find them in its execution and library paths.
But in Kubernetes, you can "inject" environment variables inside the pod using the "env" container specification. It looks like this in the Pod definition:
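As a generic illustration (the container name, variable name, and value are placeholders):

spec:
  containers:
  - name: cas               # container name shown for illustration
    env:
    - name: EXAMPLE_VAR     # placeholder environment variable
      value: "example-value"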
So, you can write a simple PatchTransformer to modify the CAS pod template in the CASDeployment Custom Resource and set the PATH and LD_LIBRARY_PATH variables for the NVIDIA drivers and libraries.
It would look like this:
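The sketch below is an approximation: it assumes the NVIDIA binaries and libraries are mounted under /usr/local/nvidia inside the pod (a common location on GKE GPU nodes) and that the secondary CAS server is named shared-casgpu; adjust the target, paths, and values to your own deployment.

apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-gpu-env
patch: |-
  # make the NVIDIA binaries visible to CAS
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/env/-
    value:
      name: PATH
      value: /usr/local/nvidia/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
  # make the NVIDIA shared libraries visible to CAS
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/env/-
    value:
      name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64
target:
  kind: CASDeployment
  name: shared-casgpu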
Note that, depending on the cloud platform and GPU accelerator type, you might have to adjust the paths used in this specific case.
If our configuration worked, then after regenerating and reapplying the SASDeployment CRD in the cluster, we should see the CAS pod corresponding to our secondary CAS server (cas-gpu) start and run on our GPU node, as below:
We can also check whether the CAS auto-resources default configuration worked and set the resource requests and limits appropriately.
If we run this code to display the resource requests and limits:
Then we get the following results, which confirm that each type of CAS instance is using most of the associated node capacity (4 vCPU/26 GB of RAM for our 4 cas-default MPP nodes, and 8 vCPU/52 GB of RAM for our cas-gpu SMP node) with the appropriate resource requests and limits.
Finally, after all these configuration steps, we want to make sure that we can run a program performing some Analytics processing that takes advantage of the GPU acceleration.
It should be noted that only specific analytics processing tasks really take advantage of GPU devices. Currently, the code samples in the official SAS documentation focus on Python programs.
However, with the help of my GEL colleagues Beth Ebersole and Nicolas Robert, we managed to write a SAS program (which leverages CAS actions) that builds, trains, and scores a deep learning model using the node's GPU.
First, we need to open a CAS session on the secondary CAS server, with something like:
Then, after loading some sample data in CAS, we train the model with the “GPU=TRUE” option and see in the log that the GPU device was identified and used.
Then we also score the model with GPU=TRUE.
Hurray! The message in the log confirms that CAS has found and is using our GPU device!
Note: If you'd like to use this same validation program in your own GPU-enabled environment, you can download the data and the program from my personal GitHub repository.
Finally, if (like me) you never really believe things until you see them for real, you might want to monitor the GPU processing at the system level while the model is trained and scored 😊
There are various ways to do it, but in our case we just ran the NVIDIA-provided program (nvidia-smi) from the CAS GPU pod to check the memory and GPU utilization in real time. In the screenshot below, taken during the model training, we can see that the utilization of our P100 GPU device reaches 72%.
What we’ve seen in this post is basically a demonstration of how you can leverage GPU processing for CAS deep learning in the cloud.
BUT… it also shows how to play with node affinities and tolerations to assign different CASDeployments (within the same Viya deployment) to different CAS node pools (even without talking about GPUs).
It could be an interesting scenario to configure several CAS servers for distinct business units and assign not only different topologies (SMP/MPP) but also different instance types (more or less power, faster or slower storage, etc.) to the different BUs' CAS servers.
As I finish writing this blog (in the last few days of April 2022), it should be noted that SAS now also has the capability for the IML procedure to leverage GPU processing from a SAS Compute Server session.
Finally, during the experimentation, I tested both techniques for CAS resource allocation (auto-resources and custom), and both worked well.
OK, that's it for today!
Thanks again to my colleagues who helped me along the way to make all this work: Beth Ebersole, Liping Cai, Frederik Vandenberghe, Nicolas Robert, Uttam Kumar, and David Zanter.
Find more articles from SAS Global Enablement and Learning here.
Thank you Raphael!
I think your last link is pointing to an internal SAS site
Thank you @JuanS_OCS, I guess you are talking about the IML documentation link? I will fix it.
Raphael,
thanks for the code.
I tested this to see if our NVIDIA GPU was used.
A little result:
NOTE: Using device: CPU. real time 49.34 seconds
NOTE: Using device: GPU 0. real time 9.31 seconds
Dik
Thank you @paterd2 for sharing your results. For the example you used, that seems like quite a compelling case for using GPU processing in certain use cases.
--Simon