
Workload placement in SAS Viya for the installation engineer – part 1


A critical pre-installation task for SAS Viya (2020.1 and later) is to ensure that the proper Kubernetes node labeling and tainting are in place for an optimal placement of the SAS Viya pods.

 

This small series (there will be a second part) assumes that you already have a good understanding of the basic Kubernetes concepts (pods, nodes, labels, taints) and specifically covers workload placement from a SAS Viya 2020.1 LTS deployment perspective: why should you use node labels and taints, and how is the current Viya release configured to leverage them?

 

If you are not familiar with Kubernetes pods, nodes, labels, and taints, then please read the official Kubernetes documentation here and there to understand the principles of workload placement, and more specifically the ways to influence the Kubernetes scheduler in its decision to run a given pod on a given node.

These pages should give you the required foundation knowledge to understand the rest of the blog.

 

When do you NOT need workload placement?

In an ideal world, your Kubernetes-native application could be defined as a single Kubernetes Deployment, with anywhere between 1 and 1,000 replicas. All those replica pods would be identical: they would each need 0.5 cores and 1 GB of RAM, and their image would be no more than 100 MB in size.

rp_1_k8slogo.png

 

In such a situation, you really would not have to care very much about the size and shape of your nodes, nor where each pod lands, as long as there is enough space for all of them. You could have some nodes with 2 cores, and some nodes with 36, and Kubernetes would pack them all full of pods.

 

But now imagine that you have 2 distinct deployments, and that one of them needs GPUs while the other does not. While you could make sure that all your nodes have GPUs in them (just in case a pod lands on one), this would be extra costly, and you'd only use half of them. In such a case, instead, you'd want some nodes with GPUs and some without, and you'd want to make sure that GPU-requiring pods always land on GPU-equipped nodes.

 

The Kubernetes mechanism to achieve such a requirement is based on node and pod affinities, which are implemented through labels, taints, and tolerations.
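As a generic illustration (this is not taken from any SAS Viya manifest; the gpu=true label, node name, pod name, and image are all hypothetical), a minimal sketch of pinning a GPU-requiring pod to GPU-labeled nodes could look like this:

    # Hypothetical sketch: attract a GPU-requiring pod to GPU nodes.
    # First, label the GPU nodes, e.g.:
    #   kubectl label nodes gpu-node-1 gpu=true
    apiVersion: v1
    kind: Pod
    metadata:
      name: gpu-workload          # hypothetical pod name
    spec:
      nodeSelector:
        gpu: "true"               # only schedule on nodes carrying the gpu=true label
      containers:
      - name: main
        image: my-gpu-image       # hypothetical image

A nodeSelector is the simplest form of this mechanism; node affinities (shown later in this article) allow softer, weighted preferences.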

 

Why do we need workload placement for SAS Viya?

The SAS architecture design documents (often referred to as "D30") typically include a section that defines the topology and associates the various platform building blocks with different profiles of machines.

 

As explained above, this kind of machine "specialization" does not fit well with the default Kubernetes mindset, where the whole idea is to keep many aspects of how pods execute on nodes abstracted from the users.

rp_2_sasviyalogo.png

 

Kubernetes node labeling and tainting is the new way (after the plan.xml and the Ansible host groups) to implement the software topology: associating the various SAS Viya components with specific hosts.

 

General principle (reminder)

Here is a small recap of how Kubernetes decides pod placement on the nodes (which does NOT replace the requirement to read the official Kubernetes documentation pages 😉).

 

By default, Kubernetes schedules a pod on a node depending on:

  • The resource requests specified by the pod (CPU and memory); see the sketch after this list
  • The available resources on the nodes (based on the resource requests of the pods currently assigned to and running on each node)
  • Some default labels and taints (such as beta.kubernetes.io/os=linux or node.kubernetes.io/not-ready)
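For reference, the resource requests mentioned in the first bullet look like this in a pod specification (the pod name, image, and values below are purely illustrative):

    # Illustrative resource requests: the scheduler only places this pod
    # on a node that still has at least 0.5 core and 1 GiB of
    # unreserved (requested) capacity left.
    apiVersion: v1
    kind: Pod
    metadata:
      name: example-pod           # hypothetical name
    spec:
      containers:
      - name: main
        image: example-image      # hypothetical image
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"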

 

Other than that, it makes no distinction whatsoever between the nodes!

The pods can be freely (almost randomly) scheduled and assigned to any node inside the Kubernetes cluster.

 

...BUT it may not correspond to YOUR plans.

 

For example, you might have provisioned some "beefy" nodes with a lot of memory and CPU (maybe GPU chipsets as well) specifically for your CAS server instances and smaller, disposable instances for the microservices. Or some processing might benefit from having large and fast ephemeral storage (Compute, SASWORK), while it would be a waste of money for other processing (Stateful, Stateless).

 

In such a case, you want better control to associate your pods with specific nodes.

You can change the default behavior of the Kubernetes scheduler with:

  • Pod affinities and tolerations
  • Node labels
  • Node taints

 

A certain number of pod affinities and tolerations are already pre-defined in the SAS-provided manifests; this does not mean you are not allowed to add more specialized affinities and tolerations on top of that.
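For instance, a minimal sketch of layering an extra toleration on top of the SAS-provided manifests with a Kustomize patch could look like this (the deployment name sas-example-service and the taint key my-company.com/special are hypothetical; the "/-" path appends to the tolerations list that the SAS manifests already define):

    # kustomization.yaml excerpt (sketch): add one extra toleration
    # to a single deployment, on top of the pre-defined ones.
    patches:
    - target:
        kind: Deployment
        name: sas-example-service          # hypothetical deployment name
      patch: |-
        - op: add
          path: /spec/template/spec/tolerations/-
          value:
            key: my-company.com/special    # hypothetical custom taint key
            operator: Equal
            value: "true"
            effect: NoSchedule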

 

But the node labels and node taints are something that you (as the SAS installation engineer or the customer) need to implement as part of the pre-installation tasks.

 

5 workload classes for SAS Viya

When you generate the SAS Viya manifest file (site.yaml) with Kustomize, the SAS Viya pod affinities and tolerations are already there and use a SAS Viya-specific breakdown based on the SAS proprietary label: workload.sas.com/class.

 

The 5 defined "workload classes" are documented in the official Deployment Guide:

  • stateless
  • stateful
  • cas
  • compute
  • connect

 

With this "preset" definition, each of the SAS Viya workload classes can be managed differently.

If you are looking for examples, you can search for the "labels" specification of the various resources in the site.yaml manifest (or see it in the instantiated pods using a command like kubectl get pod <podname> -o yaml, or the "pen" icon in Kubernetes Lens).
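You can also use the label as a selector to see the breakdown directly (the sas-viya namespace below is an assumption; adjust it to your deployment):

    # List the pods of a given workload class:
    kubectl -n sas-viya get pods -l workload.sas.com/class=stateful

    # Or print every pod with its workload class as an extra column:
    kubectl -n sas-viya get pods -L workload.sas.com/class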

 

As you can see below, the sas-cachelocator pod is categorized in the stateful workload class.

 

rp_3_statefulwlc.png


 

Whereas the sas-data-flows pod is in the stateless workload class category.

 

rp_4_statelesswlc.png

 

When the pods have these labels, they also have the corresponding nodeAffinity and tolerations specifications.

 

Where do the SAS Viya pods go?

rp_5_wheredowego.png

 

So... what is the effect of these "workload class" labels in our SAS Viya pod definitions?

 

Well... it depends on how you labeled and tainted your Kubernetes nodes!

 

It is the combination of the pod and node specifications that drives the pod placement.

 

If there are no node labels or taints at all, then, in theory, the "workload classes" should not have any effect.

 

If you want to enable workload placement with the SAS workload classes, you need to label and taint your nodes (using the kubectl commands provided in the SAS documentation).
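The commands follow this general pattern (check the SAS documentation for the exact, supported commands; the node name node1 below is hypothetical):

    # Label a node for the cas workload class:
    kubectl label nodes node1 workload.sas.com/class=cas

    # Taint the same node so that only pods tolerating the cas taint can land there:
    kubectl taint nodes node1 workload.sas.com/class=cas:NoSchedule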

 

Note that in the current design, each node can carry EITHER NONE OR EXACTLY ONE of the 5 SAS-specific labels.

Remember:

  • Node labels attract the pods with a matching affinity.
  • Node taints repel all pods (unless they have the corresponding toleration), as sketched below.
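For example, a toleration that matches the cas node taint shown earlier has this shape in a pod specification (a sketch; the SAS-generated manifests carry their own equivalents):

    # Without this toleration, the pod would be repelled by a node
    # tainted with workload.sas.com/class=cas:NoSchedule.
    tolerations:
    - key: workload.sas.com/class
      operator: Equal
      value: cas
      effect: NoSchedule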

 

The table below summarizes where the pods prefer to go (based on the node labels) and where they can go when node taints are used.

 

rp_6_placementtable.png

 

Here is another example; let's try to understand where the sas-authorization pod would be scheduled:

  • The sas-authorization pod has the stateless workload class label:

rp_7_sasauth-wlclass.png

  • It prefers to be scheduled on a node with the stateless label:

rp_8_sasauth-label.png

 

Note that if that is not possible, the "second" preferred option for this pod is to go to any node that does not have the compute, cas, stateful, or connect workload class label, as sketched below.
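The shape of such a weighted preference is sketched below (the weights here are illustrative; the exact values and expressions are in your generated site.yaml):

    # Sketch of a "preferred" node affinity with a weighted fallback,
    # similar in shape to what the stateless pods carry:
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100                 # first choice: a stateless-labeled node
          preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: In
              values:
              - stateless
        - weight: 50                  # fallback: any node without another class label
          preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: NotIn
              values:
              - compute
              - cas
              - stateful
              - connect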

 

  • But it tolerates the stateful and stateless taints:

 

rp_9_sasauth-tolerations.png

 

So, a way to read the first two lines of the table is: stateless and stateful pods prefer to be scheduled on the nodes with the corresponding label, but since they tolerate both taints, they can also be scheduled on either "stateless" or "stateful" tainted nodes if needed.

 

Note that those specific rules defined in the SAS Viya pods are subject to change in future versions.

 

What is the recommendation?

Well...we have some flexibility.

 

  • The minimum requirement is to use at least labels for the CAS and Compute nodes (and we'll see in the second part of the series that there are some good reasons for that; spoiler alert! 🙂)
  • If you really want to split the SAS Viya components across different specialized nodes or node pools, to have differentiated management and attributes for the nodes, then you can label and taint all your SAS Viya nodes.

 

However, tainting all the nodes of your cluster can introduce unintended or unwanted outcomes. These outcomes can lead to SAS Viya becoming unavailable to the users or to degraded SAS Viya performance.

 

Over-specialization kills flexibility

Tainting can add overly restrictive constraints on the Kubernetes scheduler. Depending on other factors and parameters (for example, replica counts or node draining operations), you could end up in a "deadlock" situation with pending pods that cannot be scheduled anywhere. One of the key benefits of Kubernetes is to be flexible and highly available by allowing the pods to run, or be rescheduled, on different nodes.

 

Being too directive about workload placement can destroy this benefit.

 

While we want to be able to leverage specialized hardware in support of better performance, "micro-managing" the pods leads you to replicate a BareOS-minded topology on top of Kubernetes. This is a Kubernetes anti-pattern, and it can end up being detrimental to other aspects (costs, management overhead, uptime).

 

Ideally, you will simply implement the workload placement strategy that was discussed with and understood by the customer, precisely defined and agreed upon during the architecture phase, and that corresponds to the customer's needs, requirements, and budget.

 

But if you don't have all that, a good start is to label all the nodes (to implement the pod preferences) but only taint the CAS and Compute nodes (to prevent other pods from going there and to have truly dedicated "CAS" and "Compute" nodes), as sketched below.
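A sketch of that "good start" could look like this (the node names are hypothetical; see the SAS documentation for the exact commands):

    # Label every node pool with its intended workload class...
    kubectl label nodes cas-node-1     workload.sas.com/class=cas
    kubectl label nodes compute-node-1 workload.sas.com/class=compute
    kubectl label nodes svc-node-1     workload.sas.com/class=stateless
    kubectl label nodes svc-node-2     workload.sas.com/class=stateful

    # ...but only taint the CAS and Compute nodes, so they stay dedicated:
    kubectl taint nodes cas-node-1     workload.sas.com/class=cas:NoSchedule
    kubectl taint nodes compute-node-1 workload.sas.com/class=compute:NoSchedule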

 

The key take-away

The most important thing to retain from this workload placement discussion is that you do not have to get it 100% right on the first day.


In Viya 3 and SAS 9.x, topology was implemented on install day, and could only be modified later with a lot of work and significant downtime. By contrast, in the new SAS Viya (2020.1 and later), as long as you start with a minimal set of labels, you can take your time and slowly move your components around, with very little downtime, until you reach the ideal topology for your particular needs.

 

OK, that's it for the first part of the series!

 

For questions and comments, please use the comment section below.

 

In the next part of this series, we will go a little further into the details, see how the labels are used during the SAS Viya deployment, and discuss a few additional workload placement considerations for a deployment in AKS (Azure Kubernetes Service).

 

 

Find more articles from SAS Global Enablement and Learning here.
