Determine how many SAS Viya analytics pods can run on a Kubernetes node – part 3


This wraps up a three-part series where we investigate the number of SAS analytics pods that can run in the Kubernetes cluster.

For this post, we will outline the process and considerations for determining the number of SAS Compute Server instances that can run on a Kubernetes node. In particular, we will focus on specific environmental configuration and infrastructure resources to get an idea of the number of pods that are possible, while acknowledging the significant caveat that the SAS Programming Runtime Environment is used for jobs of varying complexity and data volume.

[ Part 1 | Part 2 ]

SAS Workload Management

As we've already discussed, SAS Workload Management brings additional powerful capabilities to the SAS Viya platform for scheduling dynamically-launched processing pods of the SAS Programming Runtime Environment as instantiated by SAS Compute, SAS Batch, and SAS Connect Servers as well as other pods running Python or R as started by the Batch service.

We'll refer to all of that collectively as "SAS Compute" here, or even more simply as "jobs" that the SAS Workload Orchestrator will manage (not to be confused with Kubernetes Jobs, which are something else).

If you've worked with SAS Workload Management before (and its SAS 9 predecessor, SAS Grid Manager), then you're already familiar with many aspects of its operation and configuration. However, in the last few months, a couple of major changes have come down the pike that we should review first.

Timeline

Here’s a brief and abbreviated history of how workload management in SAS Viya has evolved recently.

Up to and including SAS Viya release 2023.07:

SAS Workload Management was an optional offering that could be added to a SAS Viya deployment.
Without SAS Workload Management, the SAS Launcher service relied on SAS_LAUNCHER_USER_PROCESS_LIMIT (and a corresponding limit for SUPER_USER) to restrict the number of SAS Compute processes that a single user could launch. This helped prevent inadvertent or malicious denial of service, where one user might saturate the system with too many jobs.
With SAS Workload Management active in the environment, the Launcher’s user process limit was ignored because SAS Workload Orchestrator offers numerous fine-grained controls on job placement and execution.
With SAS Workload Management active in the environment, SAS Workload Orchestrator was configured to allow a maximum number of jobs on a host equal to the number of CPU cores. This value can be changed at any time, to a multiple of the CPU core count or to any other integer, to suit the purposes of the environment.

Beginning with SAS Viya release 2023.08:

The SAS Workload Management offering is included with SAS Viya and is activated by default.
By default, SAS Workload Orchestrator is configured to allow a maximum number of jobs on a host equal to the number of CPU cores. This value can be changed at any time, to a multiple of the CPU core count or to any other integer, to suit the purposes of the environment.
The Launcher’s user process limit is no longer evaluated unless SAS Workload Management is deactivated.

Beginning with SAS Viya release 2023.11:

The SAS Workload Management offering is still included with SAS Viya and is still activated by default.
By default, SAS Workload Orchestrator is configured to allow a maximum of 250 jobs on a host. This value can be changed at any time, to a multiple of the CPU core count or to any other integer, to suit the purposes of the environment.
The Launcher’s user process limit is still not evaluated unless SAS Workload Management is deactivated.
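The defaults described in this timeline boil down to a small rule. The Python sketch below is only an illustration: the function name is hypothetical, it assumes SAS Workload Management is active, and it treats release numbers as (year, month) pairs. It returns the shipped default, not whatever value an administrator may have configured since.

```python
def default_max_jobs_per_host(release: str, cpu_cores: int) -> int:
    """Default SWO 'maximum jobs per host' for a release string like '2023.11'.

    Assumes SAS Workload Management is active (it is activated by default
    beginning with 2023.08). Covers shipped defaults only, not overrides.
    """
    year, month = (int(part) for part in release.split("."))
    if (year, month) >= (2023, 11):
        return 250        # fixed default beginning with 2023.11
    return cpu_cores      # before 2023.11 the default equals the host's CPU core count


print(default_max_jobs_per_host("2023.07", 32))  # 32
print(default_max_jobs_per_host("2023.11", 32))  # 250
```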

Regardless of what’s going on with SAS Workload Management, other aspects of SAS Viya also play a role in controlling job concurrency; those are beyond the scope of this post.

From just one SAS Compute instance per CPU to 250 per node?

Does that mean Viya should be able to run 250 SAS Compute jobs on a host node? Probably not, but only because it’s unlikely that Kubernetes and the infrastructure hosting the environment have been set up with enough resource capacity to do so.

SAS Workload Management, along with the rest of Viya, is still backed by Kubernetes and subject to many of the same considerations and restrictions. And Kubernetes itself is backed by the infrastructure it runs on, so the physical limitations of the hardware matter, too.

By setting 250 as the new default maximum for the number of jobs on a host, this effectively removes one of the throttling factors that SAS Workload Orchestrator uses to manage jobs. That is, this gets the default configuration of SAS Workload Management “out of the way” so that Kubernetes is the primary throttling agent. This change was made so that if a large number of SAS Compute jobs attempt to run in a new Viya deployment, they’re more likely to achieve the utilization of the "compute" node pool that typical Kubernetes workload planning would expect.

Of course, this throttling factor and other SAS Workload Orchestrator configurations can be changed at any time to suit your needs. Remember that in the previous post we touched on the variability of the work a SAS Compute process might be asked to perform. So you might even choose to revert to the old "1 job per CPU" value (or a similar CPU-based multiple) if you are planning for large-effort jobs.

Infrastructure constraints

You’ll often hear that Kubernetes has a default maximum of 110 pods per node. While true in some regards, it’s misleading to rely on that number, because the value is configurable and is also subject to external influences, like the physical attributes of the hosting infrastructure.

In a cloud provider like Amazon Web Services, the number of network interfaces and associated IP addresses is often the factor that determines the maximum number of pods a node can host. This limit can be raised by choosing larger instance types or by installing optional networking software, such as a different Container Network Interface (CNI) plugin, in the Kubernetes cluster.
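For Amazon EKS nodes that use the Amazon VPC CNI plugin without prefix delegation, AWS documents the pod ceiling as ENIs × (IPv4 addresses per ENI - 1) + 2, where the +2 covers pods that use host networking. Here's a Python sketch of that arithmetic. The ENI and address counts in the example are assumptions chosen for illustration; they happen to reproduce the 58 and 234 values reported for the 8-core and 32-core nodes in the listing below, but check the AWS documentation for your actual instance types.

```python
def eks_vpc_cni_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Max pods per node under the Amazon VPC CNI's secondary-IP mode.

    Each ENI's primary IP belongs to the node, so one address per ENI is
    unavailable to pods; +2 accounts for host-networked pods.
    """
    return enis * (ipv4_per_eni - 1) + 2


# Illustrative ENI/IP counts (assumed, not read from the cluster):
print(eks_vpc_cni_max_pods(4, 15))  # 58
print(eks_vpc_cni_max_pods(8, 30))  # 234
```

Larger instance types raise both inputs, which is why the bigger nodes report a higher ceiling.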

For example, I can query my test environment to see the maximum pods allowed and how that value varies from node to node in my Kubernetes cluster:

$ kubectl get nodes -o custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory,MAX_PODS:.status.capacity.pods,TaintKey:.spec.taints[*].key,TaintValue:.spec.taints[*].value'
 
NAME                              CPU   MEMORY        MAX_PODS   TaintKey                 TaintValue
ip-192-168-126-215.ec2.internal   8     32386544Ki    58                            
ip-192-168-38-70.ec2.internal     8     32042484Ki    58                            
ip-192-168-7-135.ec2.internal     8     32042484Ki    58         workload.sas.com/class   cas
ip-192-168-9-145.ec2.internal     8     32042484Ki    58         workload.sas.com/class   cas
ip-192-168-98-179.ec2.internal    8     32386544Ki    58         workload.sas.com/class   cas
ip-192-168-106-88.ec2.internal    8     32386544Ki    58         workload.sas.com/class   cas
ip-192-168-14-66.ec2.internal     32    130389628Ki   234        workload.sas.com/class   compute
ip-192-168-29-85.ec2.internal     32    130389628Ki   234        workload.sas.com/class   connect
ip-192-168-102-24.ec2.internal    16    65021468Ki    110        workload.sas.com/class   stateful
ip-192-168-72-173.ec2.internal    16    64333348Ki    110        workload.sas.com/class   stateless

Notice that the nodes tainted for “compute” and “connect” each have a MAX_PODS value of 234. That is less than the default of 250 maximum jobs allowed per node in SAS Workload Orchestrator, and it is a hard limit that cannot be exceeded.
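To spot this mismatch across a larger cluster, the listing above can be checked programmatically. The following Python sketch is a hypothetical helper, not part of SAS or kubectl: it parses the custom-columns text (trimmed here to a few of the rows shown above) and reports nodes in the given workload classes whose MAX_PODS is below the SWO default of 250.

```python
SWO_DEFAULT_MAX_JOBS_PER_HOST = 250  # SWO default beginning with SAS Viya 2023.11


def nodes_capped_below_swo(listing, classes=("compute", "connect")):
    """Return {node_name: max_pods} for nodes tainted with one of `classes`
    whose MAX_PODS is below the SWO default maximum jobs per host.

    Expects the layout: NAME CPU MEMORY MAX_PODS [TaintKey TaintValue]
    """
    capped = {}
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # untainted node (taint columns blank), so no workload class
        name, max_pods = fields[0], int(fields[3])
        taint_values = fields[5].split(",")  # multiple taints print comma-separated
        in_class = any(value in classes for value in taint_values)
        if in_class and max_pods < SWO_DEFAULT_MAX_JOBS_PER_HOST:
            capped[name] = max_pods
    return capped


listing = """
NAME                              CPU   MEMORY        MAX_PODS   TaintKey                 TaintValue
ip-192-168-38-70.ec2.internal     8     32042484Ki    58
ip-192-168-14-66.ec2.internal     32    130389628Ki   234        workload.sas.com/class   compute
ip-192-168-29-85.ec2.internal     32    130389628Ki   234        workload.sas.com/class   connect
ip-192-168-72-173.ec2.internal    16    64333348Ki    110        workload.sas.com/class   stateless
"""
print(nodes_capped_below_swo(listing))
```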

To be clear, that MAX_PODS value can be changed through the kubelet's maxPods setting (especially in your own local deployment of Kubernetes), but in cloud-provided Kubernetes it’s set for you automatically to reflect the underlying infrastructure limits. Changing it alone won’t magically add resources like CPU cores or addressable IP space.

SAS Workload Orchestrator constraints

We began this post by mentioning the maximum jobs allowed per node in SAS Workload Orchestrator. But that’s just one parameter out of many that are important to consider. We can also set maximum jobs by queue, by user or group, as well as maximum jobs total for the cluster.

And we can go much further than that with a wealth of scheduling options. We can close hosts (i.e., take them out of service temporarily, analogous to cordoning a node in Kubernetes). Thresholds can be set so that once a certain amount of a host resource is being consumed, new jobs are kept in the queue until enough of that resource is available. Beyond that, we can combine the various factors and establish a comparison order to prioritize their evaluation for job placement.
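Conceptually, each of these maximums acts as a gate that must still have room before a job is dispatched. The Python sketch below is a deliberately simplified model, not SAS Workload Orchestrator's actual algorithm: it ignores thresholds and comparison order, the function name is hypothetical, and the counts are made up. An undefined maximum (None) imposes no limit.

```python
from typing import Iterable, Optional, Tuple

# (jobs currently counted against a limit, configured maximum or None if undefined)
Limit = Tuple[int, Optional[int]]


def can_start_job(host_closed: bool, limits: Iterable[Limit]) -> bool:
    """Simplified admission check: a new job may start on this host only if
    the host is open and every defined maximum still has headroom."""
    if host_closed:
        return False
    return all(maximum is None or running < maximum for running, maximum in limits)


# Hypothetical counts for one candidate host.
limits = [
    (12, 250),   # jobs on this host vs. max jobs per host
    (40, None),  # jobs in this queue; queue maximum not defined
    (3, 3),      # this user's jobs vs. max jobs per user (already at the limit)
    (40, 1000),  # jobs in the whole cluster vs. cluster maximum
]
print(can_start_job(False, limits))  # False: the per-user maximum is reached
```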

One question you might ask is, “Why would I want to set an arbitrary limit on the number of compute processes per node that’s not based on CPU or RAM usage?”

A possible answer lies in the fact that data-intensive processing has often been a bottleneck for SAS Compute. SAS can only process data as fast as it can be delivered to the CPU, so we often recommend starting conversations with infrastructure providers to ensure SAS Compute has access to 125 MB/sec/core of I/O throughput. Kubernetes doesn’t monitor this metric when managing workload, and many lower-tier offerings from cloud providers can struggle to meet this goal. Therefore, one way to keep SAS Compute running at maximum efficiency might be to set a max jobs per host limit that restricts the number of concurrent SAS Compute jobs, so that those running can fully utilize the CPU as far as the I/O throughput allows.
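The arithmetic behind that reasoning is simple: divide the node's sustainable I/O throughput by 125 MB/sec for each core a job keeps busy. In this Python sketch, the node throughput figures are hypothetical, the function name is illustrative, and it assumes by default that each job drives one core.

```python
def jobs_fully_fed_by_io(node_throughput_mb_s: float,
                         cores_per_job: float = 1.0,
                         target_mb_s_per_core: float = 125.0) -> int:
    """Number of concurrent jobs that can each receive the target I/O rate
    per core, given the node's total sustainable throughput (MB/sec)."""
    per_job_mb_s = target_mb_s_per_core * cores_per_job
    return int(node_throughput_mb_s // per_job_mb_s)  # round down: partial jobs don't count


print(jobs_fully_fed_by_io(1000))                   # 8 single-core jobs
print(jobs_fully_fed_by_io(1000, cores_per_job=2))  # 4 two-core jobs
```

Turned around, a 32-core node would need roughly 4,000 MB/sec to keep 32 single-core jobs fully fed.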

SAS Compute processes can vary widely in terms of the tasks they perform and hence the resources they consume. Be careful with arbitrary, one-size-fits-all constraints so that you don’t overfit SAS Workload Orchestrator for some jobs at the expense of others.

How many SAS Compute jobs can run on a node?

As shown in this post, answering this question requires evaluating the configuration and associated resources of your environment. Simply put, whichever factor yields the smallest number is likely the answer. Finding it might require some investigation, inference, and experimentation.

We've talked about several concepts so far…


Host closed? Or node cordoned?

Is the host closed in SAS Workload Orchestrator (or cordoned in Kubernetes)? Then zero SAS Compute jobs can run there.

Pod resources requested initially

Next, look at the combined CPU and RAM requests (if defined) for the containers inside the SAS Compute Server pod. Divide the total available resource by the pod's combined request and round down to the nearest integer to get an idea of how many can run on the host. For example, if the containers in a SAS Compute pod request a combined total of 750 millicores of CPU, then a Kubernetes node labeled for "compute" workload with 16 CPU could run up to 21 (= ⌊16 cores ÷ 0.75⌋) SAS Compute jobs (not counting CPU overhead consumed by the OS, Kubernetes components, or other processes). Repeat this calculation for the compute pod's RAM request (and limit) compared to what's available on the node. Keep in mind that container requests are simply initial estimates: actual resource usage could be less or much, much more (up to the optionally defined limit).
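Here's the same request-based estimate as a short Python sketch, working in millicores and MiB to keep the math in integers. The 750m CPU request comes from the example above; the 2048 MiB memory request and the 64 GiB of allocatable memory are hypothetical values, and the optional reserve parameters stand in for OS and Kubernetes overhead.

```python
def jobs_by_requests(allocatable_cpu_m: int, allocatable_mem_mi: int,
                     pod_cpu_request_m: int, pod_mem_request_mi: int,
                     reserved_cpu_m: int = 0, reserved_mem_mi: int = 0) -> int:
    """Pods that fit on a node based on requests alone; the scarcer resource wins."""
    cpu_fit = (allocatable_cpu_m - reserved_cpu_m) // pod_cpu_request_m
    mem_fit = (allocatable_mem_mi - reserved_mem_mi) // pod_mem_request_mi
    return min(cpu_fit, mem_fit)


# 16 cores = 16000m; 64 GiB = 65536 MiB (hypothetical node)
print(jobs_by_requests(16000, 65536, 750, 2048))  # 21 (CPU-bound; RAM alone would allow 32)
```

Whichever resource yields the smaller count is the binding one; raising the memory request to 4096 MiB in this example would make RAM, rather than CPU, the limit (16 pods).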

Job maximums

Besides the maximum number of jobs defined per host from all queues, check out the other maximum job values that can be defined in SAS Workload Orchestrator as attributes of the queues themselves:

Maximum jobs for this queue
Maximum jobs per user in this queue
Maximum jobs per host in this queue

By default, they're not defined. If values are provided, they are somewhat arbitrary, informed only by the Viya administrator's own understanding of workload attributes. So they might be a limiting factor, or not.

Other SAS attributes

Consider other aspects of SAS Workload Management, such as schedule or suspend thresholds based on attributes like pgRate, swap, temp, runQueue15s, and so on. SAS Workload Orchestrator can be configured to take these into consideration when it submits jobs to run in the cluster. Plus, the general operation of SAS Viya and the configuration of client components can also affect the number of jobs allowed to run.
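A schedule threshold can be thought of as a per-metric ceiling that must not be reached before new jobs are placed on a host. This Python sketch is a simplification with hypothetical threshold values and a hypothetical function name; it only models metrics where a higher number means a busier host (such as pgRate and runQueue15s) and does not model suspend thresholds.

```python
from typing import Dict


def accepts_new_jobs(metrics: Dict[str, float],
                     schedule_thresholds: Dict[str, float]) -> bool:
    """True if every thresholded metric is still below its schedule threshold.

    Assumes "higher means busier" for each metric. `metrics` must contain
    every name in the thresholds; metrics without a threshold are ignored.
    """
    return all(metrics[name] < limit for name, limit in schedule_thresholds.items())


thresholds = {"pgRate": 500.0, "runQueue15s": 48.0}  # hypothetical values
print(accepts_new_jobs({"pgRate": 120.0, "runQueue15s": 30.5}, thresholds))  # True
print(accepts_new_jobs({"pgRate": 900.0, "runQueue15s": 30.5}, thresholds))  # False: paging too high
```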

MAX_PODS

Investigate MAX_PODS for the Kubernetes nodes assigned to the "compute" workload class and their associated infrastructure (e.g., instance type, CNI). This is a hard limit, but it could be (and likely should be) higher than the effective limit imposed by other factors. Remember that the commonly cited Kubernetes default is 110 pods per node; however, in cloud-provided Kubernetes clusters this value is typically set based on physical resource capacity (such as the instance type size and the number of network connections it can handle).

While we’ve touched briefly on a wide range of topics and considerations that have notable bearing on the raw number of SAS Compute jobs (and other dynamically-launched processing pods of the SAS Programming Runtime Environment) that can run in SAS Viya, we have not looked at an exhaustive list of all factors. An awareness of how Kubernetes and the SAS Viya platform operate, along with familiarity with the specifics of the workload, is very helpful in tuning the environment to suit the needs of your users.

Here's an attempt to summarize all of these concepts into one pithy statement:

In short, look across the various constraints defined for your environment to determine which has the final say in limiting the number of SAS analytic jobs that can run on a node. Even then, we're just making a best guess from an initial set of assumptions and configuration. The actual execution of jobs will likely vary from the theoretical limit depending on the resources consumed as work progresses. If a node succumbs to resource pressure, then Kubernetes will begin evicting pods based on their quality of service.
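That statement amounts to taking a minimum across constraints. The Python sketch below is a hypothetical helper: each entry is the job count a constraint would allow on one node (here using the 32-core "compute" node from the earlier listing, the 750m CPU request example, and the SWO default), None marks an undefined constraint, and a closed host or cordoned node short-circuits to zero.

```python
from typing import Dict, Optional, Tuple


def max_jobs_on_node(host_closed: bool,
                     limits: Dict[str, Optional[int]]) -> Tuple[int, str]:
    """Return (job ceiling, name of the binding constraint) for one node.

    `limits` maps each constraint to the number of jobs it would allow on
    this node; None means the constraint is not defined. At least one
    constraint must be defined.
    """
    if host_closed:
        return 0, "host closed / node cordoned"
    defined = {name: value for name, value in limits.items() if value is not None}
    binding = min(defined, key=defined.get)  # the smallest value has the final say
    return defined[binding], binding


limits = {
    "SWO max jobs per host": 250,
    "Kubernetes MAX_PODS": 234,        # counts every pod on the node, incl. DaemonSets
    "CPU requests (32000m // 750m)": 32000 // 750,
    "queue max jobs per host": None,   # not defined by default
}
print(max_jobs_on_node(False, limits))  # (42, 'CPU requests (32000m // 750m)')
print(max_jobs_on_node(True, limits))   # (0, 'host closed / node cordoned')
```

Even this number is only a starting point: actual resource consumption as jobs run can push the real figure lower.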

Determining the possible number of Compute jobs is a useful calculation to help ensure things are running smoothly and efficiently in your environment. However, it's hard to nail down a perfect number because Kubernetes is designed to manage node resources fluidly. SAS Workload Management provides extra controls to define how SAS Compute jobs are scheduled and run, but it's still subject to the capabilities of Kubernetes and the infrastructure to get the job done.

Coda

Wrapping up this series, we've seen that SAS Viya offers a variety of analytic engines which interact with Kubernetes and the underlying infrastructure in different ways.

We started off looking at CAS and learned that its default configuration will allow just one CAS pod (SMP or MPP, controller or worker, but not including the personal CAS server) per Kubernetes node because each pod is defined to request a majority of the node's CPU and RAM. For the high-performance, in-memory analytics work that CAS performs, this is usually the recommended approach to ensure efficient delivery of results.

For the pods that comprise the various runtime instantiations of the SAS Programming Runtime Environment, we found that SAS Workload Management has a huge part to play in providing flexibility to define how SAS Compute jobs are set loose to run in the environment. Out of the box, the default is to allow almost "unlimited" jobs (configured as Maximum Jobs Allowed per Node = 250) so that new deployments aren't artificially constrained and inadvertently underutilize the system.

There's wide variability in what a given SAS Compute job might do, from tiny queries or calculations completed in milliseconds to large-volume and/or computationally intense tasks that might run for hours. Understanding the expected workload in combination with the resource capacity the infrastructure provides is key to ensuring the system performs at optimum levels for efficiency and cost.

H/T

I'd like to tip my hat in gratitude with warmest regards to several people who helped explore this topic with me: David Stern, Scott McCauley, Edoardo Riva, Raphaël Poumarede, Joe Hatcher, Craig Rubendall, Seth Heno, and Doug Haigh. Any mistakes are my own, not theirs. 😉

Find more articles from SAS Global Enablement and Learning here.