
Kubernetes autoscaling for SAS Compute workloads might be just a click away


When you log on to SAS Viya's Environment Manager app with administrative privileges, you can configure SAS Workload Orchestrator (SWO). Out of the box, SWO includes one Host Type named "default". And by default, that Host Type manages hosts in Kubernetes that are labeled "workload.sas.com/class=compute". One of the attributes defined for the Host Type is a checkbox to "Enable autoscaling".

 

[Image: 01_RC_SWO-enable-autoscaling.png, the SWO Host Type's "Enable autoscaling" checkbox]

 


 

So, if we check that box, then we're done, aren't we? Autoscaling is active and ready to go: all set to automatically scale up the number of nodes when there are too many Compute jobs for the current hosts, and to scale back down when that workload subsides. Right?

 

Like a lot of things in life, the answer is, "Yes! Well, maybe. Okay, it depends." 😊

 

Let's talk about what really happens when you click the box to enable autoscaling, and how we can ensure we're getting the desired outcome.

 

 

What is autoscaling?

 

Autoscaling, as configured by SAS Workload Orchestrator, refers to automatically scaling the number of nodes in the Kubernetes cluster to intelligently meet the demands of the current workload. This kind of elastic scalability, which supports scaling from zero and back down to zero nodes, keeps the system responsive to different levels of activity while also reducing operating costs for unused resources.

 

This automation is typically handled by another piece of software: the Kubernetes Autoscaler project, in particular the component known as the Cluster Autoscaler. When implemented in Kubernetes, the Cluster Autoscaler monitors the environment, keeping a keen eye out for a new pod that isn't schedulable to any node. If the reason is that the existing nodes are all too busy to run the new pod, the Cluster Autoscaler triggers the creation of a new node. When the new node is ready, that new pod is scheduled to run there.
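
If you want to see that decision-making in action, you can look for pods stuck in the Pending state and follow the Cluster Autoscaler's logs. A minimal sketch, assuming the autoscaler runs as a deployment named "cluster-autoscaler" in the kube-system namespace (names vary by platform):

# List pods that the scheduler cannot place on any node yet
$ kubectl get pods --all-namespaces --field-selector=status.phase=Pending

# Follow the Cluster Autoscaler's reasoning about those pods
$ kubectl -n kube-system logs deployment/cluster-autoscaler --tail=50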

 

The Cluster Autoscaler will also notice if a node sits idle for a while. When that happens, it triggers the termination of that node, which helps save on infrastructure costs.

 

To be clear, the concepts of horizontal pod autoscaling (the number of pods) and vertical pod autoscaling (the size of the pods) are different technologies. Their implementations might have an impact on cluster autoscaling (the number of nodes), though.

 

Without autoscaling, both Kubernetes and SWO simply manage the excess workload, keeping it in check and waiting in the queue until other jobs finish and free up enough resources to run the waiting jobs.

 

 

How do I get the Cluster Autoscaler?

 

If you're lucky, it's already set up for you. Microsoft Azure (AKS), Google Cloud (GKE), and Red Hat OpenShift include the Cluster Autoscaler with their Kubernetes offerings. You just need to ensure it's enabled for the node pools of your cluster.
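
On AKS, for example, you might enable it on an existing node pool with the Azure CLI. This is just a sketch; the resource group, cluster, and node pool names are placeholders, and the count limits are up to you:

# Enable the cluster autoscaler on an existing AKS node pool
$ az aks nodepool update \
    --resource-group viya-rg \
    --cluster-name viya-on-aks \
    --name compute \
    --enable-cluster-autoscaler \
    --min-count 1 \
    --max-count 10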

 

Otherwise, if you're running in Amazon EKS or using upstream, open-source Kubernetes, you'll need to install the Cluster Autoscaler yourself. For Amazon, SAS can help with that, though. The viya4-iac-aws project can optionally set up the Amazon IAM permissions that are needed (see "autoscaling_enabled" in the General config). And the viya4-deployment project can optionally confirm IAM is configured and follow through with deploying the Cluster Autoscaler software (refer to the Cluster Autoscaler config).
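
If you'd rather install it by hand on EKS, one common route is the community Helm chart. A minimal sketch, assuming the IAM prerequisites are already in place; the cluster name and region are placeholders, and chart values can change between releases:

# Add the Kubernetes autoscaler chart repository
$ helm repo add autoscaler https://kubernetes.github.io/autoscaler
$ helm repo update

# Install the Cluster Autoscaler so it auto-discovers the cluster's
# Auto Scaling Groups
$ helm install cluster-autoscaler autoscaler/cluster-autoscaler \
    --namespace kube-system \
    --set autoDiscovery.clusterName=viya-on-eks \
    --set awsRegion=us-east-1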

 

 

What else is required for autoscaling?

 

In addition to SAS Workload Orchestrator and the Kubernetes Cluster Autoscaler, what else is needed for autoscaling? Well, we need the infrastructure set up so that the Cluster Autoscaler knows what to trigger and where.

 

First of all, we need Node Pools (a.k.a. Node Groups). A Node Pool essentially specifies the following:

 

  • Instance Type (OS, CPU, RAM, disk, and other physical attributes)
  • Minimum number of instances that must run
  • Maximum number of instances allowed to run
  • Desired number of instances to run now

 

That "desired" number is important. Its value can be changed on the fly, meaning we can use it to set the number of nodes in the pool to run. But that's a nifty trick — because as it turns out, each cloud provider has their own way of managing instance lifecycles — so they each provide their own solution, which gets its own name:

 

  • in AWS, it's an Auto Scaling Group
  • in Azure, it's a Virtual Machine Scale Set (VMSS)
  • in GCP, it's a Managed Instance Group
  • in OpenShift, it's a Machine Set

 

In other words, the node pool is where we can specify how many nodes to run, but it's up to the cloud provider's Managed Group (my generic name) to handle the actual work of bringing new nodes online. So, that's required, too.
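
On AWS, for instance, you can inspect those numbers for a managed node group with the AWS CLI. The cluster and node group names below are placeholders, and your output will differ:

# Show the minimum, maximum, and desired sizes for a managed node group
$ aws eks describe-nodegroup \
    --cluster-name viya-on-eks \
    --nodegroup-name compute-123456c \
    --query 'nodegroup.scalingConfig'
{
    "minSize": 0,
    "maxSize": 10,
    "desiredSize": 1
}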

 

And you might remember that when we're identifying requirements and designing the deployment of the SAS Viya platform, the SAS documentation recommends the use of workload classes to divvy up pods to best match the underlying hardware. Typically, we see at least four workload classes, one each defined for Viya's Stateful, Stateless, CAS, and Compute services. SWO only manages the Compute workload, which includes the SAS Batch, SAS Compute, and SAS Connect implementations of the SAS Programming Runtime Environment (SPRE).
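
A quick way to confirm which nodes carry the compute workload class is to ask kubectl to show that label as a column:

# Show nodes along with their SAS workload class label
$ kubectl get nodes -L workload.sas.com/class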

 

One more thing: SWO needs to know what's happening in the Kubernetes cluster so it can act appropriately when it comes to autoscaling. So, make sure to include the $deploy/sas-bases/overlays/sas-workload-orchestrator overlay in your kustomization.yaml so SWO gets the ClusterRole and ClusterRoleBinding it needs. If you use the viya4-deployment project, this is included automatically.
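
As a quick sanity check after deployment, you can confirm those cluster-scoped RBAC objects exist. This is a sketch; the exact resource names can vary by release:

# Confirm the cluster-scoped RBAC objects for SAS Workload Orchestrator
$ kubectl get clusterrole,clusterrolebinding | grep -i sas-workload-orchestrator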

 

I know that, all together, it sounds like a lot. But really, that's pretty much par for the course of any normal SAS Viya deployment. There's a good chance you'll find each of those items has been addressed for a well-planned and -implemented site.

 

 

Jobs, not pods

 

As a reminder, SWO manages compute "jobs" and "hosts" whereas Kubernetes manages "pods" and "nodes". Each compute job runs as a pod in Kubernetes, but the distinction is important because SWO provides a layer of abstraction that is managed by Viya administrators, not Kubernetes administrators.

 

In Kubernetes, the number of pods allowed to run on a node is determined by the total resource requests and limits of the pods' containers (amongst myriad other factors). When too much CPU or RAM has been requested on a Kubernetes node, it stops accepting new pods to run.
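
You can see how much of a node's capacity has already been requested with kubectl; replace the node name with one of yours:

# Show requested vs. allocatable CPU and memory on a node
$ kubectl describe node <node-name> | grep -A 8 "Allocated resources"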

 

With SWO, the number of jobs allowed to run on a host is defined in SAS Environment Manager. It can be arbitrarily set to a number like "5 jobs at a time per host" or calculated based on resource consumption like "1 job per CPU". There are several criteria you can choose from.

 

You might say that Kubernetes sets the hard limits and SWO sets the soft limits. The point is, Viya administrators who are well informed about their compute job activity can use SWO to distribute workload effectively as it runs in the Kubernetes cluster.

 

 

How does autoscaling happen?

 

For SWO, autoscaling can occur when it determines that it has a new job to run but no host available to run it (as per its configuration of queues, resources, host types, and so on). When autoscaling is enabled, SWO creates a Scaling Pod and submits it to Kubernetes. The Scaling Pod is designed to request more CPU and RAM than any of the current nodes can handle. The Scaling Pod, having nowhere to run, gets noticed by the Cluster Autoscaler, which then triggers the creation of a new node. When the new node is ready, SWO schedules the new job to run there and deletes the Scaling Pod, since its purpose is fulfilled.
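
If you're curious to watch this in flight, the Scaling Pod shows up briefly as Pending in the SAS Viya namespace, and the Cluster Autoscaler records a TriggeredScaleUp event against it. A sketch, assuming the namespace is named viya:

# Watch for the short-lived Scaling Pod while a job waits for a host
$ kubectl -n viya get pods --field-selector=status.phase=Pending -w

# Look for scale-up events recorded by the Cluster Autoscaler
$ kubectl -n viya get events --field-selector reason=TriggeredScaleUp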


 

To tie this together: when enabled, SWO can trigger the Cluster Autoscaler to modify the desired number of nodes in the pool (within the maximum allowed), which cues the cloud provider's Managed Group (or whatever they call it) to take the physical steps that get the right number of virtual machines running.

 

 

What prevents runaway resource consumption?

 

Are you imagining the possibility that an uncontrolled number of jobs gets submitted to SAS Workload Orchestrator, kicking off a cascade where a scary number of new Kubernetes nodes are brought online and start costing you a lot of money in a short time?

 

Well, don't worry. First of all, you can set the maximum allowed nodes attribute of the Node Pool definitions. And besides the "enable autoscaling" checkbox offered per Host Type, SWO also provides some helpful configuration to keep autoscaling in check.

 

For general configuration:  

 

SWO Host Type properties:

  • Autoscaler request delay (default: 0 seconds): The number of seconds that the host manager delays between receiving a scaling request from the scheduler and creating a scaling pod.
  • Autoscaler delay after new host (default: 300 seconds): The number of seconds that the host manager waits before honoring a new scaling request for the same host type.
  • Autoscaler wait timeout for new host (default: 600 seconds): The number of seconds that the host manager waits between the creation of a scaling pod and a response from the new host before canceling the scaling request.

 

And for time-based queue configuration:  

 

SWO Host Type properties:

  • Autoscaler minimum pending time for jobs (default: 0 seconds): The number of seconds a job must be in the Pending state before being considered for a scaling request.
  • Autoscaler minimum pending jobs (default: 0 jobs): The number of jobs that must exceed the Autoscaler's minimum pending time before a scaling request is made.

 

Furthermore, after a scale-up event, the workload eventually drops back down to normal levels. Instead of distributing new jobs across all available hosts (which might keep more Kubernetes nodes running than are actually needed), SWO purposely schedules new jobs to run on hosts that are already running other jobs. That way, when there is excess capacity, the Cluster Autoscaler will see that some nodes are idle and trigger their scale down.
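
You can observe the scale-down side, too. The Cluster Autoscaler publishes its view of the cluster in a status ConfigMap, and nodes it has selected for removal are tainted shortly before termination. A sketch; the details depend on the autoscaler version:

# Check the Cluster Autoscaler's status summary
$ kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml

# Nodes selected for removal carry this taint shortly before termination
$ kubectl describe nodes | grep ToBeDeletedByClusterAutoscaler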

 

 

When is autoscaling best used?

 

Technically, SWO's use of autoscaling will work for any SAS Programming Runtime Environment workload, including SAS Compute, SAS Connect, and SAS Batch servers.

 

But it helps to understand that scaling up the number of Kubernetes nodes is not an instant event. It might take minutes for a new node to provision and come online, assuming the cloud provider has available resources to offer. Once the new node has joined the Kubernetes cluster, the appropriate SPRE containers need to be downloaded as well. So, for interactive use, like SAS Studio users relying on SAS Compute servers, a new node might not come online before Studio times out (usually after about 60 seconds of waiting). The user might need to be patient and try again once all components reach full readiness.

 

So practically speaking, autoscaling is often best suited to SAS Batch. Batch jobs are usually "fire-and-forget" and often run during off-peak hours, so they can tolerate the delay of a new node starting up.

 

We can handle the differences in usage and capability by defining Host Types in SAS Workload Orchestrator per their intended use cases. Instead of just the "default" Host Type, create others that correspond to associated node groups, binding them together by use of custom labels beyond the required "workload.sas.com/class=compute". And for interactive jobs, instead of relying solely on reactive autoscaling driven by constrained resources, perhaps consider a different approach: ensure the "desired" number of nodes ramps up ahead of business hours to accommodate the expected workload and scales back down during periods when low activity is expected.
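
On EKS, the first part of that, creating a separate node group with a custom label that a dedicated Host Type can match on, might look like this with eksctl. A sketch only; the node group name and the second label are hypothetical, and only "workload.sas.com/class=compute" is required:

# Create a node group for batch compute, with an extra custom label
$ eksctl create nodegroup \
    --cluster viya-on-eks \
    --name compute-batch \
    --node-type m5.4xlarge \
    --nodes 1 --nodes-min 0 --nodes-max 10 \
    --node-labels "workload.sas.com/class=compute,workload.sas.com/usage=batch"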

 

For example, I can use the AWS CLI to specify the minimum size of my compute node pool:

 

$ aws eks update-nodegroup-config --scaling-config minSize=5 --nodegroup-name compute-123456c --cluster-name viya-on-eks

 

And it confirms with a message like:

 

{
    "update": {
        "id": "f7666f72-b4a3-34a2-83a1-ab030fc4d607",
        "status": "InProgress",
        "type": "ConfigUpdate",
        "params": [
            {
                "type": "MinSize",
                "value": "5"
            }
        ],
        "createdAt": "2025-11-17T16:46:03.882000-05:00",
        "errors": []
    }
}

 

Note: Amazon recommends against changing the desired size directly when the Cluster Autoscaler is enabled in the environment. So, we set the minimum size instead. Of course, your cloud provider's CLI will use its own syntax and return different results.

 

Implement something like that in your environment with a cron scheduler (or similar approach) and you can ensure enough nodes are up and running for interactive Compute workloads during normal business hours.
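
A minimal sketch of that idea as a pair of cron entries (the schedule, cluster, and node group names are placeholders; adjust for your provider's CLI):

# Raise the minimum node count ahead of business hours (weekdays at 07:00)
0 7 * * 1-5   aws eks update-nodegroup-config --cluster-name viya-on-eks --nodegroup-name compute-123456c --scaling-config minSize=5

# Drop it back down in the evening (weekdays at 19:00) and let the
# Cluster Autoscaler remove idle nodes
0 19 * * 1-5  aws eks update-nodegroup-config --cluster-name viya-on-eks --nodegroup-name compute-123456c --scaling-config minSize=1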

 

 

Coda

 

SAS Workload Management puts incredibly powerful controls over SAS jobs and their processing in the hands of Viya administrators and users — the people who are best informed about the kind of workload that needs managing. And yet, it still works within the Kubernetes framework as a well-behaved citizen, too.

 

For a well-planned and -implemented site, enabling autoscaling in SWO really might be as easy as checking that box in SAS Environment Manager. That activates a bevy of features and functionality that makes your Viya platform not just scalable, but elastic: it adjusts to the needs of SAS Compute workloads on the fly, saving time and money.

 

For more information about this topic and other aspects of SAS Workload Orchestrator in action, visit learn.sas.com to view the Architecture and Administration for SAS® Workload Management on SAS® Viya® workshop.

 

 

Find more articles from SAS Global Enablement and Learning here.
