
The SAS Workload Management Approach to Autoscaling


SAS® Workload Management is a product add-on to SAS Viya. In essence, this means that any of the SAS Viya offerings can be improved with the addition of SAS Workload Management. With SAS Workload Management in place, additional controls are available to manage the lifecycle of SAS Compute Servers for more efficient operation in the Kubernetes environment. One aspect of that is scalability: SAS Workload Management provides scalability options beyond what Kubernetes alone can provide. So let's take a look at how Kubernetes handles automatically scaling an environment and what SAS Workload Management does to extend that capability.

 

Automatic scalability in Kubernetes

 

There are two primary approaches to automatic scalability in the Kubernetes ecosystem that we want to look at here.

 

Horizontal Pod Autoscaler

 

Kubernetes provides an API resource known as the horizontal pod autoscaler (HPA). As the name suggests, it can scale a workload (i.e., change the number of pods) to match demand based on average CPU utilization, average memory utilization, or some other custom metric. So if your pods are running hot with a lot of activity, the HPA can notice that and increase the replica count so additional copies of the pod run. It can also scale the workload down when activity decreases (to a configured minimum).
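
As a concrete illustration, here's a minimal HPA manifest that adjusts the replica count of a Deployment to hold average CPU utilization near a target. This is a generic sketch, not part of a SAS Viya deployment; the names my-app and my-app-hpa are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                 # hypothetical Deployment to scale
  minReplicas: 1                 # the configured minimum mentioned above
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75   # add pods when average CPU exceeds 75%
```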

 

The HPA is useful for scaling the workload of pods implemented as Deployments, StatefulSets, or similar resources. It isn't available for k8s workloads that cannot scale, like DaemonSets.

 

Cluster Autoscaler

 

If we run more and more pods, then eventually we'll probably need additional hardware, too. This is where a cluster autoscaler (CA) utility comes in handy. The job of the CA is to monitor the k8s cluster and bring additional nodes online when needed, within configured limits. So, for example, if k8s is unable to schedule new pods due to insufficient resources on the existing nodes, then the CA can ask the infrastructure provider to spin up additional nodes. Once those nodes are online and the required containers have been pulled, k8s can schedule pods to run there.

 

The CA is helpful in many use cases, but it's not a native k8s offering. Instead, the cloud providers (or on-prem solutions like OpenShift) have their own CA implementations, each with its own standards and quirks. Generally, it's the k8s admin's responsibility to manage the CA and its operations.
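
To make that concrete, here's a fragment of how the open-source cluster-autoscaler's own Deployment might be configured on AWS. This is an illustration only: the node group name my-compute-asg and the image tag are assumptions, and each provider documents its own flags and installation method.

```yaml
# Container spec fragment from a cluster-autoscaler Deployment (AWS example).
# The node group name and image tag below are hypothetical.
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.27.2
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=1:10:my-compute-asg      # min:max:node-group to scale within
  - --scale-down-unneeded-time=10m   # idle time before a node is removed
```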

 

WLM components

 

SAS Viya relies on several services in its infrastructure to provide access to the SAS programming runtime environment, which includes the SAS Compute Server (and related components like the SAS Connect Server and SAS Batch Server).

 

[Figure: SAS Viya infrastructure services supporting the SAS programming runtime (rc_1_rocoll-WLMAutoSc-1.png)]

 

For this discussion about scalability with SAS Workload Management (WLM), we're mostly interested in three pieces in particular:

  • SAS Workload Orchestrator Manager - deployed as a k8s StatefulSet
  • SAS Workload Orchestrator Server - deployed as a k8s DaemonSet
  • SAS Compute Server (or Connect or Batch) - deployed as k8s Jobs


Scalability of these WLM components varies based on the underlying Kubernetes workload type.

Kubernetes scalability applied to WLM

 

First of all, let's look at how k8s built-in scalability functionality applies to the WLM technology.

 

The k8s HPA really only applies to the SAS Workload Orchestrator Manager (WOM) component, since it's deployed as a StatefulSet (which runs in the SAS stateful workload class node pool). However, the WOM has a very lightweight job to do in tracking WLM activity. When two WOM instances are deployed, they actually run as an active-passive cluster simply to provide high availability of the service: just one handles all activity, and if it fails for some reason, the other takes over. The point here is that we don't need to run multiple instances of the WOM for scalability reasons since only one actively does the job anyway.

 

The SAS Workload Orchestrator Server (WOS) component is deployed as a DaemonSet, and the k8s HPA doesn't apply to DaemonSet workloads. The point of a DaemonSet is to ensure that all (or some) nodes run a copy of a pod. Per its definition, the WOS DaemonSet is automatically deployed to any node labeled for SAS Compute workload (i.e., workload.sas.com/class=compute), which means the scalability of WOS happens automatically with the addition (or removal) of nodes labeled for compute, as sketched below. So if the CA determines there's a need to increase the size of the compute node pool, the new nodes will automatically run the WOS DaemonSet, providing additional candidate hosts for the SAS programming runtime.
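
The following is a simplified sketch of how that node targeting works. The real sas-workload-orchestrator DaemonSet shipped with SAS Viya has many more fields, and the object and image names here are assumptions for illustration; the key piece is the nodeSelector matching the compute node label from the article.

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sas-workload-orchestrator        # illustrative name
spec:
  selector:
    matchLabels:
      app: sas-workload-orchestrator
  template:
    metadata:
      labels:
        app: sas-workload-orchestrator
    spec:
      nodeSelector:
        workload.sas.com/class: compute  # runs on every compute-labeled node
      containers:
      - name: orchestrator
        image: sas-workload-orchestrator:latest   # placeholder image
```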

 

The SAS Compute Server (and Connect and Batch) doesn't rely directly on either k8s autoscaling mechanism because these servers are instantiated as k8s Jobs that are placed on k8s nodes at the direction of WLM.

 

WLM scalability

 

SAS Workload Management exists primarily to ensure efficient utilization of resources in support of the SAS programming runtime as provided by the SAS Compute Server (and Connect and Batch). Through the use of queues, priority, and preemption provided by WLM, SAS Viya users can manage their SAS programming workloads in a variety of ways over a range of criteria.

 

This ability is beyond what k8s can provide because k8s cannot see or understand the actual tasks that are in play for SAS. It doesn't know, for example, that one SAS job might run quickly with minimal overhead compared to another, much heavier task that could spawn numerous parallel jobs. With configuration, SAS Viya can automatically determine workload placement based on the client (like SAS Studio Engineer or the SAS Viya command-line interface), user identity, and resource usage in the environment, as well as ad hoc through additional programming. That is, we can tell WLM how to understand the workload in ways that k8s cannot.

 

In cases where the compute node pool (i.e., hosts labeled workload.sas.com/class=compute) in the k8s cluster has been scaled out (either manually or automatically by the cluster autoscaler), k8s will ensure that the SAS Workload Orchestrator Server DaemonSet is in place on those new nodes. After k8s pulls down the programming runtime containers as well, the new nodes are ready for business - there's no additional deployment, configuration, or registration of the software components. This is a big improvement in automation simplicity compared to adding hosts to a SAS 9.4 Grid Manager solution, where bare-OS software deployment lacks the automation infrastructure that k8s readily provides.
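
As one hypothetical example of wiring this together, a cloud node pool can be created with autoscaling bounds and the compute label applied at provision time, so that every node the CA adds is immediately eligible for the WOS DaemonSet. The fragment below uses eksctl's config format on AWS purely as an illustration; the cluster name, region, instance type, and sizing are all assumptions, and other providers offer equivalent settings.

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: viya-cluster          # hypothetical cluster name
  region: us-east-1           # hypothetical region
managedNodeGroups:
- name: compute
  instanceType: m5.8xlarge    # sizing is illustrative only
  minSize: 1                  # autoscaler scale-in floor
  maxSize: 8                  # autoscaler scale-out ceiling
  labels:
    workload.sas.com/class: compute   # new nodes pick up the WOS DaemonSet automatically
```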

 

WLM future

 

SAS Workload Management is just getting started. There are plans to significantly increase the range of its functionality as well as its integration with Kubernetes. The idea is that WLM will eventually be able to manage workloads for other SAS runtimes (like SAS Cloud Analytic Services) as well, and that WLM should one day be able to more explicitly direct k8s on how to scale the cluster environment for specific needs.

 

More information

 

There’s a lot to learn and understand about SAS Workload Management. Check out these other posts from the SAS Global Enablement & Learning team to discover more.

 

Find more articles from SAS Global Enablement and Learning here.

