
Determine how many SAS Viya analytics pods can run on a Kubernetes node – part 1


SAS Viya is all about delivering powerful analytics capabilities while Kubernetes monitors and manages the environment, scheduling Viya pods onto the optimal nodes and ensuring that host machines don't become over-saturated with work.

 

The SAS Viya platform is designed to work in harmony with Kubernetes to deliver analytics in a robust and scalable way. In particular, Viya employs a variety of analytic engines to accomplish this goal. Each analytic engine has a unique usage pattern for resources, and we can customize the deployment to optimize for best results. For this post, we will focus on SAS Cloud Analytic Services (CAS) and later look at the SAS Programming Runtime Environment, but note that similar concepts can also be explored for SAS Micro Analytic Service, SAS Event Stream Processing, and so on.

 

Have you ever wondered how many pods of a runtime engine like CAS or Compute can run on a Kubernetes node? How is that determined? What changes can you make? What do we need to consider when trying to match the workload to the available resources?

 

This is a three-part series where we take a swing at answering those questions. In this first post, we begin by looking at CAS and the unique considerations that a high-performance, in-memory analytics engine requires. The next post will expand on new capabilities introduced by SAS Workload Management and the impact that has on SAS Compute. Then the last post will wrap up by evaluating the Kubernetes nodes to determine how many runtime-engine pods they can actually support.

 

[ Part 2 | Part 3 ]

 

Containers, briefly

 

A container is intended to provide a lightweight, standalone, executable image of software. It starts with a base layer providing a basic operating system, and then additional software is layered on as the container image is built so that everything needed to run is included.

 

(Image: CAS container image layers)


 

Containers offer a very flexible approach to running software with improved management and operation. However, the number of containers and their orchestration can become burdensome to maintain. That's where Kubernetes comes in - to provide a powerful framework for managing containerized workloads.

 

SAS Viya software is packaged into many container images which are downloaded to the Kubernetes cluster when needed for installation and execution.

 

Pods, briefly

 

The smallest unit of work in a Kubernetes cluster is a pod. A pod consists of one container or more that work together. The pod definition specifies which containers to include, their high-level role in the pod lifecycle, and optionally resource access and utilization.

 

From Kubernetes' perspective, there are two main classifications of containers in a pod:

  • Init Container(s): zero or more, these containers are run first, in sequential order, when the pod starts up. Each must complete successfully before the next can run.
  • Containers (a.k.a., app containers, regular containers): one or more, these containers perform the intended task of the pod. They start after all the init containers have completed and can run concurrently in perpetuity.
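
You can see this split for yourself with kubectl. A quick sketch, where <pod-name> is a placeholder for any pod in your namespace:

# Init containers run first, one at a time, in the order listed
$ kubectl get pod <pod-name> -o jsonpath='{.spec.initContainers[*].name}'

# App (regular) containers start once every init container has completed
$ kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].name}'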

 

Requests and Limits, briefly

 

From a resources perspective, the pod definition might specify options for resource utilization like persistent volume claims for access to storage as well as the amount of RAM or CPU that's expected to be used. Specifically for RAM and CPU, Kubernetes allows for the possibility to specify requests and/or limits per container in the pod.

 

Requests are much like calling ahead to make a reservation for 4 people at a restaurant. The actual number of people that show up might be 4, or fewer, or more, and the restaurant will try to accommodate them. So a container's request for CPU or RAM is kind of a best guess as to what it might need, but it could end up using less or more at any given moment. Think of a SAS Compute Server that backs a user session in SAS Studio. Until the user submits program code, it's likely idle and not consuming anywhere near the requested resources. And then depending on the program tasks, it could soon consume a lot more resources than the initial request.

 

Limits put a cap on the amount of resources the container can use. When a CPU limit is reached, the container's access to CPU is throttled to keep it in bounds while it continues to run. RAM limits, however, are treated very differently. When a RAM limit is exceeded by the container, the OS can invoke the OOM (out of memory) Killer process. The OOM Killer will terminate the container, which leads to the associated pod getting deleted from Kubernetes. Depending on several factors, a new pod might be started to replace it. For a SAS Studio user, this means the unexpected termination of their program execution and loss of that session.
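
If you suspect a container has hit its RAM limit, one quick check (a sketch; substitute the name of the pod in question) is the last terminated state Kubernetes records for each of the pod's containers - a container killed for exceeding its memory limit typically shows the reason OOMKilled:

$ kubectl get pod <pod-name> -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'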

 

Most pod definitions for SAS Viya will specify some request and/or limit for at least one container.
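
To check what a particular Viya pod defines, you can dump the resources stanza for each of its regular containers, which shows both requests and limits. A minimal sketch using the same kubectl and yq pattern used later in this post:

$ kubectl get pod <pod-name> -o yaml | yq eval '.spec.containers[].resources'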

 

Illustrating a CAS pod

 

Let’s take a quick look at the executable components of a SAS Cloud Analytic Services controller pod. Like other pods, the sas-cas-server-default-controller pod includes both init containers and regular containers to perform its function.

(Image: containers in the sas-cas-server-default-controller pod with their CPU and RAM requests and limits)

 

Init containers in the sas-cas-server-default-controller pod:

  • sas-certframe: sets up encrypted communications for this pod with the rest of SAS Viya
  • sas-config-init: prepares the environment for the main container

 

Regular containers in the sas-cas-server-default-controller pod:

  • sas-cas-server: this is the main container and the reason this pod exists - it provides the in-memory analytics engine that makes this piece of MPP CAS work.
  • sas-backup-agent: this is a sidecar container that helps with regularly scheduled backups.
  • sas-consul-agent: this is another sidecar container, allowing CAS to work with the SAS Configuration Server.


Notice that each container shown here is defined with requests and limits for both CPU and RAM. That's not required for pods in general (some pods might have no requests or limits defined at all), but for Viya it is very helpful: it ensures that the sas-cas-server-default-controller pod can be managed intelligently against the available environmental resources. The values shown above are the defaults, but they can be modified to suit the needs of your environment.

 

We've dug in deep with the CAS controller pod here, but note that CAS worker pods follow the same approach.
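
If you're running MPP CAS, a quick way to see the controller and its workers side by side (a sketch that assumes the default server instance name, "default") is to list the pods along with the node each one landed on:

$ kubectl get pods -o wide | grep sas-cas-server-default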

 

Kubernetes looks at request values


Kubernetes evaluates numerous factors to determine where to run a pod in the cluster. For this post, because we're ultimately working to figure out how many runtime pods can exist on a node, we'll focus on the request values for CPU and RAM that might be defined for the containers in a pod.

 

To place a pod, Kubernetes considers the composite of the request values for CPU and RAM (where they are defined) across the pod's containers. Because init containers run sequentially to completion, Kubernetes only considers the highest request value for CPU or RAM among the init containers. For the regular containers, which can run concurrently for an arbitrary amount of time, it sums the values requested for CPU or RAM. The higher of those two composite values, across init and regular containers, is then used as part of the determination of where to place the pod. This helps ensure that each pod has sufficient CPU and/or RAM resources.
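
Expressed compactly (the same rule as above, applied separately to CPU and to RAM):

effective pod request = max( largest request among init containers, sum of requests across regular containers )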

 

Now let's take a closer look at the CAS controller pod to evaluate its containers' requests for CPU and RAM.

 

For the init containers in the CAS controller pod:

$ kubectl get pod sas-cas-server-default-controller -o yaml | yq eval '.spec.initContainers[] | .resources.requests'

 

cpu: 500m
memory: 500Mi
cpu: 500m
memory: 500Mi

 

  • The highest request for CPU of the init containers is 500 millicores.
  • The highest request for RAM of the init containers is 500 mebibytes.

 

And for the regular containers in the CAS controller pod:

$ kubectl get pod sas-cas-server-default-controller -o yaml | yq eval '.spec.containers[] | .resources.requests'

 

cpu: "5"
memory: "22261674803"
cpu: 100m
memory: 50Mi
cpu: 100m
memory: 50Mi

 

  • The combined request for CPU across all regular containers is 5.2 cores (=5 + 0.1 + 0.1).
  • The combined request for RAM across all regular containers is 20.8 gibibytes (=20.7 + 0.05 + 0.05).

 

So, when looking at the composite of all requests for resources, it's as if Kubernetes treats the sas-cas-server-default-controller pod as requesting 5.2 cores of CPU and 20.8 gibibytes of RAM. It will only schedule the CAS controller to a node with sufficient available resources (and in line with its various other considerations, like labels and selectors, taints and tolerations, etc.).
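
One way to see how those requests count against a node is to check which node the controller landed on and then review that node's allocated resources summary (a sketch; the grep line count is just a convenience and may need adjusting):

$ kubectl get pod sas-cas-server-default-controller -o wide
$ kubectl describe node <node-name> | grep -A 10 'Allocated resources'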

 

Remember, these request values are just a starting point. The containers involved could actually use less or more CPU and RAM (up to a defined limit, if provided) during their operation.

 

CAS default

 

You might be wondering why one of those containers in the CAS controller's pod requests so much RAM and CPU. And that memory request is a very precise value, too - not a simple round number like "20 GB". That container is the actual CAS controller, of course. It makes those resource requests because Viya defaults to an approach referred to as "auto-resourcing", which requests a majority of a node's CPU core count (half + 1) and RAM (~65%) for each CAS instance (controller or worker).

 

In this example, the host for the CAS controller is an AWS EC2 m5.2xlarge with 8 vCPUs and 32 GiB of RAM. The idea is to ensure that each CAS instance (controller or worker) will request enough resources that Kubernetes won't try to place two of them together on a single node. CAS was designed with performance as its primary objective, so we expect to place each CAS pod on a dedicated node of the Kubernetes cluster. That is, the typical Viya deployment is configured to request a majority of CPU and RAM resources for the CAS pod so that just one will run per node for either an SMP or MPP deployment of the CAS server (this concept does not apply to the personal CAS server).
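
To sanity-check that math against your own environment, compare the node's capacity to the controller's request shown earlier (a sketch; on this example node, half of 8 cores plus 1 gives the 5-core request, and roughly 65% of the memory lands near the ~20.7 GiB memory request):

$ kubectl get node <node-name> -o jsonpath='{.status.capacity.cpu}{" cpu / "}{.status.capacity.memory}{" memory\n"}'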

 

You're not stuck with the default auto-resourcing approach. It is possible to specify your own requests and limits for CAS, if desired. The SAS documentation recommends setting the requests equal to the limits for both CPU and RAM so that Kubernetes will apply the Guaranteed Quality of Service (QoS) class to the CAS pod, helping to give it priority over other pods and making it the last to be evicted from the node in over-pressure situations. For more information about this, see:

 

SAS® Viya® Platform Operations
Common Customizations > Adjust RAM and CPU Resources for CAS Servers
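
If you do set requests equal to limits, you can confirm that Kubernetes applied the Guaranteed class by checking the pod's status (a quick sketch):

$ kubectl get pod sas-cas-server-default-controller -o jsonpath='{.status.qosClass}{"\n"}'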

 

Coming up

 

In the next post of this series, we'll take a closer look at SAS Workload Management and its impact on the SAS Compute Server pods. In particular, we'll dive into the environmental aspects that affect the operation of SAS Workload Orchestrator as the scheduler for SAS Compute pods.

Comments

This is a great article, written in such a way that one can really follow how Kubernetes plans the workload. I am not an expert on Kubernetes, but I have heard in my environment that sometimes defining no limits on CPU can be useful. Of course there will be pros and cons, as always; please see the link For the Love of God, Stop Using CPU Limits on Kubernetes (Updated) | Robusta

 

Hi Karolina!

 

In general, defining a pod spec so it makes a request for CPU without an associated limit is a fine idea. Doing so allows Kubernetes to apply the Burstable Quality of Service class to the pod.

 

The Burstable QoS provides a very flexible approach: it ensures the pod has full access to node resources while also giving Kubernetes some idea of how "heavy" the pod will be, at a minimum, via the request. As you'll recall, this was mentioned in my earlier post, Where SAS Viya Relies on Kubernetes for Workload Placement.

 

Pods operating under Guaranteed QoS are the last to be evicted from a node that's under pressure. So, if your aim is to define a pod such that Kubernetes will apply its Guaranteed QoS to the pod's lifecycle, then defining a limit that matches the associated request for both CPU and RAM is required. For critical infrastructure services that SAS Viya employs – like CAS, Postgres, Redis, etc. – this might be a useful configuration to implement.

 

You raise a valid question. It's very important to understand the various configuration possibilities and how they impact the environment. For Kubernetes and SAS Viya, a one-size-fits-all approach isn't something we should try for. Instead, we need to ascertain what's going on and adjust accordingly. I hope this post and those that follow in this series will help highlight some areas to watch for. 

