In this post I will explore the use of Kubernetes Guaranteed Quality of Service (QoS) with SAS Viya. We will look at the benefits of Guaranteed QoS, how to implement it, and some possible scenarios for using it with SAS Viya. I will also show configuration examples based on the proposed scenarios.
Kubernetes uses the QoS classification (QoS class) to influence how different pods are handled. As the name suggests, it determines what "quality of service" a pod receives: should a pod get preferential treatment compared to other pods running in the cluster? The QoS class matters most when a Kubernetes node is under resource pressure.
Let’s start by understanding the basics of Kubernetes Quality of Service.
The Kubernetes documentation states that there are three classes of Quality of Service for pods. See Configure Quality of Service for Pods.
The QoS classes are: Guaranteed, Burstable, and BestEffort.
The QoS classification of a Pod is based on the resource requests of the containers in that Pod, along with how those requests relate to the resource limits.
The QoS classes are used by Kubernetes to decide which Pods to evict from a node experiencing Node Pressure. Node-pressure eviction is the process by which the kubelet proactively terminates pods to reclaim resources on a node.
The kubelet monitors resources such as memory, disk space, and filesystem inodes on your Kubernetes cluster's hosts. When one or more of these resources reach specific consumption levels, the kubelet can proactively fail one or more pods on the host to reclaim resources and prevent starvation.
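To illustrate how these consumption levels are defined, the eviction signals can be set in the kubelet's KubeletConfiguration. The values below are generic examples only (not SAS recommendations), and on managed Kubernetes services these settings are typically controlled by the provider.

# Illustrative KubeletConfiguration excerpt: example eviction thresholds only
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"   # evict when available memory drops below 500Mi
  nodefs.available: "10%"     # evict when node filesystem free space drops below 10%
  nodefs.inodesFree: "5%"     # evict when free inodes drop below 5%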
But what does this mean for the three QoS classes?
As the name might suggest, BestEffort provides the lowest level of service. Pods in the BestEffort QoS class can use node resources that aren't specifically assigned to Pods in other QoS classes. The kubelet prefers to evict BestEffort Pods first if the node comes under resource pressure.
Pods that are Burstable have some lower-bound resource guarantees based on the pod requests, but do not require a specific limit. If a limit is not specified, it defaults to a limit equivalent to the capacity of the Node, which allows the Pods to flexibly use more resources when they are available.
This is the default configuration for SAS Viya.
In the event of Pod eviction due to node resource pressure, these Pods are evicted only after all BestEffort Pods are evicted.
Pods that are Guaranteed have the strictest resource limits and are least likely to face eviction. They are guaranteed not to be killed until they exceed their limits or there are no lower-priority Pods that can be preempted from the node. They may not acquire resources beyond their specified limits. These Pods can also make use of exclusive CPUs using the static CPU management policy.
Hence, this is a good option to protect critical services.
The downside of specifying Guaranteed QoS is that pods can end up in a pending state if no nodes are available with sufficient capacity to run the pod.
Before I discuss the SAS Viya considerations in more detail, let’s consider the options to isolate or dedicate resources to a component.
Using a dedicated node pool is possibly the obvious choice. Node pools are commonly used to meet the requirements specific to a component (set of pods). For example, the need for GPUs, additional memory and/or storage requirements.
But a node pool could also be used to dedicate resources to a component, though this might not be the most cost-effective approach. Higher levels of resource sharing help to optimize the infrastructure (Kubernetes cluster) costs.
This is where Guaranteed QoS has a role. Rather than creating dedicated node pools, use a shared node pool for a variety of components. For example, the SAS Viya stateful and stateless services can share a single node pool, with the critical services protected using Guaranteed QoS.
As previously stated, the default configuration for SAS Viya is to use Burstable QoS. This is a result of the default deployment assets provided by SAS that are used to create the site.yaml manifest. The site.yaml includes containers whose resource requests differ from their resource limits, and/or containers that only have a resource request specified (no limits).
A logical place to start when considering the use of Guaranteed QoS is for critical services. For example, the stateful services or perhaps the SAS Micro Analytic Service (MAS) and SAS Event Stream Processing (ESP).
However, the StatefulSets are all configured for high availability by default. As a result, they run with multiple pod replicas and could support a minimal level of pod evictions.
Therefore, if you decide to create a single node pool for both stateful and stateless workloads, throttling or pod eviction during periods of heavy usage is unlikely to affect the availability of the stateful SAS services. But it is possible if too many pods of an individual stateful service are evicted at the same time.
This is all true, so why might you consider changing the default configuration?
Perhaps the SAS Viya platform has very high availability service levels (six or seven 9s as an availability target) and/or the platform is supporting real-time processing and low latency is extremely important.
In these scenarios the use of Guaranteed QoS could be a solution without using dedicated node pools.
Looking at the SAS Viya stateful services, likely candidates would be the SAS Configuration Server (Consul) and the SAS Message Broker (RabbitMQ), both of which are covered in the examples below.
The SAS Redis Service could be another candidate, but as the default deployment has six replicas of the sas-redis-server pods, I’m not sure this is worth the effort to change even with very stringent HA requirements.
However, I think the main (or best) use case for Guaranteed QoS is real-time integration using MAS or ESP. By default, both are classified as stateless services and are deployed alongside the other stateful and stateless services; they are not given any special treatment.
The use of Guaranteed QoS could also be considered for other components, such as the SAS programming environment (the Compute Server), but due to the variable nature of its resource consumption this would not be very efficient: you would need to set the requests/limits to cater for the largest job, which would lead to an over-reservation of CPU and memory for most jobs.
So (IMO), using Burstable QoS does make sense for Compute.
To implement Guaranteed QoS the container requests and limits must be set to the same values. All the containers within a pod must have their requests and limits set and equal to each other for Kubernetes to classify the pod as Guaranteed; this includes the init containers.
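As a minimal, generic sketch (not a SAS manifest), this is what the rule looks like in a pod specification; the pod only receives the Guaranteed class because every container, including the init container, has requests equal to limits:

# Generic illustration: requests equal limits for every container, so the pod is Guaranteed
apiVersion: v1
kind: Pod
metadata:
  name: qos-demo            # hypothetical pod name
spec:
  initContainers:
    - name: init
      image: busybox
      command: ["sh", "-c", "true"]
      resources:
        requests:
          cpu: 100m
          memory: 128Mi
        limits:
          cpu: 100m
          memory: 128Mi
  containers:
    - name: app
      image: nginx
      resources:
        requests:
          cpu: 500m
          memory: 512Mi
        limits:
          cpu: 500m
          memory: 512Mi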
So, how do you determine what the requests and limits should be? For my testing I inspected the default SAS Viya configuration: I created my Viya configuration, ran the kustomize build, then inspected the site.yaml file and used the default limits that were specified. This provided a starting point, but in most cases the values would need to be tuned for the workload.
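For example, after running the kustomize build you can pull the default resource settings for a given deployment out of the site.yaml. The snippet below is one way to do this, assuming the Go-based yq (v4) is installed:

# Build the manifest
kustomize build . > site.yaml

# Show the default requests/limits for the MAS deployment
yq 'select(.kind == "Deployment" and .metadata.name == "sas-microanalytic-score") | .spec.template.spec.containers[].resources' site.yaml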
As previously stated, in my opinion the best candidate for Guaranteed QoS is MAS, as by default a single instance of the sas-microanalytic-score pod is deployed as a stateless service.
When implementing Guaranteed QoS it is important to understand that each instance of the MAS pod (sas-microanalytic-score) is running ALL the models that have been published. Therefore, it is critical to understand the resource requirements.
Given the higher resource requirements for the MAS pods (compared to the other services) it is more likely that a MAS pod could be evicted due to Node Pressure. As the default configuration provides a single instance of MAS, the pod eviction would affect the real-time transactions.
Therefore, a good approach would be to run at least two replicas of the sas-microanalytic-score pod and implement Guaranteed QoS for additional protection.
The following provides two example patch transformers to update the MAS configuration. The first patch sets the QoS, with requests and limits of 4 CPUs and 2Gi of memory. As a comparison, the default requests are set to cpu: 250m and memory: 750M.
---
# Set Guaranteed QoS for MAS
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: set-mas-resources
patch: |-
  - op: replace
    path: /spec/template/spec/containers/0/resources
    value:
      limits:
        cpu: 4000m
        memory: 2Gi
      requests:
        cpu: 4000m
        memory: 2Gi
  # The init container limits also have to be set to implement Guaranteed QoS
  # sas-start-sequencer
  - op: replace
    path: /spec/template/spec/initContainers/0/resources
    value:
      limits:
        cpu: 250m
        memory: 250Mi
      requests:
        cpu: 250m
        memory: 250Mi
  # sas-certframe
  - op: replace
    path: /spec/template/spec/initContainers/1/resources
    value:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
  # sas-config-init
  - op: replace
    path: /spec/template/spec/initContainers/2/resources
    value:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
  # sas-commonfiles-init
  - op: replace
    path: /spec/template/spec/initContainers/3/resources
    value:
      limits:
        cpu: 250m
        memory: 250Mi
      requests:
        cpu: 250m
        memory: 250Mi
target:
  group: apps
  kind: Deployment
  name: sas-microanalytic-score
  version: v1
In the code above, you can see that there are four initContainers: sas-start-sequencer, sas-certframe, sas-config-init, and sas-commonfiles-init.
The second patch sets the pod replicas. This could be done in a number of ways, but given that there is a HorizontalPodAutoscaler definition for MAS I chose to patch that. Here I have set both the minimum and maximum replicas to 3.
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: enable-ha-mas-replicas
patch: |-
  - op: replace
    path: /spec/maxReplicas
    value: 3
  - op: replace
    path: /spec/minReplicas
    value: 3
target:
  kind: HorizontalPodAutoscaler
  version: v2
  group: autoscaling
  name: sas-microanalytic-score
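To apply the patches, both files need to be referenced from the transformers block of the kustomization.yaml. A minimal sketch, assuming the files are saved under site-config/ with these illustrative names:

# Excerpt from kustomization.yaml (the filenames are illustrative)
transformers:
  - site-config/mas-guaranteed-qos.yaml
  - site-config/mas-ha-replicas.yaml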
After applying the two patches, in the images below you can see that I had three ‘sas-microanalytic-score’ pods that have a Guaranteed QoS classification.
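One way to confirm the classification from the command line, assuming a namespace of viya and that the pods carry the app=sas-microanalytic-score label:

# List the MAS pods along with their QoS class
kubectl -n viya get pods -l app=sas-microanalytic-score \
  -o custom-columns=NAME:.metadata.name,QOS:.status.qosClass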
Using the kubectl client to view which nodes the pods are running on, you can see that due to the podAntiAffinity rules they are distributed across three nodes.
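For example, again assuming the viya namespace:

# The NODE column shows where each pod has been scheduled
kubectl -n viya get pods -l app=sas-microanalytic-score -o wide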
podAntiAffinity is explained in the Kubernetes documentation: Assigning Pods to Nodes
The following provides an example to update the Consul definition to set the cpu and memory requests to equal their associated limits so that Kubernetes will assign them Guaranteed QoS. Reviewing the patch, you will see that there are four initContainers that need to be updated:
---
# Set Guaranteed QoS for Consul
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: set-consul-resources
patch: |-
  - op: replace
    path: /spec/template/spec/containers/0/resources
    value:
      limits:
        cpu: 1000m
        memory: 1Gi
      requests:
        cpu: 1000m
        memory: 1Gi
  # The init container limits also have to be set to implement Guaranteed QoS
  # sas-start-sequencer
  - op: replace
    path: /spec/template/spec/initContainers/0/resources
    value:
      limits:
        cpu: 250m
        memory: 250Mi
      requests:
        cpu: 250m
        memory: 250Mi
  # sas-certframe
  - op: replace
    path: /spec/template/spec/initContainers/1/resources
    value:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
  # sas-certframe-client-token-generator
  - op: replace
    path: /spec/template/spec/initContainers/2/resources
    value:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
  # sas-certframe-management-token-generator
  - op: replace
    path: /spec/template/spec/initContainers/3/resources
    value:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
target:
  group: apps
  kind: StatefulSet
  name: sas-consul-server
  version: v1
Applying this patch you get the following result.
You can now see that Consul is using Guaranteed QoS.
RabbitMQ is also defined as a StatefulSet; it has three initContainers that need to be updated in the same way. Updating RabbitMQ, you get the following.
Finally, if you did want to update Redis, it is defined using a PodTemplate and has only one initContainers definition: sas-certframe.
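A sketch of what that patch might look like, assuming the PodTemplate is named sas-redis-server and follows the standard PodTemplate schema (the pod spec sits under /template/spec rather than /spec/template/spec); the resource values are illustrative only:

---
# Illustrative only: set Guaranteed QoS for the Redis PodTemplate
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: set-redis-resources
patch: |-
  - op: replace
    path: /template/spec/containers/0/resources
    value:
      limits:
        cpu: 500m        # example value, tune for your workload
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
  # sas-certframe
  - op: replace
    path: /template/spec/initContainers/0/resources
    value:
      limits:
        cpu: 500m
        memory: 500Mi
      requests:
        cpu: 500m
        memory: 500Mi
target:
  kind: PodTemplate
  name: sas-redis-server
  version: v1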
Using Guaranteed QoS can be a useful approach for protecting a service, especially when sharing a node pool with other services.
However, you do need to understand the SAS Viya deployment, as implementing Guaranteed QoS requires every container within the pod, including the init containers, to be configured with matching requests and limits.
I was working with SAS Viya Stable 2024.05, so the initContainers shown in this post relate to that cadence. It is important to understand that this may change over time, so you shouldn't assume the configuration will be the same when moving from one cadence version to another.
Finally, you must also keep in mind that the cluster needs to have sufficient capacity, otherwise you can end up with pods in a 'Pending' state. This isn't specific to using Guaranteed QoS; it will happen for any pod that can't have its resource requests met, but it could be more likely to happen when using Guaranteed QoS.
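A quick way to spot pods stuck in this state:

# List any pods that the scheduler has not been able to place
kubectl get pods --all-namespaces --field-selector=status.phase=Pending

Running kubectl describe on a Pending pod shows the scheduling events that explain why it cannot be placed.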
Please note, while this is a recent update to the SAS Communities site, it was written a while ago. There have been changes to the Redis Server configuration since first creating this post.
Thanks for reading…
There are several other posts related to this topic; also see the series from @RobCollum: Determine how many SAS Viya analytics pods can run on a Kubernetes node – part 1.
Find more articles from SAS Global Enablement and Learning here.