
CAS Resources Management: options and recent changes – part 2


In the first part of this post, after the necessary reminders on how pod resources are defined in Kubernetes and what the pod QoS classes are, we reviewed the available options for the CAS resource settings: auto-resourcing, customized values, or initial values.

 

CAS auto-resourcing is the recommended choice and is enabled if you apply the default kustomization.yaml file provided in the official documentation.
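In practice, enabling it boils down to referencing the auto-resources overlay and its transformer in that kustomization.yaml. The paths below reflect recent sas-bases cadences; always verify them against the README shipped with your own deployment assets:

```yaml
# kustomization.yaml excerpt enabling CAS auto-resourcing
# (paths as found in recent sas-bases cadences; check the README in
#  sas-bases/overlays/cas-server/auto-resources for your version)
resources:
  - sas-bases/base
  - sas-bases/overlays/cas-server/auto-resources          # lets the operator compute requests/limits
  # ...other overlays...

transformers:
  - sas-bases/overlays/cas-server/auto-resources/remove-resources.yaml   # removes the initial resource values
  # ...other transformers...
```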

 

With CAS auto-resourcing, we rely on the CAS Deployment Operator to identify the CAS nodes (based on their labels), collect the "allocatable" resources on each host, and determine the appropriate resource request and limit values for the CAS pods.

 

Before the stable 2023.02 version, with CAS auto-resourcing enabled, the resource requests and limits were set to the same value (for both CPU and memory), which was approximately 85% of the CPU and memory capacity of the node. As a consequence, the CAS pods were running with the "Guaranteed" QoS class.

 But it is different now...

 

CAS Auto-resourcing changes

 

What has changed?

 

 

While the "Auto-Resourcing Enhancements" note in the stable 2023.02 What's New is the first place to find information about this change, here are some additional explanations of what has changed:

 

  • When auto-resourcing is not being used: nothing has changed.
  • With auto-resourcing: we now set smaller requests and larger limits for CPU and memory on the sidecar containers. For the CAS container, the request is set to roughly 75% of the CPU (rounded down to the nearest whole number) and the CPU limit to 100%. This allows the sidecars to boost their performance when needed (for example on Sunday, when the backup agent kicks off) while still letting the CAS pod use all the resources when the sidecars are not using them. With this change, the QoS of CAS with auto-resourcing moves from Guaranteed to Burstable. However, this is acceptable in the case of auto-resourcing, since CAS will be the only pod running on the node, and so it would have been the one OOM-killed anyway if an OOM event occurred.

 

So when CAS auto-resourcing is NOT set in the kustomization.yaml, things remain the same: installation engineers can customize the CAS pods' resource requests and limits, or just leave the initial manifest values.

 

But if CAS auto-resourcing is enabled, then the behavior is slightly different:

 

  • First, the CPU request value is expressed in number of cores and set to roughly 75% of the available CPU on the node (rounded down to the nearest whole number), while the CPU limit is set to 100%.
  • Then, the memory request value is also set to roughly 75% of the available physical memory on the node (rounded down), and the memory limit is set to 100%.
  • The last change, which is an automatic consequence of the changes above, is that the CAS pods move from the "Guaranteed" to the "Burstable" QoS class (since they now have different values for requests and limits).

 

Let’s see an example where the CAS nodes are machines with 8 CPUs and 64 GB of RAM.

 

[Image: auto-resourcing changes on an 8 CPU / 64 GB node]
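As a rough sketch (assuming the node's allocatable resources are close to its raw capacity), the operator would end up setting something like this on the CAS container:

```yaml
# Approximate resources set by the operator for an 8 CPU / 64 GB node
# (illustrative values; the real numbers are computed from the node's
#  *allocatable* resources, which are slightly lower than its capacity)
resources:
  requests:
    cpu: "6"        # ~75% of 8 cores, rounded down
    memory: 48Gi    # ~75% of 64 GB
  limits:
    cpu: "8"        # 100% of the node's CPU
    memory: 64Gi    # 100% of the node's memory
```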

 

Why did we change it?

 

The goal of the resource specification changes is mostly to accommodate the "cas-backup" agent's activities. The change allows the supplementary (sidecar) containers to use more resources when auto-resourcing is configured and a burst of activity is required.

 

For example, the SAS Viya platform is backed up by default every Sunday morning, when the CAS server typically is not running. With this change, the backup agent and other component containers can now temporarily use a larger share of CPU resources (up to one full core by default) that are available on the node. Backups complete more rapidly as a result.

 

This tradeoff means the CAS pods no longer benefit from the Kubernetes "Guaranteed" QoS class (since requests and limits now differ).

 

However, if the CAS nodes are tainted (as strongly recommended when using CAS auto-resourcing), this is not an issue since the CAS pod is the only pod running on the node. Remember that the "Guaranteed" QoS class provides higher protection against eviction when a pod is competing with other pods running on the same node.

 

Finally, another change to note: when the CAS auto-resourcing configuration is not used, the official documentation now explicitly recommends setting the resource specification so that the CAS pods get the "Guaranteed" QoS class. This prevents the CAS pods from being evicted when competing with other pods on the same node.
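For reference, here is a minimal sketch of what such a "Guaranteed" resource specification could look like when applied with a kustomize PatchTransformer. The JSON-patch path and the CPU/memory values are assumptions for illustration; the CPU-and-memory management example shipped in sas-bases (cas-manage-cpu-and-memory.yaml in recent cadences) and its README remain the authoritative starting point:

```yaml
# Sketch of a PatchTransformer pinning CAS requests and limits to the
# same values, which gives the CAS pods the "Guaranteed" QoS class.
# The patch path and the sizing values are illustrative assumptions.
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-guaranteed-resources
patch: |-
  - op: replace
    path: /spec/controllerTemplate/spec/containers/0/resources
    value:
      requests:
        cpu: "6"
        memory: 48Gi
      limits:
        cpu: "6"       # limits equal to requests => "Guaranteed" QoS
        memory: 48Gi
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
```

The file would then be referenced in the transformers block of the kustomization.yaml.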

 

Extra considerations

 

SAS CAS Control pod tuning


The tuning documentation mentions specific resource tuning that can be implemented for the sas-cas-control pod to allow the Viya platform to support an increased number of CAS sessions and avoid running into OOM (Out Of Memory) issues.

Here we are NOT talking about the CAS pods that correspond to CAS SMP or MPP "nodes" and whose resources are managed by the CAS Deployment Operator.

 

The sas-cas-control pod is a stateless service whose purpose is to assist CAS.

Here are its default resource requests and limits:

 

[Image: default resource requests and limits of the sas-cas-control pod]

 

The sas-cas-control pod has two main roles:

 

 1) A bridge for all applications (Visual Analytics, Model Studio, etc.) to access the CAS servers in the Viya deployment

2) A way for administrators to manage CAS from the SAS Environment Manager application (in conjunction with the Consul pod)

 

What has been observed during various benchmarks is that the sas-cas-control pod can become a bottleneck under heavy CAS server load, when a lot of CAS sessions are started.

 

We can see from the screenshot that the default CPU and memory limits of the sas-cas-control pod are set to 500 millicores (half a core) and 2500Mi (around 2.5 GB), respectively.

 

So, if a lot of CAS sessions are expected, it is a good idea to monitor these limits and, if the actual usage comes close to them, increase them accordingly to support higher user concurrency. Otherwise, CAS sessions could be closed unexpectedly.
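If monitoring shows that the defaults are too low, one way to raise them is a patch in the kustomization.yaml. The snippet below is only a sketch: the container name and the 1 core / 5000Mi values are assumptions to illustrate the idea, not recommended sizes.

```yaml
# kustomization.yaml excerpt: raise the sas-cas-control limits
# (container name and values are illustrative; size them from your own monitoring)
patches:
- target:
    kind: Deployment
    name: sas-cas-control
  patch: |-
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sas-cas-control
    spec:
      template:
        spec:
          containers:
          - name: sas-cas-control   # assumed main container name
            resources:
              limits:
                cpu: "1"
                memory: 5000Mi
```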

 

Monitor your pod resources with k9s

 

[Image: k9s]

 

 

If (just like me 😊) you like to use k9s to interact with your Kubernetes clusters, it offers something very useful for monitoring resource utilization.

 

In the "pods view", several columns provide information on the real time resource consumption and how close the CPU and memory utilization approach their defined limits.

 

Here are the fields displayed by default:

 

[Image: k9s default columns legend]

 

In the screenshots below, thanks to these fields, you can easily see how the CAS resource utilization evolves in different situations.

 

Idle:

 

[Image: k9s pods view while CAS is idle]

 

Opening a report in Visual Analytics: the CAS controller is using 1% of its CPU limit and 2% of its memory limit.

 

[Image: k9s pods view while opening a report in Visual Analytics]

 

Now, when we start to work on analytics pipelines in SAS Model Studio and let SAS explore a large dataset and automatically generate a pipeline from it, we see a significant increase: the CAS controller uses more than 70% of the CPU limit that was set for the pod!

 

[Image: k9s pods view while CAS is under heavy load]

 

When you are looking at all the pods at the same time, you can also sort by these columns: for example, type <Ctrl+x> to sort by %CPU/L or <Ctrl+q> to sort by %MEM/L.

 

So k9s is very handy for seeing, in real time, how actual consumption is approaching the limits.

 

However, keep in mind that in production environments the good practice is to implement alerts (for example with Prometheus, as explained by Ajmal Farzam in this article) to detect when pod resource consumption is getting close to the limits or when node resources are under pressure.
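As an illustration, here is a minimal sketch of such an alert expressed as a PrometheusRule (the CRD used by the Prometheus Operator). The metric names assume that cAdvisor and kube-state-metrics are scraped, as they are with a typical kube-prometheus-stack deployment; the 90% threshold and the labels are arbitrary examples:

```yaml
# Sketch of an alert firing when a container's working-set memory exceeds
# 90% of its memory limit (assumes cAdvisor + kube-state-metrics metrics)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: pod-memory-near-limit
spec:
  groups:
  - name: resource-pressure
    rules:
    - alert: ContainerMemoryNearLimit
      expr: |
        max by (namespace, pod, container) (container_memory_working_set_bytes{container!="", container!="POD"})
          /
        max by (namespace, pod, container) (kube_pod_container_resource_limits{resource="memory"})
          > 0.9
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "{{ $labels.namespace }}/{{ $labels.pod }}/{{ $labels.container }} is above 90% of its memory limit"
```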

 

Sometimes it is not that simple.

 

A few weeks ago, our team was asked about the following situation:

 

The problem here is that a CAS worker pod could not be scheduled on the intended CAS node because other pods were already running there and "reserving" a portion of the node's resources.

But how can this happen if we have tainted our CAS nodes to only allow pods with the "CAS" toleration to run there?

 

So, in Kubernetes there is a special type of workload resource called a "DaemonSet".

 

"A DaemonSet ensures that all (or some) Nodes run a copy of a Pod. As nodes are added to the cluster, Pods are added to them."

It is very handy for things like a cluster storage daemon that must run on every node or a monitoring agent that should collect metrics on each node, etc…

 

To allow these pods to run everywhere, the DaemonSet controller automatically adds tolerations such as "node.kubernetes.io/unschedulable:NoSchedule", so Kubernetes can run DaemonSet pods on nodes that are marked as unschedulable. In addition, DaemonSets that really must run on every node (cluster storage daemons, monitoring agents, and so on) are often deployed with broad tolerations of their own. This answers our question above and explains why these pods can run even on our "CAS"-tainted nodes...
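To make this concrete, here is a hypothetical excerpt of a monitoring-agent DaemonSet: a toleration with an empty key and operator: Exists matches every taint, so its pods also land on the tainted CAS nodes and reserve part of their allocatable resources (all names and values below are made up for illustration):

```yaml
# Hypothetical DaemonSet that tolerates every taint, including the CAS one
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-metrics-agent          # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-metrics-agent
  template:
    metadata:
      labels:
        app: node-metrics-agent
    spec:
      tolerations:
      - operator: Exists            # empty key + Exists => tolerate all taints
      containers:
      - name: agent
        image: example.com/metrics-agent:1.0   # hypothetical image
        resources:
          requests:
            cpu: 250m               # even small requests reduce what is left for CAS
            memory: 256Mi
```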

 

So if the "DaemonSets managed" pods were already scheduled on the Node and booking significant CPU/ram resources, then when Kubernetes tries to schedule our CAS pods with very high resource requests (because of the auto-resourcing configuration), it is likely that they could not be satisfied and the CAS pod remains in "PENDING" state.

 

In such a case, you might have to disable CAS auto-resourcing and instead customize the CAS pods' CPU/memory requests manually (the "customized values" option covered in part 1) to values that work in your situation.

 

However, note that with the recent changes in stable 2023.02, where the CPU and memory requests are now set to around 75% of the node's capacity (instead of 85% previously), this problem should be mitigated to some degree.

 

Conclusion

 

CAS (Cloud Analytic Services) is a key component of the Viya platform.

 

It is the SAS Viya platform's primary analytics engine, used by applications such as SAS Visual Analytics or SAS Model Studio, allowing many users to run all kinds of advanced analytics pipelines using machine learning, gradient boosting, or neural network models, or to simply navigate or create dashboards and reports.

 

The design of CAS puts performance first. This requires reserving as much CPU and memory for CAS as possible so it can process the data quickly, while also remaining a highly available and resilient service.

 

That’s why it is so important to understand how the CAS system can be configured, monitored (and scaled up if needed), so it can make the most of the available resources on the nodes while remaining "protected" against potential eviction or out-of-memory incidents.

 

Hopefully this small series of posts has provided some guidance to reach that goal!

As usual, any comments or questions are very welcome 🙂

 


 

Comments

Thanks for sharing, @RPoumarede , this is very valuable information. K9s extra tip also much useful !

