Considerations for optimizing SAS compute sessions in SAS Viya on Kubernetes


Introduction

In this blog I’d like to talk about some approaches worth taking a close look at when you set out to tune the performance of your SAS compute sessions in SAS Viya on Kubernetes. These sessions still use a SAS 9.x runtime, so you can expect a high degree of backward compatibility when you’re planning to migrate your existing SAS code. However, they now execute in Kubernetes pods, and that means a fundamental change in the way they are provisioned and configured.

Understanding how Kubernetes resource management works might not have been on the list of your favorite To-Dos for the rest of 2022 (so far at least), but it’s definitely an important topic. Keep in mind that a successful deployment of SAS Viya will get you a running but certainly not an optimized environment, so most of the configuration tasks discussed in this blog will be quite obligatory. The following list shows the major areas which determine if your compute sessions will run with adequate performance:

I/O – namely the I/O of your SASWORK drive
CPU and memory
SAS system options
Pod startup latency

Make sure to turn the right knobs

… and there are quite a few of these “knobs”. Performance tuning is not a task which can be done in one spot only. Rather there are at least 3 layers to be considered: the (cloud) infrastructure, the Kubernetes cluster and the SAS configuration. Combine this with the areas I mentioned above and you get this matrix:

“Where to do what”	Disk I/O (SASWORK)	CPU and memory	SAS system options	Pod startup latency
(Cloud) Infrastructure	X
Kubernetes cluster		X
SAS configuration			X	X

Disk I/O (SASWORK)

SAS compute workloads are known to be I/O intensive and the SASWORK location is especially important in this regard as SAS programs usually use it quite heavily. There are a couple of options available for SAS Viya on Kubernetes (depending on whether you run in the cloud or not) and instead of covering this here, I’d like to refer you to another blog which is dedicated to this topic.

CPU and memory assignments

Kubernetes is often referred to as “container orchestration software”, and not surprisingly resource management is at the very core of its capabilities. The most obvious resource types to manage are CPU and memory. Kubernetes pods can declare their requirements in their deployment manifests. Kubernetes is responsible for making sure that pods are scheduled to nodes which can satisfy the requested minimum and that they do not exceed their declared limits (e.g. the maximum declared amount of memory).

SAS compute sessions start with rather low default settings for CPU and memory. These settings can and usually need to be modified. Compute sessions are based on PodTemplates and the SAS Launcher service is responsible for dynamically creating the session pods on request (e.g. when you log in to SAS Studio or when a batch job is submitted). The Launcher service will also add the resource usage definitions to the session pods it starts. From an administrator’s point of view, you set these definitions as annotations in the metadata section of the corresponding PodTemplate(s). For this blog we’re focusing on the “sas-compute-job-config” PodTemplate as it is the one used by the applications we’re interested in (SAS Job Execution, but also SAS Studio and SAS Model Studio).

The full documentation for these annotations can be found here in the SAS Administration Guide and there is also a README in the deployment assets at $deploy/sas-bases/examples/sas-launcher/configure/README.md. Here are the most important ones which should be set:

Resource Type	Annotation	Purpose
CPU	launcher.sas.com/default-cpu-request	guaranteed (min) CPU value
CPU	launcher.sas.com/default-cpu-limit	max CPU limit (max threshold)
Memory	launcher.sas.com/default-memory-request	guaranteed (min) memory
Memory	launcher.sas.com/default-memory-limit	max memory limit (max threshold)

(all using the usual notation for Resource units in Kubernetes).

These annotations should be set when deploying the environment (see the link above), but it’s also possible to patch the PodTemplate afterwards (which comes in handy for testing things). Here are some sample commands:

# set the max CPU limit for SAS Studio sessions
$ kubectl -n viya4 annotate PodTemplate sas-compute-job-config \
  --overwrite launcher.sas.com/max-cpu-limit=4
 
# delete the max CPU limit setting (so the default applies again)
$ kubectl -n viya4 annotate PodTemplate sas-compute-job-config \
  --overwrite launcher.sas.com/max-cpu-limit-

# display the current settings (if available)
$ kubectl -n viya4 describe PodTemplate sas-compute-job-config | \
  grep " launcher.sas.com" | grep "cpu\|memory"

Annotations:  launcher.sas.com/default-cpu-limit: 4
              launcher.sas.com/default-cpu-request: 1
              launcher.sas.com/default-memory-limit: 32Gi
              launcher.sas.com/default-memory-request: 2Gi
              launcher.sas.com/max-cpu-limit: 4
              launcher.sas.com/max-cpu-request: 1
              launcher.sas.com/max-memory-limit: 32Gi
              launcher.sas.com/max-memory-request: 2Gi

(Make sure that you restart your SAS sessions, e.g. SAS Studio -> “reset SAS session”, after you have changed the PodTemplate so that the changes take effect.)
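
As a small convenience, the four default-* annotations listed in the table above could also be set in a single call. The following is only a sketch: the helper name build_annotate_cmd is made up for this example, and the values used in the usage comment (1 to 4 CPUs, 2Gi to 32Gi of memory) simply mirror the sample output shown earlier and are not sizing recommendations. The function only prints the command, so it can be reviewed before it is actually run.

```shell
# Sketch: print (do not run) a kubectl command that sets all four
# default-* annotations on the sas-compute-job-config PodTemplate.
# build_annotate_cmd is a hypothetical helper, not part of SAS Viya.
build_annotate_cmd() {
  local ns="$1" cpu_req="$2" cpu_lim="$3" mem_req="$4" mem_lim="$5"
  printf 'kubectl -n %s annotate PodTemplate sas-compute-job-config --overwrite' "$ns"
  printf ' launcher.sas.com/default-cpu-request=%s' "$cpu_req"
  printf ' launcher.sas.com/default-cpu-limit=%s' "$cpu_lim"
  printf ' launcher.sas.com/default-memory-request=%s' "$mem_req"
  printf ' launcher.sas.com/default-memory-limit=%s\n' "$mem_lim"
}

# Example (namespace and values are assumptions):
#   build_annotate_cmd viya4 1 4 2Gi 32Gi
# Review the printed command, then paste it into your shell to apply it.
```

Printing instead of executing is a deliberate choice here: it keeps the helper harmless and makes it easy to check the values against the max-* annotations before anything is changed in the cluster.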

That’s easy enough, isn’t it? Yes, but there are a few considerations to be aware of. Let’s look at them before continuing.

CPU resource assignment

Setting the default-cpu-request and default-cpu-limit annotations usually defines a min-max range, e.g. say 1-4 CPUs. During execution time the SAS session will request the CPU time it needs and can handle.

What does that mean? Well, not all SAS PROCs are multithreaded, so some of them might stick to a single CPU regardless of what you allow them to use. Take a look at this screenshot showing the resource usage of 2 compute session pods running in parallel:

When this screenshot was taken, one of the sessions was busy executing a DATA step while the other one did a PROC SORT. Guess which is which? Right – the session which used 961m CPU (~ 1 CPU) is the one running the DATA step, while the PROC SORT utilizes ~ 1.5 CPUs because this PROC supports multithreading. This is not something you can influence, but it’s still good to know when trying to understand performance metrics.

Kubernetes uses the default-cpu-request setting to decide on which node the new session pod can be scheduled. As the pod is guaranteed to be able to use at least this amount of CPU, the node needs to have at least the requested free capacity. In the worst case, this means that a high default-cpu-request value might prevent sessions from starting at high-traffic times. On the other hand, being (overly) conservative by setting default-cpu-limit too low will lead to CPU throttling, which negatively impacts the compute performance.
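
One way to check whether a session is actually being throttled is to look at the cpu.stat file of the container’s cgroup, which counts the CPU scheduler periods (nr_periods) and how many of them were throttled (nr_throttled). The sketch below turns these two counters into a percentage. The helper name throttled_pct and the pod placeholder in the usage comment are assumptions made for this example, and the exact location of cpu.stat depends on the cgroup version of your nodes.

```shell
# Sketch: read the contents of a cgroup cpu.stat file on stdin and print
# the percentage of scheduler periods in which the container was throttled.
# throttled_pct is a hypothetical helper name.
throttled_pct() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%.1f\n", 100 * throttled / periods
      else             print "0.0"
    }'
}

# Possible usage (<session-pod> is a placeholder; the path shown is the usual
# cgroup v1 location, on cgroup v2 the file is /sys/fs/cgroup/cpu.stat):
#   kubectl -n viya4 exec <session-pod> -c sas-programming-environment \
#     -- cat /sys/fs/cgroup/cpu/cpu.stat | throttled_pct
```

A value that stays close to zero suggests that the CPU limit is not the bottleneck, while a high percentage during a long-running step hints that default-cpu-limit may be set too low for that workload.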

Memory resource assignment

default-memory-request and default-memory-limit, the memory related resource management settings, can be used just like the CPU settings. One notable difference is that setting the memory limit too low is risky, as a container that tries to use more memory than its declared limit will be terminated (in plain words: it is “OOMKilled”, i.e. killed because it has run “out of memory”). This is much more unfriendly behavior than just throttling CPU usage.

Finding the right range for CPU and memory usage can be tricky and it’s a good idea to use observability tools such as Grafana to monitor the cluster to better understand the resource usage. These tools also provide a good overview of the concurrency of your compute workload, which is important to detect negative effects that emerge from overcommitting the Kubernetes worker node – for example when multiple sessions max out their limits at the same time.

SAS System options

Compute sessions in SAS Viya on Kubernetes are still based on SAS 9.4, so it seems logical that the same set of system options which helped tune the performance on previous SAS releases is still valid. Typical candidates with high impact on the compute performance are:

MEMSIZE
SORTSIZE
BUFSIZE

(See the SAS documentation for an explanation of each option, or just google them.) These configuration options can be set in the “Contexts” plugin of Environment Manager. The SAS Viya Administration Guide briefly summarizes the concept of compute contexts like this:

A SAS Compute Server is run under a compute context. (Contexts are analogous to SAS 9® SAS Application Servers.) A compute context is a specification that contains the information that is needed to run a compute server.

The information that is contained in a compute context is the user identity and any SAS options or autoexec file parameters to be used when starting the server.

As you can see from the screenshot below, different applications in SAS Viya use different contexts:

For the discussion in this blog, these contexts are the most relevant ones:

SAS Studio compute context – used when launching a SAS session from SAS Studio.
Data Mining compute context – for SAS sessions launched from SAS Model Studio pipelines. Reducing the startup latency is an important consideration for this context.
SAS Job Execution compute context – for SAS sessions launched as SAS Jobs from the Job Execution Framework (e.g. from the /SASJobExecution UI). Reducing the startup latency is important for this one as well.

Changes made to the configuration of a context are picked up immediately for any new session launched after the change was saved.

(Environment Manager -> Contexts -> Compute contexts -> (pick one context) -> Advanced -> SAS options)

The SAS system options should be aligned to the Kubernetes resource management settings to avoid unpleasant surprises. Let’s briefly discuss the MEMSIZE option, probably the most prominent “usual suspect” when it comes to performance tuning. At first look it might be tempting to simply rely on the Kubernetes settings and just let SAS use all the memory which is available to the pod:

-MEMSIZE MAX

However, this will most certainly lead to errors during program execution with SAS complaining that there was insufficient memory available. It’s important to keep in mind that pods are not virtual machines and setting resource limits will not “magically” change the pod’s view of the hardware environment it runs on. Instead, Kubernetes will pass the container’s resource settings to the underlying container runtime which “translates” this information into Linux kernel cgroups. This however might not be transparent to the application running in the container.

For example, this screenshot shows the output of the Linux top command from a shell inside a SAS session pod which has a max-memory-limit of 8GB but runs on a worker node with 256 GB of memory.

Not surprisingly this also affects SAS when MEMSIZE is set to MAX. Run PROC OPTIONS to validate that the memory of the worker node is used in this case:

80   proc options option=memsize define value lognumberformat;
81   run;
    SAS (r) Proprietary Software Release V.04.00  TS1M0
Option Value Information For SAS Option MEMSIZE
    Value: 253,556,663,040
    Scope: SAS Session

While the top command simply reported the node’s amount of memory, a different command reveals the actual memory limit of the pod (8 GB in that case) and this is a more suitable value to be used for MEMSIZE:

$ CPOD=$(kubectl -n viya4 get pods -l \
  "launcher.sas.com/requested-by-client=sas.studio" -o name)

# display the cgroups setting for limiting the pod’s memory
$ kubectl -n viya4 exec -it $CPOD -c sas-programming-environment \
  -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes

8589934592
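
Note that the path used above belongs to cgroup v1. On nodes that run cgroup v2, the limit is typically exposed as /sys/fs/cgroup/memory.max instead. The following sketch checks both locations. The function name cgroup_mem_limit is made up for this example, and it is meant to be run in a shell inside the session pod (for example one opened with kubectl exec).

```shell
# Sketch: print the memory limit found under a cgroup root directory
# (default: /sys/fs/cgroup). Tries the cgroup v1 file first, then cgroup v2.
# cgroup_mem_limit is a hypothetical helper name.
cgroup_mem_limit() {
  local root="${1:-/sys/fs/cgroup}"
  if [ -r "$root/memory/memory.limit_in_bytes" ]; then
    cat "$root/memory/memory.limit_in_bytes"   # cgroup v1: limit in bytes
  elif [ -r "$root/memory.max" ]; then
    cat "$root/memory.max"                     # cgroup v2: bytes, or "max" if unlimited
  else
    echo "no memory limit file found under $root" >&2
    return 1
  fi
}

# Inside the pod:
#   cgroup_mem_limit
```

On cgroup v2 the file contains the literal string max when no limit is set, so the output should be checked before it is used in any calculation.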

To summarize the above: it’s recommended to set the MEMSIZE system option as it is critical for SAS performance. However, avoid setting it to MAX - instead match it to what is declared in the max-memory-limit annotation for the PodTemplate.
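
To make this concrete, here is a minimal sketch of how the byte value reported by the cgroup could be turned into a MEMSIZE setting. The helper name memsize_option is an assumption for this example. By default it matches the limit exactly, as recommended above. The optional second argument, which reserves some headroom by using only a percentage of the limit, is my own addition and not something the SAS documentation prescribes.

```shell
# Sketch: convert a memory limit in bytes into a MEMSIZE option string.
#   $1 = limit in bytes (e.g. the value read from the pod's cgroup)
#   $2 = percentage of the limit to hand to SAS (optional, default 100)
# memsize_option is a hypothetical helper name.
memsize_option() {
  local bytes="$1" pct="${2:-100}"
  # Integer arithmetic: scale by the percentage, then convert bytes to MiB.
  echo "-MEMSIZE $(( bytes * pct / 100 / 1048576 ))M"
}

# Example with the 8 GB limit from above:
#   memsize_option 8589934592       prints: -MEMSIZE 8192M
#   memsize_option 8589934592 80    prints: -MEMSIZE 6553M
```

The resulting string can then be entered in the SAS options field of the compute context.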

Pod startup latency

This section is concerned with reducing the startup latency of SAS compute sessions (pods). “Startup latency” refers to the combined time it takes for a) Kubernetes to schedule a new pod to a node and b) for this pod to reach the “Running” state.

Unfortunately these steps take a few moments. Depending on the cluster infrastructure, this could be anything from a few seconds to something close to a minute. The sas-prepull service, which is active by default after deployment, tries to minimize this delay by pre-pulling the SAS runtime container image (“sas-programming-environment”) to all compute nodes. In addition to that, SAS Viya introduced the concept of “reusable servers” a few releases ago. “Reusable servers” are compute servers which are not shut down after the session terminates (which is the default behaviour). Instead, they stay around for some time and can be reused by a later session, thus saving the startup time. This feature can even be enhanced by configuring a minimum number of servers to be available at any time. If you feel reminded of what previous SAS releases called a “Pooled Workspace server”, you are on the right track.

As you can imagine, this feature is especially interesting for the SAS Job Execution Framework (i.e. the successor of the Stored Process technology which was available in former SAS releases) and also for the Data Mining compute context (heavily used by pipelines created in SAS Model Studio).

“Reusable compute servers” require a shared account to run them. Here’s an example of how a shared account can be configured using the SAS Viya CLI:

$ sas-viya compute credentials create \
  --user viyademo01 --password password1 \
  --description "Shared account for reusable compute server"

Once the shared account is available, re-configuring a compute context is rather trivial. Add these options to the right Compute Context definition in Environment Manager:

reuseServerProcesses = true
runServerAs = <the shared account>
serverMinAvailable = <1 – x>

The last option defines the pool size and should be adjusted to the expected workload (i.e. the level of concurrency). Here’s what the final configuration should look like:

You should see the new compute session(s) starting right after the configuration has been updated.

In conclusion, here’s a short test I did to see the effects of turning on the reusable servers. I simply timed the round trip of calling a SAS Job using an HTTP request (i.e. what a user would do in a web browser):

time \
  curl "https://viya.host.com/SASJobExecution/?_program=/Public/ScoringTest&_action=execute" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer $CLIENT_ACCESS_TOKEN" \
    -s -o /dev/null -w "%{http_code}"

Running this command a few times with and without the “reusable servers” configuration returned these results:

# / real time (curl)	Default (new session pod)	Compute server reused
1	0m22.528s	0m6.153s
2	0m19.968s	0m5.919s
3	0m18.019s	0m5.965s

As you can see, the performance gain is quite remarkable: reusing a compute server cut the round-trip time from roughly 18 to 22 seconds down to about 6 seconds.
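
If you repeat this test in your own environment, it helps to average several runs instead of comparing single measurements. The sketch below averages timings in the format printed by the time command. The helper name avg_real_seconds is made up for this example. Applied to the three runs in the table, it yields about 20.2 seconds for new session pods versus about 6.0 seconds for reused servers.

```shell
# Sketch: average "real" timings as printed by bash's time (e.g. 0m22.528s)
# and print the mean in seconds. avg_real_seconds is a hypothetical helper.
avg_real_seconds() {
  printf '%s\n' "$@" | awk '
    {
      split($0, t, "m")        # t[1] = minutes, t[2] = seconds with trailing "s"
      sub(/s$/, "", t[2])      # strip the trailing "s"
      total += t[1] * 60 + t[2]
    }
    END { if (NR > 0) printf "%.3f\n", total / NR }'
}

# Example with the measurements from the table:
#   avg_real_seconds 0m22.528s 0m19.968s 0m18.019s
#   avg_real_seconds 0m6.153s 0m5.919s 0m5.965s
```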

Conclusion

In this blog I shared some ideas for tuning the performance of SAS compute sessions in SAS Viya on Kubernetes: how to configure CPU and memory resources and how to minimize (or even eliminate) the startup latency of SAS compute sessions. I apologize for bothering you with Kubernetes details you probably never wished to know, but the key message has hopefully become clear: you need to take action on this (don’t rely on the defaults) and you have to be aware that some of the “knobs” you’re looking for are found on the Kubernetes level, not within SAS.

I hope this text was useful for you and let me know if you have any feedback or questions.

Helpful resources

SAS® Viya® Administration: SAS Viya Server Contexts: Overview

https://go.documentation.sas.com/doc/en/sasadmincdc/v_034/calcontexts/n01003viyaprgmsrvs00000admin.h...

SAS® Viya® Operations: Programming Run-Time Servers

https://go.documentation.sas.com/doc/en/itopscdc/v_034/itopssrv/p0wvl5nf1lvyzfn16pqdgf9tybuo.htm

Kubernetes documentation: Resource Management for Pods and Containers

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

Where to configure the SAS Programming Run-time with broader or narrower scope

https://communities.sas.com/t5/SAS-Communities-Library/Where-to-configure-the-SAS-Programming-Run-ti...

Where SAS Viya Relies on Kubernetes for Workload Placement

https://communities.sas.com/t5/SAS-Communities-Library/Where-SAS-Viya-Relies-on-Kubernetes-for-Workl...

Some SASWORK storage options for SAS Viya on Kubernetes

https://communities.sas.com/t5/SAS-Communities-Library/Some-SASWORK-storage-options-for-SAS-Viya-on-...

FredrikHansson · ‎11-25-2022

Thank you @HansEdert.

This will be most valuable to us when tuning our new Viya environment(s).

touwen_k · ‎07-05-2023

hello Hans,

thank you very much for this blog and great explanation of pod latency. I was very keen on the test you did between reusable servers and a normal compute session. I have made my own job definition as I do not have this program Scoring Test, however I got a 401 error. Otherwise I will look at the run time in the job execution, it should also give an idea I hope.

This is my program

time \
curl "https://viya..nl/SASJobExecution/?_program=/Public/SimpleDataStep_df1&_action=execute" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $CLIENT_ACCESS_TOKEN" \
-s -o /dev/null -w "%{http_code}"

HansEdert · ‎07-05-2023

Hi touwen_k, using curl definitely works for testing and the URL looks ok (assuming that you omitted the hostname in your post). The 401 response indicates an issue with the authentication actually (which is kind of good news as it means you are talking to a live endpoint at least :-)). Maybe your token has expired? Just in case it helps: I found that using postman is a great way to build these commands (it can generate the curl commandline). Much simpler for testing and debugging.

touwen_k · ‎07-19-2023

hello Hans, thank you for your answer. I have not created a Bearer token yet. One clarification question when I am on kubernetes where the sas viya is deployed, after logging in with my profile I am able to run CLI or sas viya python tools. However, if I want to run a curl request, how can I authenticate to viya, do I have to create a Bearer token or with my user account? I would like to be able to run this curl command like you did. regards Karolina T

HansEdert · ‎07-20-2023

Hello Karolina,

indeed, obtaining the access token is a mandatory first step for using the REST API through curl. Usually you would use a registered client to generate the token for you and then submit that token with the subsequent curl requests. For an overview, take a look at Joe's post: https://blogs.sas.com/content/sgf/2023/02/07/authentication-to-sas-viya/ . He also maintains a Git repository which has a lot of helpful examples: https://github.com/sassoftware/devsascom-rest-api-samples/blob/master/CoreServices/sasLogon.md

HTH, Hans

touwen_k · ‎08-21-2023

hello Hans, thank you very much for your helpful links, I have also used blog SAS Viya Authenticating as a Custom Application. I was able to test the time difference with your code for Job Execution. It is great to experience on a practical example of how reusable servers work.

AbhilashPA · ‎03-21-2024

Very good blog @HansEdert Will use it as a reference.
