
Considerations for optimizing SAS compute sessions in SAS Viya on Kubernetes

Started ‎11-25-2022 by
Modified ‎11-25-2022 by

Introduction

 

In this blog I’d like to talk about some approaches worth a close look when you set out to tune the performance of your SAS compute sessions in SAS Viya on Kubernetes. These sessions still use a SAS 9.x runtime, so you can expect a high degree of backward compatibility when you plan to migrate your existing SAS code. However, they now execute in Kubernetes pods, and that means a fundamental change in the way they are provisioned and configured.

 

Understanding how Kubernetes resource management works might not have been on the list of your favorite To-Dos for the rest of 2022 (so far at least), but it’s definitely an important topic. Keep in mind that a successful deployment of SAS Viya will get you a running but certainly not an optimized environment, so most of the configuration tasks discussed in this blog are effectively mandatory. The following list shows the major areas that determine whether your compute sessions will run with adequate performance:

 

  • I/O – namely the I/O of your SASWORK drive
  • CPU and memory
  • SAS system options
  • Pod startup latency

 

 

Make sure to turn the right knobs

 

… and there are quite a few of these “knobs”. Performance tuning is not a task which can be done in one spot only. Rather there are at least 3 layers to be considered: the (cloud) infrastructure, the Kubernetes cluster and the SAS configuration. Combine this with the areas I mentioned above and you get this matrix:

 

“Where to do what”      | Disk I/O (SASWORK) | CPU and memory | SAS system options | Pod startup latency
(Cloud) Infrastructure  | X                  |                |                    |
Kubernetes cluster      |                    | X              |                    |
SAS configuration       |                    |                | X                  | X

 

 

Disk I/O (SASWORK)

 

SAS compute workloads are known to be I/O intensive, and the SASWORK location is especially important in this regard because SAS programs usually use it heavily. There are a couple of options available for SAS Viya on Kubernetes (depending on whether you run in the cloud or not). Instead of covering them here, I’d like to refer you to another blog dedicated to this topic: “Some SASWORK storage options for SAS Viya on Kubernetes” (listed under Helpful resources at the end of this post).

 

 

CPU and memory assignments

 

Kubernetes is often referred to as “container orchestration software”, and not surprisingly, resource management is at the very core of its capabilities. The most obvious resource types to manage are CPU and memory. Kubernetes pods can declare their requirements in their manifests. Kubernetes is responsible for making sure that pods are scheduled to nodes which can satisfy the requested minimum and that they do not exceed their declared limits (e.g. the maximum declared amount of memory).

 

SAS compute sessions start with rather low default settings for CPU and memory. These settings can and usually need to be modified. Compute sessions are based on PodTemplates and the SAS Launcher service is responsible for dynamically creating the session pods on request (e.g. when you log in to SAS Studio or when a batch job is submitted). The Launcher service will also add the resource usage definitions to the session pods it starts. From an administrator’s point of view, you set these definitions as annotations in the metadata section of the corresponding PodTemplate(s). For this blog we’re focusing on the “sas-compute-job-config” PodTemplate as it is the one used by the applications we’re interested in (SAS Job Execution, but also SAS Studio and SAS Model Studio).

 

The full documentation for these annotations can be found in the SAS Viya Administration Guide, and there is also a README in the deployment assets at $deploy/sas-bases/examples/sas-launcher/configure/README.md. Here are the most important ones which should be set:

 

Resource type | Annotation                              | Purpose
CPU           | launcher.sas.com/default-cpu-request    | guaranteed (min) CPU value
CPU           | launcher.sas.com/default-cpu-limit      | max CPU limit (max threshold)
Memory        | launcher.sas.com/default-memory-request | guaranteed (min) memory
Memory        | launcher.sas.com/default-memory-limit   | max memory limit (max threshold)

 

(all using the usual notation for Resource units in Kubernetes).
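To make the unit notation concrete: CPU values are either whole or fractional cores ("4", "1.5") or millicores ("500m"), and memory values are byte counts with an optional decimal (k, M, G, T) or binary (Ki, Mi, Gi, Ti) suffix. The following is a minimal sketch of that conversion, not part of SAS Viya or kubectl. The function names are hypothetical, and the memory helper only handles integer mantissas.

```shell
# Hypothetical helpers (not part of kubectl or SAS Viya).

# Convert a Kubernetes CPU quantity ("500m", "4", "1.5") to millicores.
cpu_to_millicores() (
  q=$1
  case $q in
    *m) echo "${q%m}" ;;
    *)  awk -v q="$q" 'BEGIN { printf "%d\n", q * 1000 + 0.5 }' ;;
  esac
)

# Convert a Kubernetes memory quantity ("2Gi", "512Mi", "1G") to bytes.
# Only integer mantissas are supported in this sketch.
memory_to_bytes() (
  q=$1
  case $q in
    *Ki) echo $(( ${q%Ki} * 1024 )) ;;
    *Mi) echo $(( ${q%Mi} * 1024 * 1024 )) ;;
    *Gi) echo $(( ${q%Gi} * 1024 * 1024 * 1024 )) ;;
    *Ti) echo $(( ${q%Ti} * 1024 * 1024 * 1024 * 1024 )) ;;
    *k)  echo $(( ${q%k} * 1000 )) ;;
    *M)  echo $(( ${q%M} * 1000 * 1000 )) ;;
    *G)  echo $(( ${q%G} * 1000 * 1000 * 1000 )) ;;
    *T)  echo $(( ${q%T} * 1000 * 1000 * 1000 * 1000 )) ;;
    *)   echo "$q" ;;
  esac
)
```

For example, `memory_to_bytes 2Gi` prints 2147483648 and `cpu_to_millicores 1.5` prints 1500.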

 

These annotations should be set when deploying the environment (see the link above), but it’s also possible to patch the PodTemplate afterwards (which comes in handy for testing things). Here are some sample commands:

 

# set the max CPU limit for SAS Studio sessions
$ kubectl -n viya4 annotate PodTemplate sas-compute-job-config \
  --overwrite launcher.sas.com/max-cpu-limit=4
 
# delete the max CPU limit setting (so the default applies again)
$ kubectl -n viya4 annotate PodTemplate sas-compute-job-config \
  --overwrite launcher.sas.com/max-cpu-limit-

# display the current settings (if available)
$ kubectl -n viya4 describe PodTemplate sas-compute-job-config | \
  grep " launcher.sas.com" | grep "cpu\|memory"

Annotations:  launcher.sas.com/default-cpu-limit: 4
              launcher.sas.com/default-cpu-request: 1
              launcher.sas.com/default-memory-limit: 32Gi
              launcher.sas.com/default-memory-request: 2Gi
              launcher.sas.com/max-cpu-limit: 4
              launcher.sas.com/max-cpu-request: 1
              launcher.sas.com/max-memory-limit: 32Gi
              launcher.sas.com/max-memory-request: 2Gi

  

(Make sure that you restart your SAS sessions, e.g. SAS Studio -> “Reset SAS session”, after changing the PodTemplate so that the changes take effect.)

 

That’s easy enough, isn’t it? Yes, but there are a few considerations to be aware of. Let’s look at them before continuing.

 

CPU resource assignment

 

Setting the default-cpu-request and default-cpu-limit annotations usually defines a min-max range, e.g. say 1-4 CPUs. During execution time the SAS session will request the CPU time it needs and can handle.

 

What does that mean? Well, not all SAS PROCs are multithreaded, so some of them might stick to a single CPU regardless of what you allow them to use. Take a look at this screenshot showing the resource usage of 2 compute session pods running in parallel:

 

jobs-parallel.jpg

 

When this screenshot was taken, one of the sessions was busy executing a DATA step while the other one did a PROC SORT. Guess which is which? Right – the session which used 961m CPU (~ 1 CPU) is the one running the DATA step, while the PROC SORT utilizes ~ 1.5 CPUs because this PROC supports multithreading. This is not something you can influence, but it’s still good to know when trying to understand performance metrics.

 

Kubernetes uses the default-cpu-request setting to decide on which node the new session pod can be scheduled. As the pod is guaranteed to be able to use at least this amount of CPU, the node needs to have at least the requested free capacity. In the worst case, this means that a high default-cpu-request value might prevent sessions from starting at high-traffic times. On the other hand, being (overly) conservative by setting too low a value for default-cpu-limit will lead to CPU throttling, which negatively impacts compute performance.
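The scheduling rule can be illustrated with a little arithmetic. The sketch below is a simplification of what the scheduler checks for CPU: a pod fits on a node only if the CPU already requested by the pods on that node plus the new request stays within the node's allocatable CPU. The function name and the sample numbers are made up for illustration. The real inputs can be read from the "Allocatable" and "Allocated resources" sections of `kubectl describe node`.

```shell
# Hypothetical helper: succeeds (exit code 0) if a new pod's CPU request fits.
# All three arguments are in millicores.
cpu_request_fits() (
  allocatable=$1   # allocatable CPU of the node
  requested=$2     # sum of the CPU requests of pods already on the node
  new_request=$3   # CPU request of the new session pod
  [ $(( requested + new_request )) -le "$allocatable" ]
)

# Illustrative numbers: a node with 15890m allocatable, 13000m already requested.
# cpu_request_fits 15890 13000 1000   -> fits (a request of 1 CPU)
# cpu_request_fits 15890 13000 4000   -> does not fit (a request of 4 CPUs)
```

Note that only the request enters this check. The limit plays no role in scheduling, which is why several sessions can still compete for CPU once they are running.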

 

Memory resource assignment

 

default-memory-request and default-memory-limit, the memory-related resource management settings, can be used just like the CPU settings. One notable difference is that setting the memory limit too low is risky, as a container trying to use more memory than its declared limit will be terminated (in plain words: it is “OOMKilled”, i.e. killed because it has run “out of memory”). That is much less friendly behavior than merely throttling CPU usage.
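If you suspect that sessions are being killed for exceeding their memory limit, the container status recorded by Kubernetes is one place to look. The following is a rough sketch that lists pods reporting a container terminated with reason OOMKilled. The function name is hypothetical, and it assumes the affected pod still exists in the namespace when you run it, which may not hold if the session pod has already been cleaned up.

```shell
# Hypothetical helper: list pods in a namespace that report a container
# terminated with reason "OOMKilled" (current or previous container state).
oomkilled_pods() (
  namespace=$1
  kubectl -n "$namespace" get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.terminated.reason}{" "}{.status.containerStatuses[*].lastState.terminated.reason}{"\n"}{end}' |
    awk -F '\t' '$2 ~ /OOMKilled/ { print $1 }'
)

# Example call, using the namespace from this blog:
#   oomkilled_pods viya4
```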

 

Finding the right range for CPU and memory usage can be tricky and it’s a good idea to use observability tools such as Grafana to monitor the cluster to better understand the resource usage. These tools also provide a good overview of the concurrency of your compute workload, which is important to detect negative effects that emerge from overcommitting the Kubernetes worker node – for example when multiple sessions max out their limits at the same time.

 

 

SAS System options

 

Compute sessions in SAS Viya on Kubernetes are still based on SAS 9.4, so it seems logical that the same set of system options that helped tune performance on previous SAS releases is still valid. Typical candidates with a high impact on compute performance are:

 

  • MEMSIZE
  • SORTSIZE
  • BUFSIZE

 

(See the SAS documentation for an explanation of each option, or just google them.) These configuration options can be set in the “Contexts” plugin of Environment Manager. The SAS Viya Administration Guide briefly summarizes the concept of compute contexts like this:

 

A SAS Compute Server is run under a compute context. (Contexts are analogous to SAS 9® SAS Application Servers.) A compute context is a specification that contains the information that is needed to run a compute server.

The information that is contained in a compute context is the user identity and any SAS options or autoexec file parameters to be used when starting the server.

 

As you can see from the screenshot below, different applications in SAS Viya use different contexts:

 

ev1.jpg

 

For the discussion in this blog, these contexts are the most relevant ones:

 

  • SAS Studio compute context – used when launching a SAS session from SAS Studio.
  • Data Mining compute context – for SAS sessions launched from SAS Model Studio pipelines. Reducing the startup latency is an important consideration for this context.
  • SAS Job Execution compute context – for SAS sessions launched as SAS Jobs from the Job Execution Framework (e.g. from the /SASJobExecution UI). Reducing the startup latency is important for this one as well.

 

Changes made to the configuration of a context are picked up immediately for any new session launched after the change was saved.

 

ev2.jpg

 

(Environment Manager -> Contexts -> Compute contexts -> (pick one context) -> Advanced -> SAS options)

 

The SAS system options should be aligned to the Kubernetes resource management settings to avoid unpleasant surprises. Let’s briefly discuss the MEMSIZE option, probably the most prominent “usual suspect” when it comes to performance tuning. At first look it might be tempting to simply rely on the Kubernetes settings and just let SAS use all the memory which is available to the pod:

 

-MEMSIZE MAX

 

However, this will most certainly lead to errors during program execution with SAS complaining that there was insufficient memory available. It’s important to keep in mind that pods are not virtual machines and setting resource limits will not “magically” change the pod’s view of the hardware environment it runs on. Instead, Kubernetes will pass the container’s resource settings to the underlying container runtime which “translates” this information into Linux kernel cgroups. This however might not be transparent to the application running in the container.

 

For example, this screenshot shows the output of the Linux top command from a shell inside a SAS session pod which has a max-memory-limit of 8GB but runs on a worker node with 256 GB of memory.

 

top1.jpg

 

Not surprisingly this also affects SAS when MEMSIZE is set to MAX. Run PROC OPTIONS to validate that the memory of the worker node is used in this case:

 

80   proc options option=memsize define value lognumberformat;
81   run;
    SAS (r) Proprietary Software Release V.04.00  TS1M0
Option Value Information For SAS Option MEMSIZE
    Value: 253,556,663,040
    Scope: SAS Session

 

While the top command simply reported the node’s amount of memory, a different command reveals the actual memory limit of the pod (8 GB in that case) and this is a more suitable value to be used for MEMSIZE:

 

$ CPOD=$(kubectl -n viya4 get pods -l \
  "launcher.sas.com/requested-by-client=sas.studio" -o name)

# display the cgroups setting for limiting the pod’s memory
# (cgroup v1 path; with cgroup v2 the limit is in /sys/fs/cgroup/memory.max)
$ kubectl -n viya4 exec -it $CPOD -c sas-programming-environment \
  -- cat /sys/fs/cgroup/memory/memory.limit_in_bytes

8589934592

 

To summarize the above: it’s recommended to set the MEMSIZE system option as it is critical for SAS performance. However, avoid setting it to MAX - instead match it to what is declared in the max-memory-limit annotation for the PodTemplate.
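As a small worked example of that recommendation, the byte value reported by the cgroup file can be turned into a MEMSIZE setting. This is only a sketch: the function name is hypothetical, it expresses the value in megabytes (MEMSIZE accepts an M suffix), and it simply mirrors the limit one-to-one as suggested above. It rejects non-numeric input such as "max", which a cgroup v2 system reports when no limit is set.

```shell
# Hypothetical helper: derive a MEMSIZE option from a memory limit in bytes.
memsize_option() (
  limit_bytes=$1
  case $limit_bytes in
    ''|*[!0-9]*)
      echo "no numeric memory limit: '$limit_bytes'" >&2
      exit 1 ;;
  esac
  echo "-MEMSIZE $(( limit_bytes / 1024 / 1024 ))M"
)

# With the 8 GB limit from the example above:
#   memsize_option 8589934592
```

For the pod shown above this prints `-MEMSIZE 8192M`, which could then be entered in the SAS options field of the compute context.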

 

 

Pod startup latency

 

This section is concerned with reducing the startup latency of SAS compute sessions (pods). “Startup latency” refers to the combined time it takes for a) Kubernetes to schedule a new pod to a node and b) for this pod to reach the “Running” state.

 

Unfortunately these steps take a few moments. Depending on the cluster infrastructure, it could be anything from a few seconds to something close to a minute. The sas-prepull service, which is active by default after deployment, tries to minimize this delay by pre-pulling the SAS runtime container image (“sas-programming-environment”) to all compute nodes. In addition to that, SAS Viya introduced the concept of “reusable servers” a few releases ago. By default, a compute server is shut down when its session terminates. “Reusable servers” are not: they stay around for some time and can be reused by a later session, thus saving the startup time. This feature can even be enhanced by configuring a minimum number of servers to be available at any time. If this reminds you of what previous SAS releases called a “Pooled Workspace Server”, you are on the right track.

 

As you can imagine, this feature is especially interesting for the SAS Job Execution Framework (i.e. the successor of the Stored Process technology which was available in earlier SAS releases) and also for the Data Mining compute context (heavily used by pipelines created in SAS Model Studio).

“Reusable compute servers” require a shared account to run them. Here’s an example of how a shared account can be configured using the SAS Viya CLI:

 

$ sas-viya compute credentials create \
  --user viyademo01 --password password1 \
  --description "Shared account for reusable compute server"

 

Once the shared account is available, re-configuring a compute context is rather trivial. Add these options to the right Compute Context definition in Environment Manager:

 

  • reuseServerProcesses = true
  • runServerAs = <the shared account>
  • serverMinAvailable = <1 – x>

 

The last option defines the pool size and should be adjusted to the expected workload (i.e. the level of concurrency). Here’s what the final configuration should look like:

 

ev3.jpg

 

You should see the new compute session(s) starting right after the configuration has been updated.

 

To wrap up this section, here’s a short test I did to see the effects of turning on the reusable servers. I simply timed the round trip of calling a SAS Job using an HTTP request (i.e. what a user would do in a web browser):

 

time \
  curl "https://viya.host.com/SASJobExecution/?_program=/Public/ScoringTest&_action=execute" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer $CLIENT_ACCESS_TOKEN" \
    -s -o /dev/null -w "%{http_code}"

 

Running this command a few times with and without the “reusable servers” configuration returned these results:

 

# / real time (curl) | Default (new session pod) | Compute server reused
1                    | 0m22.528s                 | 0m6.153s
2                    | 0m19.968s                 | 0m5.919s
3                    | 0m18.019s                 | 0m5.965s

 

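As you can see, the performance gain is quite remarkable: in this test, reusing a compute server cut the round-trip time from roughly 20 seconds to about 6 seconds.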
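If you want to repeat this kind of comparison on your own environment, averaging a handful of runs gives a steadier number than a single call. The sketch below averages timings in seconds read from standard input. The function name and the JOB_URL variable are hypothetical. The commented loop uses curl's own `%{time_total}` write-out variable instead of the shell's `time` so that the output is a plain number.

```shell
# Hypothetical helper: print the mean of the numbers read from stdin
# (one value per line), rounded to milliseconds.
mean_seconds() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Example (not executed here): three timed calls against a job URL.
#   for i in 1 2 3; do
#     curl -s -o /dev/null -w "%{time_total}\n" "$JOB_URL" \
#       --header "Authorization: Bearer $CLIENT_ACCESS_TOKEN"
#   done | mean_seconds
```

Fed with the three measurements from the table, the function returns 20.172 for the default case and 6.012 for the reused compute server.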

 

 

Conclusion

 

In this blog I shared some ideas for tuning the performance of SAS compute sessions in SAS Viya on Kubernetes: how to configure CPU and memory resources and how to minimize (or even eliminate) the startup latency of SAS compute sessions. I apologize for bothering you with Kubernetes details you probably never wished to know, but the key message has hopefully become clear: you need to take action on this (don’t rely on the defaults) and you have to be aware that some of the “knobs” you’re looking for are found on the Kubernetes level, not within SAS.

 

I hope this text was useful for you. Let me know if you have any feedback or questions.

 

Helpful resources

 

SAS® Viya® Administration: SAS Viya Server Contexts: Overview

https://go.documentation.sas.com/doc/en/sasadmincdc/v_034/calcontexts/n01003viyaprgmsrvs00000admin.h...

 

SAS® Viya® Operations: Programming Run-Time Servers

https://go.documentation.sas.com/doc/en/itopscdc/v_034/itopssrv/p0wvl5nf1lvyzfn16pqdgf9tybuo.htm

 

Kubernetes documentation: Resource Management for Pods and Containers

https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

 

Where to configure the SAS Programming Run-time with broader or narrower scope

https://communities.sas.com/t5/SAS-Communities-Library/Where-to-configure-the-SAS-Programming-Run-ti...

 

Where SAS Viya Relies on Kubernetes for Workload Placement

https://communities.sas.com/t5/SAS-Communities-Library/Where-SAS-Viya-Relies-on-Kubernetes-for-Workl...

 

Some SASWORK storage options for SAS Viya on Kubernetes

https://communities.sas.com/t5/SAS-Communities-Library/Some-SASWORK-storage-options-for-SAS-Viya-on-...

 

 

Comments

Thank you @HansEdert.

This will be most valuable to us when tuning our new Viya environment(s).

hello Hans,

thank you very much for this blog and the great explanation of pod latency. I was very keen on the test you did between reusable servers and a normal compute session. I have made my own job definition as I do not have this ScoringTest program; however, I got error 401. Otherwise I will look at the run time in the job execution, it should also give an idea I hope.

 

This is  my program 

time \
curl "https://viya..nl/SASJobExecution/?_program=/Public/SimpleDataStep_df1&_action=execute" \
--header "Content-Type: application/json" \
--header "Authorization: Bearer $CLIENT_ACCESS_TOKEN" \
-s -o /dev/null -w "%{http_code}"

Hi touwen_k, using curl definitely works for testing and the URL looks ok (assuming that you omitted the hostname in your post). The 401 response indicates an issue with the authentication actually (which is kind of good news as it means you are talking to a live endpoint at least :-)). Maybe your token has expired? Just in case it helps: I found that using postman is a great way to build these commands (it can generate the curl commandline). Much simpler for testing and debugging.

hello Hans, thank you for your answer. I have not created a Bearer token yet. One clarification question when I am on kubernetes where the sas viya is deployed, after logging in with my profile I am able to run CLI or sas viya python tools. However, if I want to run a curl request, how can I authenticate to viya, do I have to create a Bearer token or with my user account? I would like to be able to run this curl command like you did. regards Karolina T

Hello Karolina,

indeed, obtaining the access token is a mandatory first step for using the REST API through curl. Usually you would use a registered client to generate the token for you and then submit that token with the subsequent curl requests. For an overview, take a look at Joe's post: https://blogs.sas.com/content/sgf/2023/02/07/authentication-to-sas-viya/ . He also maintains a Git repository which has a lot of helpful examples: https://github.com/sassoftware/devsascom-rest-api-samples/blob/master/CoreServices/sasLogon.md

HTH, Hans

hello Hans, thank you very much for your helpful links, I have also used blog SAS Viya Authenticating as a Custom Application. I was able to test the time difference with your code for Job Execution. It is great to experience on a practical example of how reusable servers work. 

Very good blog @HansEdert  Will use it as a reference.

Version history
Last update:
‎11-25-2022 04:46 AM