Improving Concurrency Performance in SAS Viya

Out of the box, SAS Viya is configured to ensure it can run without overwhelming the hardware of smaller deployments. One way it does this is to limit the amount of RAM and CPU that SAS compute services will request and use. And another way is to restrict the number of concurrent tasks that are attempted. It's this latter concept that we want to explore a bit further.

Three configuration settings in particular affect concurrent tasks in SAS Viya, and we'll look at them here. The goal is to improve the user experience both in the SAS Viya user interface and in the running of compute jobs behind the scenes. This article will not offer prescriptive settings for a given situation, but it will explain the parameters and the impact that configuration changes have on the system.

Scenario

One area where the GEL team refines our understanding of SAS technologies is by running live workshops. Our goal is to match real-world deployments - not just making something work in the simplest possible way. This benefits the students, who get environments that more closely match what they'll see at customer sites, and it benefits the GEL team by ensuring we're familiar with all of the areas that matter.

We have the ability to deploy SAS Viya to the various cloud providers as well as to the internal RACE environment. To keep costs minimized, we often default to using RACE. The GEL team (and our workshops) are major consumers of RACE resources, so we're often constrained in server size (i.e. RAM and CPU) which makes for some interesting deployment topologies.

The workshops we support utilize SAS Viya resources in different ways. But for one example, let's look at SAS Model Studio defining a pipeline of competing analytics to run:

Depending on the order in which the preliminary nodes complete, this pipeline could attempt to run 8 nodes simultaneously. At the system level, each node uses a SAS Compute Server to execute its code and then requests analysis from CAS. For the workshop, we typically assign 10 students per cluster of SAS Viya servers. Of course, during an exercise, each student could be running a pipeline like this alongside everyone else.
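To put a rough number on that, here is a back-of-the-envelope sketch in shell arithmetic. It assumes the worst case, where every student's pipeline has all 8 nodes running at the same moment; the variable names are illustrative only.

```shell
# Worst-case estimate: every student runs the pipeline at the same moment.
students=10            # students per cluster of SAS Viya servers
nodes_per_pipeline=8   # nodes a single pipeline could run simultaneously

peak_compute_servers=$((students * nodes_per_pipeline))
echo "Peak concurrent compute servers: ${peak_compute_servers}"
```

In this worst case the cluster would be asked for 80 compute servers at once, which is the kind of load the settings below need to accommodate.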

The Challenge

Students (and instructors on their behalf) were complaining that running a pipeline like this was taking too long. Not only did Viya seem sluggish and unresponsive at times, there often wasn't enough time in the class to allow the exercise to run to completion. And in some cases, the pipeline would fail.

Of course, when the instructors built the exercise, they didn't have these problems. The jobs ran faster for them and ran successfully to completion. The obvious difference: increased workload on the system due to the number of simultaneous tasks requested by multiple concurrent users.

Create a Pool of Reusable Compute Servers

Have you logged into SAS Studio and waited for the SAS Studio compute context to start up?

Kubernetes needs a non-zero amount of time to start up the pod with the SAS Compute Server. And so users typically experience a period of waiting until the Compute Server is ready to respond to SAS Studio.

Sometimes this isn't a snappy experience. Users might feel like they're having to wait a little too long. We'd like to make it better.

But there's another area where this problem is compounded: when running a model pipeline as illustrated above. Instead of waiting for just one Compute Server, the pipeline might need to wait for several of these to get going, one after another, multiplying the time the user must wait for the pipeline to complete. Add more users and the delays multiply yet again.

We can eliminate the startup time for the Compute Servers by pre-starting some of them in advance and having them wait for a new request to come in. While idle, they don't actively consume much CPU or RAM. As new tasks come in, they're sent to an idle Compute Server (associated with that task's context) with almost zero wait time.

There's one requirement for enabling a pool of reusable compute servers: a shared service account. In the current release of Viya (2021.2.1), in order for a shared service account to work, SCIM cannot be used to provide user information to the Identities microservice (since it doesn't accommodate OS-level user accounts).

Enabling a pool of reusable compute servers can be accomplished by a SAS administrator using SAS Environment Manager and navigating to Contexts > (sidebar) > select Compute Contexts (pull-down menu) > select the desired compute context > and set the following attributes:

- reuseServerProcesses: true

- runServerAs: sastest1

- serverInactiveTimeout: 900

- serverMinAvailable: 25

Your desired compute context varies based on the SAS Viya client you’re using. So for SAS Studio, it’s probably obvious you’ll want to make these changes to the SAS Studio compute context. But to also make these improvements for SAS Model Studio, you’ll want to make similar changes to the SAS Data Mining compute context… which is less obvious. 😉

The serverMinAvailable parameter is interesting - it ensures a pool of compute servers is always running to help minimize wait times. If the initial set of compute servers is busy and more are needed, then SAS Launcher can start them. As usage falls off and the system is increasingly idle, the unused compute servers will terminate automatically, leaving the minimum number up and running.

More information about setting up a shared service account credential and reusable compute servers can be found in the SAS Viya Administration documentation.

SAS Launcher limits max processes per user

With reusable compute servers enabled as shown above, it's apparent that the sastest1 service account will own at least 25 Compute Servers running in the cluster - possibly more. But there's another limit we need to address: the maximum number of processes the SAS Launcher is allowed to start.

And, depending on your product mix, there are two different ways to effectively set the value:

SAS Workload Management > Maximum Jobs Allowed

If SAS Workload Management is a product running in your SAS Viya environment, then make this change:

- As an administrative user in SAS Environment Manager,

- Navigate to the SAS Workload Orchestrator configuration,

- Under Host Types, expand the desired section ("default" is the default),

- And change the maximum number of jobs to a larger value, like 50 or 100.

Otherwise, for SAS Viya deployments without the SAS Workload Management add-on, modify the SAS_LAUNCHER_USER_PROCESS_LIMIT environment variable on the SAS Launcher.

Out of the box, the SAS Launcher will be configured with a limit to the number of processes that a single userid can start. It might be as small as 10, depending on the factors driving your installation (and this number is subject to change with future software updates).

With serverMinAvailable set to 25 for the reusable compute server above, a SAS_LAUNCHER_USER_PROCESS_LIMIT of 10 will prevent the minimum number desired from getting started. So let's change it.

Quick and easy:
```
# Increase user pod limit
kubectl set env deploy sas-launcher SAS_LAUNCHER_USER_PROCESS_LIMIT=100

# Confirm
kubectl describe deploy sas-launcher | grep SAS_LAUNCHER_USER_PROCESS_LIMIT
```
And for confirmation, you should expect to see results similar to:
```
SAS_LAUNCHER_USER_PROCESS_LIMIT:                100
SAS_LAUNCHER_USER_PROCESS_LIMIT_ENABLED:        true
```
This approach takes effect immediately - but it's not permanent. If the SAS Viya site.yaml is applied again later, this change will be forgotten.
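The relationship between the two settings can be expressed as a simple check. The sketch below is only an illustration: the function name is hypothetical, and the values are the ones used in this post (a serverMinAvailable of 25, against a launcher limit of 10 before the change and 100 after).

```shell
# Succeed only if the launcher limit can accommodate the pool minimum.
# Both arguments are plain integers; the function name is hypothetical.
pool_fits_limit() {
    min_available=$1   # serverMinAvailable on the compute context
    process_limit=$2   # SAS_LAUNCHER_USER_PROCESS_LIMIT on sas-launcher
    [ "$min_available" -le "$process_limit" ]
}

if pool_fits_limit 25 10; then
    echo "limit 10: pool of 25 fits"
else
    echo "limit 10: pool of 25 cannot fully start"
fi

if pool_fits_limit 25 100; then
    echo "limit 100: pool of 25 fits"
else
    echo "limit 100: pool of 25 cannot fully start"
fi
```

Keep in mind that the pool can grow beyond its minimum under load, so the limit should leave headroom above serverMinAvailable rather than merely equal it.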

For instructions about permanently setting this value with a patchTransformer as well as additional background material, see David Stern's post on the GEL blog, Limit a user’s simultaneous compute server processes in Viya 2021.1 and later.
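For orientation only, a kustomize patchTransformer for this setting might look roughly like the sketch below, written to a file from the shell. The file name, the site-config location, the transformer name, and the assumption that the launcher is the first container in the pod template are all illustrative guesses rather than details taken from David's post or the SAS documentation, so follow those sources for the supported procedure.

```shell
# Hypothetical location and file name; adjust to your deployment directory.
SITE_CONFIG_DIR="${SITE_CONFIG_DIR:-./site-config}"
patch_file="${SITE_CONFIG_DIR}/launcher-user-process-limit.yaml"
mkdir -p "${SITE_CONFIG_DIR}"

# Write a kustomize PatchTransformer that appends the environment variable
# to the first container of the sas-launcher Deployment (an assumption).
cat > "${patch_file}" <<'EOF'
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: sas-launcher-user-process-limit
patch: |-
  - op: add
    path: /spec/template/spec/containers/0/env/-
    value:
      name: SAS_LAUNCHER_USER_PROCESS_LIMIT
      value: "100"
target:
  kind: Deployment
  name: sas-launcher
EOF

echo "Wrote ${patch_file}"
```

A file like this would still need to be listed under the transformers section of kustomization.yaml before the manifest is rebuilt and applied.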

Enable SAS analytics tools to run more concurrent jobs

We’ve enabled the SAS infrastructure to accommodate running more tasks in parallel. Now we need to configure the SAS analytics tools, like SAS Model Studio, to take advantage of this increased processing power. This topic is discussed for Model Studio in the documentation for both Visual Data Mining and Machine Learning and Visual Forecasting.

Enabling SAS Model Studio to run a larger number of parallel flows can be accomplished by a SAS administrator using SAS Environment Manager and navigating to Configuration > (sidebar) > select Definitions (pull-down menu) > select sas.analytics.execution (list item). If a configuration doesn't exist yet, click the New Configuration button. Then find the Maximum Concurrent Nodes parameter.

The default value is 10 (or even just 5), but you should be able to increase this significantly depending on the number of concurrent tasks and concurrent pipelines you expect to run. It probably doesn't make sense to specify a number higher than the SAS_LAUNCHER_USER_PROCESS_LIMIT defined above.

For this change to take effect, a k8s namespace admin for SAS Viya will need to delete the sas-arke and sas-analytics-services pods so that they'll restart and notice the new parameter value.
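One way to do that from the command line is sketched below. It assumes the pod names begin with sas-arke- and sas-analytics-services-, and it matches on names only rather than guessing at labels; the function names are hypothetical.

```shell
# Keep only the pods named in the text. Input is one "pod/<name>" per line,
# which is the format printed by `kubectl get pods -o name`.
select_analytics_pods() {
    grep -E '^pod/(sas-arke|sas-analytics-services)-'
}

# Delete the matching pods so that Kubernetes starts replacements.
# Usage: restart_analytics_pods <namespace>
restart_analytics_pods() {
    namespace=$1
    kubectl get pods -n "${namespace}" -o name |
        select_analytics_pods |
        while read -r pod; do
            kubectl delete -n "${namespace}" "${pod}"
        done
}
```

Nothing is deleted until restart_analytics_pods is called with the namespace where SAS Viya is deployed. Running the kubectl get pods portion on its own first is a cheap way to confirm which pods would be affected.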

The results

After implementing these configuration changes to our workshop environment, what difference did they make to the performance of analytics operations? Pretty noticeable, actually.

By enabling a minimum pool of reusable compute servers, the total execution time of that pipeline in SAS Model Studio was cut by 50%! Looking at the pipeline, you'll see 13 nodes. Each of those is processed by its own SAS Compute Server. So basically, the time savings came from eliminating the wait for compute servers to start up on demand because we set a minimum number to always run. The jobs themselves didn't run faster - they only had to wait less.

But that's not all. After enabling the analytic flows configuration to execute more than 5 nodes at a time, the time to run the pipeline was reduced slightly more. You see, with a single pipeline running, there's the possibility of 8 nodes executing at the same time (check out the dependencies in the illustration above). By increasing the analytic flows configuration, we can get all 8 running at the same time.

Then we ramped up the concurrent testing even further by running 3 pipelines simultaneously (as if 3 users were active all at once). This is where the analytic flows configuration really made a difference. If all 3 pipelines run and they all hit the same 8 nodes running simultaneously, then the total number of analytic nodes running on compute servers is 24. That's a lot more than 5 - and so the time savings for this aspect of the test was another 42% improvement. Again, the jobs themselves didn't run faster - they only had to wait less.
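The arithmetic behind that observation can be sketched as follows. This is a deliberately simplified model, not a description of how SAS Model Studio actually schedules work: it assumes every node takes the same amount of time and that nodes beyond the Maximum Concurrent Nodes cap simply wait for the next round. The function name is made up for illustration.

```shell
# Rounds needed to run `wanted` nodes when at most `cap` run at once
# (integer ceiling division).
rounds_needed() {
    wanted=$1
    cap=$2
    echo $(( (wanted + cap - 1) / cap ))
}

pipelines=3
nodes_per_pipeline=8
demand=$((pipelines * nodes_per_pipeline))   # 24 nodes wanting to run at once

echo "cap 5:  $(rounds_needed "${demand}" 5) rounds"
echo "cap 25: $(rounds_needed "${demand}" 25) rounds"
```

Under this toy model a cap of 5 forces the 24 nodes through 5 rounds, while a cap of 25 lets them all run in a single round. The measured improvement will differ, since real nodes vary in duration and depend on one another.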

Your mileage will vary! While you can expect that tweaking the concurrency parameters of your SAS Viya deployment will improve user perception of overall performance, please don't quote this blog post as the expected minimum. There are significant differences in RACE as compared to your customer's cloud provider. Test it for yourself and note the runtime improvements for your site. Please share them in a comment below, too.

Other Kubernetes considerations

SAS Viya runs in a Kubernetes cluster - often alongside other critical software projects. A k8s cluster admin is responsible for ensuring that all aspects of the environment have sufficient resources to run, as well as for keeping some limits in place to protect costs. So it's possible that SAS Viya might request more resources than k8s or its underlying physical infrastructure can provide.

For example, k8s defaults to allowing 110 pods per node. This might be too many, or not enough, depending on the software that's running and the job it's doing. For example, we typically recommend dedicating k8s nodes for running the CAS workers. As such, the number of pods that'll run on a CAS worker node will be comparatively few. The same concept could be optionally applied to the nodes which run SAS Compute Servers as well, that is, the nodes could be dedicated solely for compute servers and likely would run fewer pods per node than a similarly sized node that's running the SAS Viya microservices infrastructure.

This post won't provide specific recommendations for the number of pods per node - but it does acknowledge that different SAS Viya pods will have different runtime objectives and hence, different resource requirements. That's why we define nodepools for CAS, Compute, Stateless, Stateful, etc.

To get a sizing for your site, contact your SAS account representative or SAS Technical Support to ask for more information from the World-Wide Sizing Team.

For our environment, we've dedicated three nodes to running SAS Compute Servers (plus another five for CAS and then three more for the Viya services infrastructure). To ensure the Compute nodes were as isolated as possible, we chose to taint them (workload.sas.com/class=compute) as is similarly done for the CAS nodes (workload.sas.com/class=cas). Combined with the default tolerations and affinities, this gives us an environment where the supporting hardware can be optimized for the different workloads.
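As a sketch of how such a taint could be applied, the helper below only prints the kubectl commands for review rather than running them. The node names are placeholders, and the NoSchedule effect and the matching node label are assumptions in this sketch, since the text above names only the taint key and value; check the SAS Viya deployment documentation for what your release expects.

```shell
# Print (do not run) the commands that would label and taint nodes for a
# given workload class, e.g. "compute" or "cas".
# Usage: print_class_commands <class> <node>...
print_class_commands() {
    class=$1
    shift
    for node in "$@"; do
        echo "kubectl label nodes ${node} workload.sas.com/class=${class} --overwrite"
        echo "kubectl taint nodes ${node} workload.sas.com/class=${class}:NoSchedule --overwrite"
    done
}

# Placeholder node names standing in for the three compute nodes.
print_class_commands compute node-a node-b node-c
```

Printing the commands first keeps a cluster-wide change reviewable; once the output looks right, it can be run by a cluster administrator.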

The next question to ask is what happens when we hit a pod limit on a node? If a cluster autoscaler is enabled in your environment, then k8s can request more nodes from the infrastructure provider to handle the additional workload. There are limits on this behavior, too, including the minimum number of nodes in a pool to run as well as a maximum allowed to start on demand. The k8s cluster administrator is responsible for ensuring these limits - and other resource constraints - are sufficient to service normal SAS Viya operations.

Next steps

We're really just getting started with fine-tuning our workshop environments in RACE for SAS Viya. While we're not engaged in detailed performance testing, we are looking to understand the various waypoints that need to be navigated in configuring SAS Viya, Kubernetes, the OS, and so on.

For example, we're incorporating an early development release of SAS Enterprise Session Monitoring (a.k.a. ESM) which works with Kubernetes into our environments to gain additional visibility into SAS Viya's operations and resource utilization in our workshops.

We welcome any insights and experience you'd care to share as well. Please comment or otherwise reach out - we look forward to hearing from you.