
Another Step Towards the Cloud for SAS/CONNECT

Started 11-10-2023
Modified 11-10-2023

The SAS Viya platform includes SAS/CONNECT by default (and for free!), with all its capabilities. Previous articles, including "Moving to the cloud with SAS/CONNECT", have shown how SAS/CONNECT has evolved to adapt to cloud environments and to support our customers in their journey to modern architectures. Innovations keep coming! Starting with SAS Viya version 2023.10, SASCMD sign-ons (a.k.a. MP Connect sign-ons) start each new SAS session in its own new pod by default.

 

Let's start by stepping back a moment to see what MP Connect is, and why this is a welcome change.

 

Distributed computing, with a single machine?

 

Before cloud computing, before any kind of multi-machine distributed computing, SAS customers had no choice but to run their code on a single machine (yes, you can tell I’ve been at SAS for quite some time).

 

SAS capabilities have always been disruptive, even in those limited environments. Multi-process CONNECT (or MP Connect) was conceived to divide time-consuming tasks into multiple units of work and to execute these units of work in parallel. It did that – and still does – by providing a framework that lets you start and coordinate multiple child SAS processes from a controlling parent SAS session. This way you can parallelize code that otherwise runs sequentially, for the purpose of reducing the total elapsed time necessary to execute a particular application.

 

Figure: An example of a process split into multiple parallel subprocesses.
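To make the parent/child framework concrete, here is a minimal MP Connect sketch. It assumes two independent units of work; the task names task1 and task2 and the SASHELP procedures are placeholders, not code from this article. The parent signs on to two child sessions with SASCMD="!sascmd", submits each unit of work asynchronously with WAIT=NO, waits for both to finish, and then signs off.

```sas
/* Minimal MP Connect sketch: a parent session runs two independent  */
/* units of work in parallel child sessions. The task names and the  */
/* SASHELP workloads are placeholders.                                */

/* Start two child sessions. "!sascmd" reuses the command that       */
/* started the parent session.                                        */
signon task1 sascmd="!sascmd";
signon task2 sascmd="!sascmd";

/* WAIT=NO returns control to the parent immediately, */
/* so both units of work run at the same time.        */
rsubmit task1 wait=no;
   proc means data=sashelp.cars mean max;
      class origin;
      var msrp;
   run;
endrsubmit;

rsubmit task2 wait=no;
   proc freq data=sashelp.class;
      tables sex;
   run;
endrsubmit;

/* Block the parent until both tasks have completed. */
waitfor _all_ task1 task2;

/* Retrieve the log and output that each asynchronous task spooled. */
rget task1;
rget task2;

/* End both child sessions. */
signoff _all_;
```

The program itself does not change when it moves to SAS Viya; what changes, as described below, is where each child session actually runs.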

 


 

From machine to pod

 

MP Connect capabilities are automatically available in the SAS Viya platform, because they are provided by SAS/CONNECT, which is included by default in every SAS Viya environment. What happens when you move your existing MP Connect code to SAS Viya? On the surface, it's business as usual: child processes are spawned from the parent session to execute parts of your code in parallel. Under the hood, however, the execution environment is completely different. What used to run on a physical machine now runs in Kubernetes, inside a pod.

 

Figure: MP Connect spawning child sessions in a compute pod.

 

Maybe your original machine had 8 cores and 64 GB of RAM (not much, in modern terms), and your code is written to take full advantage of that computing power by spawning 7 additional sessions that together use almost 100% of the CPUs. Kubernetes is built to control the execution environment and to prevent a single pod from exhausting the resources of the node where it runs. As soon as your code starts requesting all that CPU power, Kubernetes throttles your pod: in a default configuration, down to a maximum of 2 CPUs. As a result, work that kept 8 CPUs busy now has to share 2, so your existing code could run up to 4 times slower! Clearly, this setup does not scale as expected. You could argue that the issue is easily fixed by configuring the Kubernetes cluster to give more resources to SAS compute and connect pods. That could work, but it suffers from two problems:

 

  1. It simply moves the bar a bit further. Today you want your 8 CPUs back; what will happen when you have more data and need 16, 32, or more CPUs? A single pod can only scale up to the size of the node where it runs. Scaling up an individual gigantic pod is an anti-pattern in the cloud: cloud environments are designed to scale out by launching multiple, smaller pods.
  2. The Kubernetes cluster hosting your SAS Viya platform is probably designed to support multiple users with multiple applications. You cannot hog all resources for yourself!

 

What could be a better solution? Obviously, embracing a cloud-native design and scaling out to multiple pods!

 

Scaling out to multiple pods

 

If you've been following so far, you can see why the new default behavior is welcome in cloud environments. Starting with SAS Viya 2023.10, every new MP Connect sign-on starts its child SAS process in a dedicated pod, embracing cloud-native scalability and elasticity. Kubernetes can spread the pods across multiple nodes and, if the cluster is configured for auto-scaling, your limit is the sky... or rather, your budget!

 

Figure: MP Connect launching child sessions in dedicated connect pods.

 

It's a matter of choices

 

Every time a new default is introduced in existing environments, it's important to give SAS Administrators the option to embrace the new capability or to reset the SAS Viya platform to behave just like before. In this case, you can use a new environment variable, SAS_LOCAL_MPCONNECT. When it is set to true, it re-enables local MP Connect sign-ons, i.e. the original behavior of spawning a child session in the same pod where the parent process is running.

 

A SAS Administrator can use SAS Environment Manager to set the SAS_LOCAL_MPCONNECT environment variable to true in the sas.compute.server: startup_commands and sas.connect.server: startup_commands configuration instances. This applies the setting to every SAS compute and connect server running in the environment.

 

Figure: Setting the SAS_LOCAL_MPCONNECT option for all compute server sessions.

 

For a narrower scope, the option can be set case by case, as needed. As an end user, you can add the following line to your code just before the SIGNON statement:

 

options set=SAS_LOCAL_MPCONNECT=true;

 

In this case, only the current SAS session reverts to the previous behavior.
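As an illustration, the following sketch places the option shown above right before SIGNON and writes the host name of both the parent and the child session to the log. It relies on the automatic macro variable SYSHOSTNAME; in Kubernetes the host name is usually the pod name, so matching values suggest that the child ran in the same pod as the parent. The session name local1 is hypothetical.

```sas
/* Revert to local (same-pod) MP Connect sign-ons for this session only. */
/* The OPTIONS statement is the one shown above; it must run before      */
/* the SIGNON statement.                                                 */
options set=SAS_LOCAL_MPCONNECT=true;

/* Host name of the parent session (in Kubernetes, usually the pod name). */
%put NOTE: Parent session host: &SYSHOSTNAME;

/* Hypothetical session name; SASCMD= makes this an MP Connect sign-on. */
signon local1 sascmd="!sascmd";

/* Outside a macro, code in an RSUBMIT block is resolved in the child  */
/* session, so this line reports the child's host name.                */
rsubmit local1;
   %put NOTE: Child session host: &SYSHOSTNAME;
endrsubmit;

signoff local1;
```

Running the same program without the OPTIONS statement on SAS Viya 2023.10 or later should, under the new default, report a different host name for the child.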

 

Why would I revert back?

 

This new capability seems like the obvious choice when you write code that uses MP Connect in SAS Viya. So why would you revert to the previous behavior? The most obvious reason is migrating existing code that the new behavior could break. If existing code uses local resources to share data between MP Connect sessions, it cannot work as-is when those sessions are launched in different pods. Here are a couple of examples:

 

  • The code saves data sets in the SASWORK library of one session and retrieves them from another session. SASWORK locations can be shared between SAS processes with the INHERITLIB= option of the SIGNON statement.
  • The code uses SASESOCK libraries to stream data between sessions through local pipes.
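For example, code like the following sketch uses INHERITLIB= so that a child session writes straight into the parent's WORK library, which the parent then reads. The libref parentwk and the table name teens are hypothetical. This kind of dependency on a local directory is exactly what cannot work as-is once the child runs in its own pod.

```sas
/* Sketch of a local-resource dependency: the child writes into the */
/* parent's WORK library through INHERITLIB=. The libref PARENTWK   */
/* and the table name are hypothetical.                              */
signon child1 sascmd="!sascmd" inheritlib=(work=parentwk);

rsubmit child1;
   /* PARENTWK is the parent's WORK library, inherited by the child. */
   data parentwk.teens;
      set sashelp.class;
      where age > 13;
   run;
endrsubmit;

signoff child1;

/* Back in the parent: read the table that the child created. */
proc print data=work.teens;
run;
```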

 

You can solve these issues by re-architecting your application, for example by sharing data through Kubernetes volumes mounted in all session pods instead of through local directories. Meanwhile, reverting to the previous behavior can be an interim step to keep the code running during your migration.
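A minimal sketch of the shared-volume approach, assuming a Kubernetes volume mounted at the same path in every compute and connect pod. The path /mnt/shared/mpdata, the libref shared, and the session name child1 are hypothetical; the actual mount point depends on how your administrator configures the deployment.

```sas
/* Hypothetical mount point of a volume shared by all session pods. */
%let shared_path=/mnt/shared/mpdata;

/* Parent session: assign a libref on the shared volume. */
libname shared "&shared_path";

signon child1 sascmd="!sascmd";

/* Copy the path into a macro variable in the child session. */
%syslput shared_path=&shared_path / remote=child1;

rsubmit child1;
   /* Child session: assign the same libref on the same shared path. */
   libname shared "&shared_path";
   data shared.teens;
      set sashelp.class;
      where age > 13;
   run;
endrsubmit;

signoff child1;

/* Parent session: read the table through the shared volume. */
proc print data=shared.teens;
run;
```

Unlike the INHERITLIB= pattern, nothing here assumes that parent and child share a local file system, only that both pods see the same mounted path.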

 

Conclusion

 

I am always excited to see how SAS Viya keeps evolving by embracing existing functionality and integrating it into modern cloud-native architectures. In this post, we have seen how traditional MP Connect code can run without sacrificing Kubernetes' ability to manage resource utilization, provide scalability when required, and ensure user-level process isolation.

 

 

Find more articles from SAS Global Enablement and Learning here.

Last update: 11-10-2023 02:30 PM
