
The Evolution of SAS Workload Management


SAS has long offered the ability to manage the workload of jobs serviced by the SAS runtime. We can look all the way back to the early days of SAS/CONNECT, which enabled different SAS sessions to communicate and coordinate with one another. From there, SAS continued to add power and flexibility, eventually maturing to provide full grid computing capabilities. With the advent of container technologies and the Kubernetes control plane, SAS Viya can now capitalize on the workload management inherent in those technologies and extend it even further, bringing the grid computing capabilities that SAS 9 customers enjoy into the modern SAS Viya world.

 

Let's take a quick look back at how SAS technologies have evolved to gain a better understanding and appreciation of the newest capabilities offered by SAS Workload Management in SAS Viya today.

 

Workload Management Concepts

 

At its essence, workload management ensures that the task at hand runs at the right time in the right place. There are many factors which can influence this, but we can distill them down to three fundamentals:

 

  • Required resources: Ensuring SAS jobs run with sufficient CPU, RAM, network, disk, and other physical hardware components.
  • Data access: SAS jobs work on data, and that data might only be accessible for processing on specific hosts.
  • Service level: Users need their SAS jobs to complete in a timely manner - some more than others. So jobs might run interactively or in batch, immediately with high priority or later if low priority.

 

Other factors are often extensions of these three in some way.  

 

SAS/CONNECT

 

When SAS/CONNECT was introduced, its primary task was to allow two different instances of SAS to communicate with each other. In particular, this allowed SAS users to direct workload to run where it was most appropriate. The ability to submit SAS program code to a remote host, get results back, and share data between sessions was a crucial step in creating an environment where it's possible to manage work across multiple machines.

 

Over time, SAS/CONNECT was enhanced to provide more functionality - such as starting up new SAS sessions on the fly, either on the same machine or remotely on a different host. This was further refined to introduce multi-process (parallel) jobs. The MP CONNECT feature extended the core SAS/CONNECT functionality to include both synchronous and asynchronous job execution as well as the coordination needed to run jobs as a logical pipeline. The SAS Code Analyzer was introduced to automatically scan your program code and produce a converted program that uses MP CONNECT functionality to run as multiple jobs in parallel.
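To make that concrete, here is a minimal MP CONNECT sketch. It assumes a SAS/CONNECT spawner is listening on each remote host; the host names and port shown are hypothetical placeholders, not values from this article.

   options comamid=tcp;

   /* Hypothetical remote hosts; each macro variable name doubles as the session name */
   %let node1=hosta.example.com 7551;
   %let node2=hostb.example.com 7551;

   signon node1;                      /* start a SAS session on each remote host */
   signon node2;

   rsubmit node1 wait=no;             /* run asynchronously on node1 */
      proc means data=sashelp.class;
      run;
   endrsubmit;

   rsubmit node2 wait=no;             /* run asynchronously on node2, in parallel */
      proc freq data=sashelp.cars;
         tables type;
      run;
   endrsubmit;

   waitfor _all_ node1 node2;         /* synchronize: wait for both jobs to finish */
   signoff _all_;

The WAITFOR statement is what lets a later step in a logical pipeline depend on the completion of the earlier parallel jobs.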

 

This all worked great in environments which were effectively run by a single user (or with really tight user coordination) because that user could be intimately familiar with the processes and craft the parallel jobs to run when and where they worked best. That's because all of the dependencies were essentially in the code. But in ad-hoc multi-user environments where each person performs their own individual tasks without formal coordination, bottlenecks soon emerged as some hosts (and data sources) were overwhelmed.

 

SAS Integration Technologies

 

SAS Integration Technologies was the next generation of coordinated services startup and execution, introducing the SAS Integrated Object Model (IOM) on which various SAS service daemons operated. For example, the SAS Object Spawner (objspawn) is a daemon process that runs quietly on one or more hosts where SAS program code might need a runtime environment on demand. The objspawn could then instantiate a new SAS runtime as a SAS Workspace Server, SAS Stored Process Server, or SAS Pooled Workspace Server. (Sidebar: the SAS OLAP Server was built using IOM technology, but didn't rely on the objspawn for instantiation.) Some instances were dedicated to a specific task or session, and others might be shared over time. Of course, there was built-in coordination configuration to help manage it all.

 

One aspect of that coordination was the set of load balancing algorithms:

 

  • Cost: When a new client requests a connection, the objspawn load balancing redirects the client to the connection with the lowest cost (an arbitrary value, not money) on the machine with the lowest total cost. The objective is to place new requests on machines with the least workload.
  • Response Time: The objspawn maintains an ordered list of machines and their response times, directing requests to the machine with the fastest response time. The objective is to place new requests on machines which appear to be most responsive.
  • Most Recently Used: The objspawn directs requests to the machine that received the previous request with the objective to reduce waits for a SAS environment to start up.
  • Least Recently Used: The objspawn directs requests to the machine that has gone the longest without receiving a request, with the objective of spreading client load across as many hosts as possible.

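As a purely illustrative sketch (not the objspawn's actual implementation), the Cost algorithm's selection rule boils down to picking the machine with the lowest total cost. The host names and cost values below are made up for the example.

   /* Hypothetical snapshot of per-machine total cost (an arbitrary value, not money) */
   data machine_costs;
      input host $ total_cost;
      datalines;
   hostA 35
   hostB 10
   hostC 22
   ;
   run;

   /* Direct the new client request to the machine with the lowest total cost */
   proc sql noprint;
      select host into :target trimmed
      from machine_costs
      having total_cost = min(total_cost);
   quit;

   %put NOTE: Redirecting the new client connection to &target.;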
 

There are three challenges with these load balancing algorithms. The first is that they're mostly abstractions that aren't based on physical attributes of the systems in play. They can be tweaked to approximate the desired goal to a degree, but rarely could be perfected. The second challenge is that each load balancing algorithm could only work with certain IOM servers. For example, the Cost algorithm works for stored process and workspace servers, but not pooled workspace servers. Similarly, the MRU algorithm only applies to pooled workspace servers. And the third challenge is that these algorithms were effectively static in their configuration - unable to change to suit increasingly complex decision paths on the fly.  

 

SAS Grid Manager

 

The SAS Grid Manager offering was a major leap forward to address the challenges of the increasingly complex landscape of SAS environments. Initially built on and modeled after Platform LSF and related software (Platform Computing was later acquired by IBM, and LSF was eventually rebranded as IBM Spectrum LSF), SAS Grid Manager brought the concepts of queues, prioritization, preemption, and live resource monitoring to enable SAS to run on very large hosting platforms, with nigh endless scalability, and providing availability protection, too.

Further, SAS Grid Manager was architected such that it could be configured to run atop any supported grid provider. Initially, the only grid provider was Platform LSF, but eventually Hadoop YARN was added to the mix. And then SAS built its own grid provider layer called SAS Workload Orchestrator, which distilled the relevant-to-SAS concepts of Platform LSF, essentially providing everything SAS needs and nothing it doesn't. (Sidebar: Customers could integrate SAS Grid Manager for Platform into their existing enterprise deployments of Platform LSF to save on license costs and manage even deeper workload coordination.) The added option to employ SAS Workload Orchestrator as the grid provider (instead of Platform LSF) removes the complexity of deploying a major third-party software platform. Essentially, installing your selected SAS software brings along all the pieces needed for full grid functionality.
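To show what this looks like from the programmer's side, here is a minimal sketch of directing SAS/CONNECT work to a grid with the GRDSVC_ENABLE function. It assumes the metadata connection options are already set and that the logical grid server is named SASApp (the resource= value below names that assumed server); adjust for your own environment.

   /* Sketch only: assumes metadata server options are already set and a
      logical grid server named SASApp is defined */
   %let rc=%sysfunc(grdsvc_enable(_all_, resource=SASApp));

   signon gridjob;                    /* the grid provider chooses the execution host */

   rsubmit gridjob wait=no;
      proc summary data=sashelp.cars;
         var msrp;
         output out=work.msrp_stats mean=avg_msrp;
      run;
   endrsubmit;

   waitfor gridjob;
   signoff gridjob;

From there, the grid provider (Platform LSF or SAS Workload Orchestrator) applies its queues, priorities, and resource rules to decide when and where the signed-on session actually runs.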

 

Kubernetes

 

Containers, as popularized by Docker, can be thought of as stripped-down virtual machines where each is dedicated to running in support of a singular purpose. The idea was to realize many of the benefits of VMs without having the burden of a thick virtualization layer and associated hypervisor management. Even so, it didn't take long for Google to realize that they needed a robust management capability for the thousands and thousands of containers they'd need to run their infrastructure - eventually releasing Kubernetes to the world as an open-source project.

 

Kubernetes oversees the lifecycle of containers (running inside pods). Much like the workload management concepts discussed above, it is able to distribute pods across hosts to balance resource utilization, start new pods and destroy old ones to provide both scalability and availability, and ensure that everything inside the cluster can be found wherever it's running. From this perspective, SAS Viya running on Kubernetes already enjoys workload management that is far better than the old SAS Integration Technologies load balancing algorithms.

 

SAS Workload Management

 

SAS Workload Management is an add-on product which customers can include with any of the SAS Viya offerings. It brings many of the same SAS Grid Manager features forward into SAS Viya, where they're applied to running SAS Compute Servers (i.e., the classic SAS runtime). So while Kubernetes does provide many useful features to balance work across the cluster, SAS Workload Management adds queues, prioritization, and preemption, along with dashboards to configure and monitor those activities.

 

Furthermore, end users who are familiar with SAS Grid Manager clients (including SAS program code, point-and-click UIs, and command-line utilities) will find that most aspects of SAS Viya with SAS Workload Management work with a syntax and structure they're already accustomed to. Administrators of SAS software will enjoy the natively integrated dashboards to manage and monitor the environment - a big improvement over early SAS Grid Manager configuration management and monitoring.

 

Come Full Circle

 

This year will be my 25th year at SAS - and SAS/CONNECT was already well established long before I came on board. Even so, it's still one of the critical components that enable the SAS runtime to execute in coordinated fashion across multiple hosts. SAS program code still relies on the tried-and-true SAS/CONNECT infrastructure and MP CONNECT syntax to define ad-hoc SAS jobs and harmonize their execution on multiple hosts, following their dependencies, and keeping their processing in sync. When SAS Workload Management is enabled in SAS Viya on Kubernetes, the foresight of the original SAS/CONNECT developers blends together with the newest cloud technologies to deliver the most robust and scalable implementation of SAS yet.

 

Find more articles from SAS Global Enablement and Learning here.
