Scaling along a continuous spectrum for the SAS Viya Platform

The SAS Viya Platform provides a powerful analytics environment with a rich set of features and a seemingly infinite combination of deployment options. Given a detailed understanding of the workload at hand, we can right-size the SAS Viya Platform so that it is scaled to handle almost any challenge for any number of users acting on any amount of data.

While we can stick the landing to right-size SAS Viya for any given problem, there are some considerations to weigh when attempting to slide along the spectrum of possible scalability options. We cannot simply twist a knob to scale a SAS Viya deployment from zero to infinity. Along the way, crossing the Rubicon becomes a necessity. That is, we will need to implement some major changes to the architecture and infrastructure at key points. If the change is significant enough, it may even require a fresh deployment of SAS Viya with a migration from the old environment to the new.

This blog post introduces a new series where we discuss the potential challenges of moving the SAS Viya Platform along the spectrum of scalability.

World-Wide Sizings

For customers looking for help with getting the right hardware to run their SAS Viya workloads, the SAS account representative can begin an engagement with the World-Wide Sizings Team. To do this well, the sizing effort is necessarily a collaboration between SAS and the customer. The Sizings team gets the ball rolling by asking (and answering) some critical questions to set expectations and devise a plan of attack that ensures scalability goals are met.

Having an understanding of where the customer wants to go with SAS Viya coupled with detailed knowledge of how the relevant technologies scale is an important step along the path to success.

The Radio Analogy

To help explain the idea of scaling along a continuous spectrum, let's consider the AM/FM radio. Tuning the radio can be done by scanning along the adjacent frequencies - like when turning the knob and listening for breaks in the static. Or if you have stations preset, you can jump directly to the frequency of your desired station.

[Image: factory AM/FM stereo 8-track unit in a 1978 AMC Matador sedan]

CZmarlin, CC BY-SA 3.0, via Wikimedia Commons

With SAS Viya, we can definitely jump to a desired point of scalability (like the radio's preset). But when it comes to elastic deployments that can automatically scale up or down, that's more like twisting the knob on an old radio to move along each of the frequencies one after the other.

FM can transmit 15× more information than AM. And it's important to note that no matter how far you turn the knob on the AM band, you'll never get to FM. For SAS Viya deployments in the cloud, there are situations where we can slide along the elastic spectrum to a point and then from there, we need to take a more drastic action to get to the next level of performance (like switching from AM to FM).

To take this analogy to its extreme, consider replacing a 1980s-era car stereo with a modern Bluetooth streaming unit that also includes satellite radio and a DVD player. All of these technologies help deliver music to your ears, but getting the song you want to hear could range anywhere from a twist of the dial to switching bands to changing source input… all the way to tearing out the old unit and putting in something completely new.

Let's look at the areas of the scalability spectrum and the jump points between them that we should consider when planning the elasticity of a SAS Viya deployment.

An example of an interrupted spectrum of scalability

There's no official definition of a continuous spectrum of scalability, but it can be useful to identify where scaling the system might encounter challenges that require additional effort.

These days, we primarily run SAS software on virtual machines. This isn't required - but yields numerous benefits in terms of repeatability, resilience, and re-use. VM images can be saved and then deployed many times over. When running in the cloud, we can place a VM image on a machine instance with the CPU, RAM, and other resource attributes we desire. If that instance size is too small, it's relatively easy to scale up to run on a new, larger instance with more CPU, RAM, and other resource attributes instead.

While this is simple in concept, there's an outage that must be considered. The underlying instance isn't resized on the fly with more CPU, RAM, and all. No, we must start up a brand new instance and have it run the VM image with our software. In this way it's scalable, but not continuous - there's an interruption to service.
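
To make that interruption concrete, here is a minimal Python sketch of a toy model. It is not any cloud provider's API: the Instance class and the resize_by_replacement function are hypothetical names invented for illustration. The only thing it assumes is what's described above, namely that a larger instance is started fresh from the same image, so work running on the old instance doesn't carry over.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy stand-in for a cloud machine instance running our VM image."""
    cpus: int
    ram_gib: int
    sessions: list = field(default_factory=list)


def resize_by_replacement(old, cpus, ram_gib):
    """Model scaling up by starting a new, larger instance (hypothetical helper).

    Returns the new instance plus the sessions that were interrupted,
    because in-flight work on the old instance is not carried over.
    """
    interrupted = list(old.sessions)
    old.sessions.clear()                        # the old instance is shut down
    new = Instance(cpus=cpus, ram_gib=ram_gib)  # the new one starts empty
    return new, interrupted


small = Instance(cpus=4, ram_gib=16, sessions=["alice", "bob"])
large, interrupted = resize_by_replacement(small, cpus=16, ram_gib=64)
print(large.cpus, large.sessions, interrupted)
```

The new instance has more capacity, but every session that was running at the time of the switch shows up in the interrupted list. That list is the "not continuous" part.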

An example of a continuous spectrum of scalability

The SAS Compute Server is offered as part of the SAS Viya Platform. It provides a way for users to each have their own dedicated runtime for processing SAS code. Users (and SAS developers) can even elect to spin up additional SAS Compute Servers on demand for additional concurrent processing power. The SAS Viya Platform (and SAS 9 before that) supports the ability to run SAS Compute Servers across multiple host machines.

So let's say we've defined a node pool in Kubernetes which is labeled exclusively for use by SAS Compute Servers (not required, but often desirable). That node pool is currently running on 3 machine instances in the cloud. We can provision additional nodes to that pool and when they're ready, then SAS Viya will work with Kubernetes to schedule new SAS Compute Servers to run there, too. If the system quiets down, we can even scale down and decommission under-utilized nodes.

This kind of scalability takes some configuration to employ, but it can be fully automated with a Kubernetes cluster autoscaler and with new features introduced with the SAS Workload Manager in the stable-2023.05 release. Regardless of the automation - even if we manually scale the node pool up/down - the point is that the scalability from the users' point-of-view is continuous. There's no interruption in service when we change the resources available for the SAS Compute Server by changing the number of nodes it can run on.
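
The following Python sketch models that behavior as a toy simulation. It does not call Kubernetes or SAS Workload Manager; the Node and NodePool classes, the per-node capacity of two sessions, and the least-loaded placement rule are all assumptions made purely for illustration. The point it shows is that adding a node lets new sessions start without touching the ones already running.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                       # max sessions per node (assumed value)
    sessions: list = field(default_factory=list)


class NodePool:
    """Toy node pool reserved for compute sessions (hypothetical model)."""

    def __init__(self):
        self.nodes = []
        self._created = 0

    def add_node(self, capacity):
        self._created += 1
        node = Node(name=f"compute-{self._created}", capacity=capacity)
        self.nodes.append(node)
        return node

    def schedule(self, user):
        """Place a session on the least-loaded node that has free capacity."""
        candidates = [n for n in self.nodes if len(n.sessions) < n.capacity]
        if not candidates:
            return None                 # nothing fits until the pool grows
        target = min(candidates, key=lambda n: len(n.sessions))
        target.sessions.append(user)
        return target

    def remove_idle_nodes(self):
        """Scale down: drop nodes with no sessions, return their names."""
        idle = [n.name for n in self.nodes if not n.sessions]
        self.nodes = [n for n in self.nodes if n.sessions]
        return idle


pool = NodePool()
for _ in range(3):                      # the pool starts on 3 instances
    pool.add_node(capacity=2)
for i in range(6):
    pool.schedule(f"user{i}")

pending = pool.schedule("user6")        # pool is full, so nothing is placed
running_before = [list(n.sessions) for n in pool.nodes]

pool.add_node(capacity=2)               # scale out by one node
placed = pool.schedule("user6")
running_after = [list(n.sessions) for n in pool.nodes[:3]]
print(pending, placed.name, running_before == running_after)
```

In this model the sessions on the original three nodes are identical before and after the scale-out, which is what "continuous" means from the users' point of view.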

When continuous runs into interrupted

The SAS Viya Platform is designed to maximize scalability. Even so, it still must work within the constraints of Kubernetes, the cloud provider, and third-party software on which it relies. There are several areas where we hit the logical limit of continuous scalability and must deal with an interruption to achieve the scalability goal (like reaching the end of the AM dial and switching to the FM band).

Examples of this include:

The deployment of the SAS Viya Platform defaults to working with an "internal" instance of Crunchy Postgres. For improved scalability, efficiency, availability, and/or standardization, a customer site may prefer to use an "external" Postgres server, like the managed solutions offered by the cloud providers. While SAS Viya can run with either type, changing from one Postgres to another will require a planned outage of the entire SAS Viya cluster.
SAS Cloud Analytic Services can be deployed to run either SMP (as a single host) or MPP (distributed to run across multiple hosts). After that deployment, an SMP CAS server can be converted to operate as MPP; however, that requires a restart of your CAS server and results in the termination of all active connections and sessions as well as the loss of any in-memory data.
A node pool is a group of Kubernetes nodes that share the same instance type, i.e., physical configuration (CPU, RAM, storage, networking, and OS) as well as Kubernetes configuration (like maximum number of pods). While the node pool itself can be modified on the fly to run with more or fewer nodes, effectively changing the instance type (to get a different machine) requires defining a new node pool, draining the old node pool, and confirming the workload migrates successfully. Some SAS Viya components, like CAS, cannot make this move "live", requiring an outage to complete.
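
The node pool example can be sketched as three steps: define a new pool, drain the old one, and confirm that everything moved. The Python below is a toy model only, with no Kubernetes calls. The Workload and Pool classes, the instance type labels, the "stateless-service" workload, and the live_migratable flag are all hypothetical and exist just to show where an outage enters the picture.

```python
from dataclasses import dataclass, field


@dataclass
class Workload:
    name: str
    live_migratable: bool    # illustrative flag, not a real SAS Viya setting


@dataclass
class Pool:
    instance_type: str
    workloads: list = field(default_factory=list)


def migrate_node_pool(old, new_instance_type):
    """Move every workload to a new pool; report which ones needed an outage."""
    new = Pool(instance_type=new_instance_type)   # step 1: define the new pool
    outages = []
    for w in list(old.workloads):                 # step 2: drain the old pool
        old.workloads.remove(w)
        if not w.live_migratable:
            outages.append(w.name)                # must stop before it restarts
        new.workloads.append(w)
    if old.workloads:                             # step 3: confirm the migration
        raise RuntimeError("old node pool still has workloads")
    return new, outages


old_pool = Pool("type-small", [Workload("stateless-service", True),
                               Workload("cas", False)])
new_pool, outages = migrate_node_pool(old_pool, "type-large")
print(new_pool.instance_type, outages)
```

Here only the workload flagged as unable to move live ends up in the outage list, mirroring the CAS case described above.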

The impact of these changes can range from a short-term outage of a select set of services to the full redeployment of the SAS Viya Platform. Especially where external services are involved, changes to the architecture can take time and effort to implement. The additional resource costs can be significant as well.

Where is this going?

This blog post introduces a new series to take a closer look at some of the scalability waypoints where an interruption to service might be needed to gain additional processing power. From a customer perspective, we want to ensure that if their goal is to start small and grow over time, then the deployment starts neither too small nor too large. Instead we want to start at a size where scaling up over time can be as continuous as possible - avoiding major deployment modifications which interrupt operations. When we consider scaling a SAS Viya Platform deployment from a small to a larger size, we need to be aware of where these possible challenges lie so we can avoid them with sufficient planning, or if that's not possible, then handle them in a timely and efficient manner.

Find more articles from SAS Global Enablement and Learning here.