Yes, it is, and that's a big deal! Elasticity is crucial for any cloud-deployed software: it enables resources to scale up during peak periods, allowing the system to serve more users, handle larger jobs, or process more simultaneous tasks. When demand drops, resources scale back down, saving costs and keeping the environment right-sized.
Since CAS first gained capabilities in this area, Gilles' insightful post, CAS Server Topology Changes and CAS Table Balancing, has been essential reading. Now, let's dive deeper into what makes CAS elastic, explore real-world use cases, and look at how to set up table rebalancing and what it means for users.
Understanding Elasticity in Computing
As Google's AI Overview puts it, "Elastic computing is a system's ability to adjust its resources to match the current demand for processing, memory, and storage."
While this definition implies automated resource adjustments, CAS handles elasticity a bit differently: it doesn't automatically reconfigure resources out of the box. However, with some clever use of SAS Viya monitoring tools and a bit of customization, automation could be within reach. But that's a story for another day!
So, what’s a good definition of CAS elasticity? Here’s mine: "CAS’s ability to seamlessly accommodate the addition or removal of worker nodes without disrupting user activities, while efficiently utilizing all nodes in the cluster for balanced data distribution and parallel processing."
Let’s put this definition to the test.
What Makes CAS Elastic?
Before diving into specifics, let’s understand the CAS capability that enables us to say it’s elastic.
In a Massively Parallel Processing (MPP) CAS cluster, data is distributed across multiple workers—spread evenly and randomly by default—allowing for maximum parallelization of workloads, with each worker handling an equal share of data.
One of CAS's powerful features is the ability to add more workers to an existing cluster without stopping the system. This is great. However, if you expand a 3-worker cluster to a 6-worker one, your data remains, by default, on the original 3 workers. As a result, only those 3 workers are utilized for data processing, leaving the 3 new workers underused. This means you're not fully optimizing the expanded CAS environment.
Conversely, if you scale down (say, from a 6-worker cluster to a 4-worker one, which until recently wasn't possible without stopping CAS), any CAS tables without sufficient copies would become unusable unless a crucial operation takes place behind the scenes.
This essential operation is called automatic data redistribution, or table rebalancing.
As the name suggests, this feature automatically redistributes or reshuffles data blocks across the new set of CAS workers whenever an administrator adds or removes workers. This ensures all available workers are engaged, maximizing the performance and efficiency of the CAS environment.
Is Changing the Number of CAS Workers Easy?
You might be wondering, "This all sounds interesting, but how does my SAS admin actually change the number of CAS workers? If it's as complicated as I've heard, it may not be worth it." The good news? It's simpler than you might think. Changing the number of CAS worker nodes requires only a single kubectl command, as shown below (documented here).
kubectl -n name-of-namespace patch casdeployment name-of-casdeployment --type=json -p='[{"op": "add", "path": "/spec/workers", "value": number-of-worker-nodes}]'
For instance, to set the number of workers to 5, an admin would run:
kubectl -n viya patch casdeployment default --type=json -p='[{"op": "add", "path": "/spec/workers", "value": 5}]'
That's it! Once this command is executed, CAS adds or removes workers to match the specified count, making the process quick and efficient.
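To verify the change, an admin can check the worker count requested on the CASDeployment resource and watch the worker pods come and go. Here's a quick sketch, reusing the viya namespace and default deployment from the example above, and assuming the usual sas-cas-server-<deployment-name>-worker-N pod naming:

# Confirm the requested number of workers on the CASDeployment resource
kubectl -n viya get casdeployment default -o jsonpath='{.spec.workers}'

# Watch worker pods being created (or terminated, when scaling down)
kubectl -n viya get pods --watch | grep sas-cas-server-default-worker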
Use Cases for CAS Elasticity
Before we dive into enabling this capability, let’s look at scenarios where CAS elasticity could come into play. We’ve identified four key situations:
1. A SAS Administrator adds one or more CAS workers to handle a peak in activity.
2. A SAS Administrator removes one or more CAS workers to return to normal resource levels.
3. A CAS worker fails (node failure), and Kubernetes automatically restarts a new worker.
4. A CAS worker fails (node failure), but Kubernetes is unable to restart a replacement.
For scenarios 2 and 4 (reducing the number of workers), no extra configuration is required. As of version LTS 2023.10, CAS automatically rebalances tables across the remaining workers out of the box.
In scenario 4, however, make sure your tables are loaded with redundant copies (COPIES ≥ 1, for example via the COPIES= data set option at load time) so that they remain available through the node failure and can rebalance across the remaining workers.
For scenarios 1 and 3 (increasing the number of workers, although in scenario 3, a worker is lost before being replaced), automatic rebalancing requires additional configuration.
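The setup itself is the subject of the next part, but mechanically it comes down to setting an environment variable on the CAS server through the CASDeployment custom resource. Purely as an illustration of the mechanism (the variable name below is an assumption on my part, so verify it against the official documentation; I'm also assuming the CAS container is the first container in the controller template):

# Illustrative sketch only: enable automatic table rebalancing when workers
# are added. The variable name is an assumption; check the SAS documentation
# (or the next part of this series) for the exact setting.
kubectl -n viya patch casdeployment default --type=json -p='[{"op": "add", "path": "/spec/controllerTemplate/spec/containers/0/env/-", "value": {"name": "CAS_GLOBAL_TABLE_AUTO_BALANCE", "value": "true"}}]'

Keep in mind that, unlike changing the worker count, modifying the pod template typically restarts the CAS server, so this kind of configuration is best put in place ahead of time.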
It's also worth noting that CAS no longer needs to be stopped when adding or removing workers. In short, your SAS Viya platform remains operational by default, thanks to (see the recap sketch after this list):
Built-in CAS table rebalancing when workers are lost.
Optional CAS table rebalancing when adding workers (even without rebalancing, tables remain accessible and functional as more workers are added).
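Putting it all together, a full elasticity cycle boils down to two commands. Here's a sketch, again assuming the viya namespace and the default deployment from the earlier example:

# Scale up ahead of a busy period (tables rebalance onto the new workers only
# if the optional configuration mentioned above is in place)
kubectl -n viya patch casdeployment default --type=json -p='[{"op": "add", "path": "/spec/workers", "value": 5}]'

# Once demand drops, scale back down; tables rebalance across the remaining
# workers automatically
kubectl -n viya patch casdeployment default --type=json -p='[{"op": "add", "path": "/spec/workers", "value": 3}]'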
That's it for today. In the next parts, we will cover the setup of table rebalancing, illustrate how it works, and share some additional considerations.
Thanks for reading!