Continuing our larger series on the scalability of SAS Viya, this post looks further at scaling SAS Cloud Analytic Services (CAS). In Part 1, we looked specifically at using the viya4-deployment project to change the number of workers for MPP CAS. In this post, we'll consider a change that requires a planned outage: scaling up the CAS host machines to larger instance types with more CPU and RAM.
We've already covered a number of salient points in the previous post, so please refer back there for more explanation of those topics.
We'll continue to refer to this example deployment of SAS Viya:
We're still looking at CAS on the left of the illustration, but in this post we will scale up the CAS server by moving the CAS node pool to a larger instance type, giving each node more CPU, RAM, and other resources as defined by the infrastructure provider.
The Infrastructure as Code (IAC) project relies on a number of utilities, some of which are specific to your site's cloud provider. In all cases, though, the IAC uses Terraform to provision the infrastructure components. So while this blog post highlights the IAC project, the most critical steps are performed using Terraform.
The point is, if your site uses Terraform (but not the IAC), then this post is still relevant to you.
Changing a virtual machine's instance type necessitates shutting down the VM and restarting it on the new hardware. In this case, we have a node pool of four VMs for the CAS server. By definition, all of the hosts in a node pool use the same instance type, so changing the instance type associated with the node pool requires restarting all four CAS VMs.
The CAS server is designed to be scalable and highly available, but it cannot tolerate a change this disruptive at the lowest level without considerable care. It's far easier to plan an outage and take SAS Viya offline so that we can make this change quickly and efficiently.
That said, we can use the IAC project to change the instance type for the CAS node pool. This works whether we're running CAS in SMP mode or in MPP mode. And it's pretty easy...
The IAC can be configured to deploy AWS node groups (a.k.a. cluster node pools) for SAS Viya. Here is the CAS node pool definition from the tfvars file used in this example:

```hcl
## Cluster Node Pools config
node_pools = {
  cas = {
    "vm_type" = "m5d.2xlarge"
    "cpu_type" = "AL2_x86_64"
    "os_disk_type" = "gp2"
    "os_disk_size" = 200
    "os_disk_iops" = 0
    "min_nodes" = 4
    "max_nodes" = 5
    "node_taints" = ["workload.sas.com/class=cas:NoSchedule"]
    "node_labels" = {
      "workload.sas.com/class" = "cas"
    }
    "custom_data" = ""
    "metadata_http_endpoint" = "enabled"
    "metadata_http_tokens" = "required"
    "metadata_http_put_response_hop_limit" = 1
  },
  # ... definitions for the other node pools ...
}
```
In the example for this blog post, the vm_type specified at the initial deployment of SAS Viya was m5d.2xlarge. Now we want to increase the number of CPUs and the amount of RAM on those CAS host machines.
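Before changing anything, it can be reassuring to confirm which instance type the CAS nodes actually run on today. The sketch below is a small wrapper around kubectl; the function name show_cas_nodes is hypothetical, and it assumes the workload.sas.com/class=cas node label from the tfvars above plus the standard node.kubernetes.io/instance-type label that the cloud provider sets on each node.

```shell
# Hypothetical helper: list the CAS nodes with an extra column showing each
# node's node.kubernetes.io/instance-type label (set by the cloud provider).
# Assumes the workload.sas.com/class=cas node label from the tfvars file.
# Any extra arguments (for example, -o wide) are passed through to kubectl.
show_cas_nodes() {
  kubectl get nodes -l workload.sas.com/class=cas \
    -L node.kubernetes.io/instance-type "$@"
}
```

Running show_cas_nodes at this point should show m5d.2xlarge for all four CAS nodes; running it again after the change is a quick way to spot any node still on the old type.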
Changing the instance type of a node pool will require shutting down the old nodes and bringing up new ones with the desired instance type. The CAS server doesn't tolerate this kind of change and should be taken offline first. Might as well take all of SAS Viya down with it to be safe.
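To stop SAS Viya, execute the sas-stop-all cron job:

```shell
# Replace {{ your_SAS_Viya_namespace }}, braces included, with your SAS Viya namespace
NS={{ your_SAS_Viya_namespace }}
kubectl create job "sas-stop-all-$(date +%s)" --from cronjobs/sas-stop-all -n "${NS}"
```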
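Note that kubectl create job returns as soon as the Job object exists, not when SAS Viya has finished stopping. If you script the outage, it may help to wait for the Job to complete before touching the infrastructure. Here is a minimal sketch; run_sas_cronjob is a hypothetical helper name and the 30-minute default timeout is an arbitrary assumption, not a SAS recommendation.

```shell
# Hypothetical helper: create a one-off Job from a SAS Viya CronJob
# (for example sas-stop-all) and block until that Job reports Complete.
# Args: namespace, cronjob name, optional timeout (default 30m, an assumption).
run_sas_cronjob() {
  local ns="$1" cronjob="$2" timeout="${3:-30m}"
  # Timestamp suffix keeps the Job name unique, as in the command above
  local job="${cronjob}-$(date +%s)"
  kubectl create job "$job" --from "cronjobs/${cronjob}" -n "$ns" || return
  kubectl wait --for=condition=complete "job/${job}" -n "$ns" --timeout="$timeout"
}
```

For example, run_sas_cronjob "${NS}" sas-stop-all returns non-zero if the Job could not be created or did not complete within the timeout, which is a good reason to stop and investigate before changing the node pool.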
We saw an example of the CAS node pool definition in the configuration file for Terraform above.
Log on to the host machine where the initial deployment of SAS Viya was managed so that we can use the IAC tools and configuration there. Edit the tfvars file and specify the new desired instance type for the CAS node pool. In this example, we'll request:
```hcl
"vm_type" = "m5d.4xlarge"
```
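If you'd rather script this edit than make it by hand, the important detail is to change vm_type only inside the cas block, because other node pools in the same file may use the same instance type. The following is a minimal sed-based sketch. The helper name set_cas_vm_type is hypothetical, and it assumes the layout of the node_pools example above: the CAS block opens with a line like `cas = {` and closes with a line containing only `},`.

```shell
# Hypothetical helper: rewrite "vm_type" only inside the `cas = { ... },` block
# of a tfvars file, leaving other node pools untouched.
# -i.bak edits in place and keeps the original as <file>.bak (GNU and BSD sed).
# Assumes the cas block closes with a line containing only "},".
set_cas_vm_type() {
  local tfvars="$1" new_type="$2"
  sed -i.bak \
    -e '/^[[:space:]]*cas[[:space:]]*=[[:space:]]*{/,/^[[:space:]]*},[[:space:]]*$/ s/"vm_type"[[:space:]]*=[[:space:]]*"[^"]*"/"vm_type" = "'"$new_type"'"/' \
    "$tfvars"
}
```

For example, set_cas_vm_type /path/to/sasviya.tfvars m5d.4xlarge, then a quick grep -n vm_type on the file to confirm that only the CAS entry changed.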
Note: It's interesting that cloud providers price instances within a family in proportion to CPU count. An m5d.4xlarge provides twice the CPU and RAM of an m5d.2xlarge and also costs twice as much, so the cost per CPU is the same either way, which is a strong incentive to consider scaling out instead. Another point to consider is that network bandwidth doesn't keep pace with CPU count, so running more, smaller hosts rather than fewer, larger hosts likely yields more cumulative network throughput. Finally, the prior blog post demonstrated that MPP CAS can be scaled out without interrupting SAS Viya service.
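The arithmetic behind this note is easy to make concrete. The sketch below compares the scaled-up pool (4 x m5d.4xlarge) with a hypothetical scaled-out pool (8 x m5d.2xlarge; note that the example node pool above allows at most 5 nodes). It assumes AWS's published figures for these types (m5d.2xlarge: 8 vCPU, 32 GiB, up to 10 Gbps; m5d.4xlarge: 16 vCPU, 64 GiB, up to 10 Gbps) and expresses the note's 2x price ratio as relative cost units rather than real prices. The "up to" network figures are burst ceilings, not guaranteed sustained bandwidth.

```shell
# Sketch: aggregate resources for a node pool of identical instances.
# Args: node count, vCPU per node, RAM (GiB) per node,
#       relative cost units per node, network ceiling (Gbps) per node.
pool_totals() {
  local nodes="$1" vcpu="$2" ram="$3" cost="$4" net="$5"
  echo "vCPU=$((nodes * vcpu)) RAM_GiB=$((nodes * ram)) cost_units=$((nodes * cost)) net_gbps=$((nodes * net))"
}

# Scale up: 4 x m5d.4xlarge (16 vCPU, 64 GiB, 2 cost units, up to 10 Gbps each)
pool_totals 4 16 64 2 10   # vCPU=64 RAM_GiB=256 cost_units=8 net_gbps=40
# Scale out: 8 x m5d.2xlarge (8 vCPU, 32 GiB, 1 cost unit, up to 10 Gbps each)
pool_totals 8 8 32 1 10    # vCPU=64 RAM_GiB=256 cost_units=8 net_gbps=80
```

Under these assumptions both layouts deliver the same 64 vCPU and 256 GiB of RAM for the same cost, but the eight smaller hosts have twice the aggregate network ceiling.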
To put this configuration change into effect, we first direct Terraform to plan the change to the CAS node group's instance type:

```shell
terraform plan -input=false \
  -var-file=/path/to/sasviya.tfvars \
  -state /path/to/sasviya.tfstate \
  -out /path/to/sasviya.tfplan
```
If that step completes successfully, it returns a lot of output and, near the end, a summary similar to:
```text
Plan: 2 to add, 2 to change, 1 to destroy.
```
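If you script the plan step, Terraform's -detailed-exitcode flag lets you tell "no changes" apart from "changes pending" without parsing the output, which catches the case where the tfvars edit didn't take effect. A minimal sketch follows; plan_cas_change is a hypothetical helper name, and the arguments are the same placeholder paths used above.

```shell
# Hypothetical helper wrapping the plan step. With -detailed-exitcode,
# terraform plan exits 0 (no changes), 1 (error), or 2 (changes present).
# Args: tfvars path, state path, output plan path.
plan_cas_change() {
  local tfvars="$1" tfstate="$2" tfplan="$3" rc
  terraform plan -input=false -detailed-exitcode \
    -var-file="$tfvars" \
    -state="$tfstate" \
    -out="$tfplan" && rc=0 || rc=$?
  case "$rc" in
    2) echo "Changes planned; review them before applying." ;;
    0) echo "No changes planned; check the vm_type edit." >&2; return 1 ;;
    *) echo "terraform plan failed (exit code $rc)." >&2; return "$rc" ;;
  esac
}
```

A successful call still leaves the review to you: read the plan output and confirm that only the CAS node group is affected before running the apply step.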
Next, we can direct Terraform to apply the updated plan to the cluster:
```shell
terraform apply -state /path/to/sasviya.tfstate \
  "/path/to/sasviya.tfplan"
```
This takes several minutes to complete.
Monitor the newly instantiated nodes for the CAS node pool as they come online and reach the Ready state in your environment.
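One way to do that monitoring from the command line is sketched below. Both function names are hypothetical; the sketch assumes the workload.sas.com/class=cas node label from the tfvars file and the standard node.kubernetes.io/instance-type label, and the 20-minute default timeout is an arbitrary choice. Old nodes that are still being drained can keep the wait from succeeding, so run it after terraform apply has finished.

```shell
# Hypothetical helper: block until every CAS node reports Ready.
# Arg: optional timeout (default 20m, an arbitrary assumption).
wait_for_cas_nodes() {
  kubectl wait --for=condition=Ready nodes \
    -l workload.sas.com/class=cas --timeout="${1:-20m}"
}

# Hypothetical helper: succeed only if there is at least one CAS node and
# every CAS node's node.kubernetes.io/instance-type label equals $1.
cas_nodes_on_type() {
  local expected="$1" types
  types=$(kubectl get nodes -l workload.sas.com/class=cas \
    -o jsonpath='{range .items[*]}{.metadata.labels.node\.kubernetes\.io/instance-type}{"\n"}{end}') || return
  [ -n "$types" ] || return 1
  # grep -vxF finds any line that is not exactly the expected type
  ! printf '%s\n' "$types" | grep -qvxF -- "$expected"
}
```

For example: wait_for_cas_nodes && cas_nodes_on_type m5d.4xlarge && echo "CAS node pool is on m5d.4xlarge".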
After the node pool has been updated with hosts using the new instance type, return SAS Viya to normal operations by executing the sas-start-all cron job:

```shell
# Replace {{ your_SAS_Viya_namespace }}, braces included, with your SAS Viya namespace
NS={{ your_SAS_Viya_namespace }}
kubectl create job "sas-start-all-$(date +%s)" --from cronjobs/sas-start-all -n "${NS}"
```
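Before reloading data, you'll want to know that the CAS server pods are actually Ready. The sketch below polls until CAS pods exist and then waits for them to become Ready. The helper name wait_for_cas_server is hypothetical, and the pod label casoperator.sas.com/server=default is an ASSUMPTION about how the CAS operator labels the pods of the default CAS server; confirm it in your deployment with kubectl get pods -n "${NS}" --show-labels before relying on it.

```shell
# Hypothetical helper: wait for the CAS server pods after sas-start-all.
# ASSUMPTION: CAS pods carry the label casoperator.sas.com/server=<server name>,
# where the default server is named "default". Verify this with --show-labels.
# Args: namespace, optional server name (default), optional timeout (30m).
wait_for_cas_server() {
  local ns="$1" server="${2:-default}" timeout="${3:-30m}" tries=0
  local selector="casoperator.sas.com/server=${server}"
  # kubectl wait fails immediately if no pods match yet, so poll for them
  # first (60 tries x 10 seconds, roughly 10 minutes).
  until [ -n "$(kubectl get pods -l "$selector" -n "$ns" -o name)" ]; do
    tries=$((tries + 1))
    [ "$tries" -ge 60 ] && { echo "No CAS pods found for $selector" >&2; return 1; }
    sleep 10
  done
  kubectl wait --for=condition=Ready pods -l "$selector" -n "$ns" --timeout="$timeout"
}
```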
After the CAS server resumes normal operation, reload any desired tables, formats, and other data as needed, since anything held in CAS memory is lost when the server restarts.
Using the IAC project is optional. Ultimately, the customer site is responsible for provisioning and maintaining the hardware environment for SAS Viya. Some sites will already have mature, standardized practices using their own tools that don't rely on Terraform.
As background for this post, I also tried two alternative approaches for scaling up MPP CAS in Amazon Web Services. If those are of interest to you, drop me a line and I'm happy to share my notes.
If you're looking for more details about scaling out instead of (or in addition to) scaling up, look back to Part 1 of this series, where we showed how the viya4-deployment (DAC) project can change the number of host machines running the CAS server.
Find more articles from SAS Global Enablement and Learning here.