
Scaling CAS part 2: Increase worker size with IAC

Started 09-18-2023 · Modified 09-18-2023 · Views 362

Carrying on with our larger series about scalability of SAS Viya, this post continues looking at scaling SAS Cloud Analytic Services (CAS). In part 1, we looked specifically at using the viya4-deployment project to change the number of workers for MPP CAS. In this post, we'll consider a change that requires a planned outage: scaling the host machines for CAS to run on larger instance types with more CPU and RAM.

 

Review

 

We've already covered a number of salient points in the previous post. So please refer back there for more explanation of:

 

  • The IAC and DAC projects
  • The difference between scaling out and scaling up
  • Partnering with the SAS World-Wide Sizings Team for specific machine guidance
  • How to use the DAC to scale out the number of MPP CAS workers

 

We'll continue to refer to this example deployment of SAS Viya:

01_RC_iac-dac-sasviya-nodepools-1024x872.png


 

We're still looking at CAS on the left of the illustration, but in this post we will scale up the CAS server by increasing the size of the instance types used for the CAS node pool to give them more CPU, RAM, and other resources as defined by the infrastructure provider.

 

IAC = Terraform (for this blog post)

 

The IAC project relies on a number of utilities, some of which are specific to the intended cloud provider used by your site. But in all cases, the IAC uses Terraform for provisioning the infrastructure components. So while this blog post highlights the Infrastructure as Code project, the most critical steps are performed using Terraform.

 

The point is, if your site uses Terraform (but not the IAC), then this post is still relevant to you.

 

Scaling up CAS with the IAC

 

Changing a virtual machine's instance type will necessitate shutting down the VM and restarting it on the new hardware. In this case, we have a node pool of four VMs for the CAS server. By definition, all of the hosts in a node pool rely on the same instance type. So by changing the instance type associated with the node pool, we will be required to restart all four VMs for CAS.

 

The CAS server is designed to be scalable and highly available, but a change this sweeping at the infrastructure level isn't one it can ride through without considerable care. It's far easier to plan an outage and take SAS Viya offline so that we can make the change quickly and efficiently.

 

That said, we can use the IAC project to change the instance type for the CAS node pool. This works whether we're running CAS in SMP mode or in MPP mode. And it's pretty easy...

 

What's changing?

 

The IAC can be configured to deploy AWS node groups (a.k.a. cluster node pools) for SAS Viya:

 

## Cluster Node Pools config
node_pools = {
  cas = {
    "vm_type"      = "m5d.2xlarge"
    "cpu_type"     = "AL2_x86_64"
    "os_disk_type" = "gp2"
    "os_disk_size" = 200
    "os_disk_iops" = 0
    "min_nodes"    = 4
    "max_nodes"    = 5
    "node_taints"  = ["workload.sas.com/class=cas:NoSchedule"]
    "node_labels" = {
      "workload.sas.com/class" = "cas"
    }
    "custom_data" = ""
    "metadata_http_endpoint"               = "enabled"
    "metadata_http_tokens"                 = "required"
    "metadata_http_put_response_hop_limit" = 1
  },
  # ...definitions for the other node pools follow...
}

 

In the example for this blog post, the vm_type specified at initial deployment of SAS Viya was m5d.2xlarge. Now we want to increase the CPU count and amount of RAM on those CAS host machines.
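Before settling on a target size, it's easy to confirm the published specs for both instance types with the AWS CLI (configured credentials assumed; per AWS's published specs, m5d.2xlarge offers 8 vCPU / 32 GiB and m5d.4xlarge offers 16 vCPU / 64 GiB):

```shell
# Compare vCPU and memory for the current and proposed instance types
aws ec2 describe-instance-types \
    --instance-types m5d.2xlarge m5d.4xlarge \
    --query 'InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB]' \
    --output table
```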

 

Stop SAS Viya

 

Changing the instance type of a node pool will require shutting down the old nodes and bringing up new ones with the desired instance type. The CAS server doesn't tolerate this kind of change and should be taken offline first. Might as well take all of SAS Viya down with it to be safe.

 

  1. Back up SAS Viya (just in case):
    See Retaining your SAS Viya Backup
  2. Take SAS Viya offline:
    Execute the sas-stop-all cron job:

 

NS={{ your_SAS_Viya_namespace }}
 
kubectl create job sas-stop-all-`date +%s` --from cronjobs/sas-stop-all -n ${NS}
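Once the stop job is created, watch the namespace drain before touching the infrastructure. A simple sketch (press Ctrl+C when the SAS Viya pods have terminated):

```shell
# Watch pods in the namespace shut down; interrupt with Ctrl+C when done
kubectl get pods -n ${NS} --watch
```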

 

Change the instance type for CAS in the tfvars file

 

We saw an example of the CAS node pool definition in the configuration file for Terraform above.

 

Log on to the host machine where the initial deployment of SAS Viya was managed so we can use the IAC tools and configuration there. Edit the tfvars file and specify the new desired instance type for the CAS node pool. In this example, we'll request:

 

"vm_type" = "m5d.4xlarge"
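As a sanity check, confirm the edit took effect before planning. A self-contained sketch using a stand-in file (substitute your real tfvars path):

```shell
# Stand-in for the real tfvars file; substitute your actual path
TFVARS=/tmp/sasviya.tfvars
printf '    "vm_type" = "m5d.2xlarge"\n' > "$TFVARS"

# Scale up: 2xlarge -> 4xlarge
sed -i 's/m5d\.2xlarge/m5d.4xlarge/' "$TFVARS"

# Confirm the change took effect
grep -n 'vm_type' "$TFVARS"
# -> 1:    "vm_type" = "m5d.4xlarge"
```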

 

Note: Cloud providers price instances within a family roughly in proportion to CPU count. An m5d.4xlarge provides twice the CPU and RAM of an m5d.2xlarge, and it also costs twice as much, so there's real incentive to consider scaling out instead. Another point to consider is that network bandwidth doesn't keep pace with CPU as instance sizes grow, so running more, smaller hosts rather than fewer, larger ones likely yields a cumulative network throughput benefit. And as the prior blog post demonstrated, scaling out MPP CAS can be done without interrupting SAS Viya service.

 

Make it happen

 

To put this configuration change into effect and specify a larger instance type for the CAS server, we first need to direct Terraform to plan the change to the instance type of the CAS node group.

 

$ terraform plan -input=false \
                 -var-file=/path/to/sasviya.tfvars \
                 -state /path/to/sasviya.tfstate \
                 -out /path/to/sasviya.tfplan

 

If that step completes successfully, it returns a lot of output that ends with something similar to:

 

Plan: 2 to add, 2 to change, 1 to destroy.

 

Next, we can direct Terraform to apply the updated plan to the cluster:

 

$ terraform apply -state /path/to/sasviya.tfstate \
                  "/path/to/sasviya.tfplan"

 

This takes several minutes to complete.

 

Monitor the newly instantiated nodes for the CAS node pool as they come online and reach Ready state in your environment.
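The node_pools config above attaches the workload.sas.com/class=cas label to these nodes, so we can watch just the CAS node pool come back. A sketch (adjust the timeout to your environment):

```shell
# List only the CAS nodes (labeled in the node_pools config shown earlier)
kubectl get nodes -l 'workload.sas.com/class=cas'

# Or block until all CAS nodes report Ready, with a generous timeout
kubectl wait --for=condition=Ready node \
    -l 'workload.sas.com/class=cas' --timeout=15m
```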

 

Resume SAS Viya operations

 

After the node pool has been updated with hosts using the new instance type, return SAS Viya to normal operations by executing the sas-start-all cron job:

 

NS={{ your_SAS_Viya_namespace }}
 
kubectl create job sas-start-all-`date +%s` --from cronjobs/sas-start-all -n ${NS}
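To confirm CAS is back before reloading data, check its pods. The sas-cas-server pod-name prefix is conventional for CAS in SAS Viya deployments, but verify it matches yours:

```shell
# CAS controller and worker pods should reach Running and full readiness
kubectl get pods -n ${NS} | grep sas-cas-server
```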

 

After the CAS server resumes normal operation, reload any tables, formats, and other data as needed.

 

Alternatives

 

Using the IAC project is optional. Ultimately, the customer site is responsible for provisioning and maintaining the hardware environment for SAS Viya. Some sites will already have mature, standardized practices using their own tools that don't rely on Terraform.

 

As background for this post, I ran through two alternative approaches in the Amazon Web Services environment for scaling up MPP CAS:

 

  • Using the AWS Console to point-and-click in a web browser (doc)
  • Using the AWS CLI utility for a programming-based devops approach (doc)

 

If those are of interest to you, drop me a line and I'm happy to share my notes.

 

Looking for more information?

 

If you're looking for more details about scaling out instead of (or also with) scaling up, then look back to Part 1 of this series where we show how the DAC project can change the number of host machines running the CAS server.

 

 

 

Find more articles from SAS Global Enablement and Learning here.

