SAS Viya simplified deployment patterns

In this article I would like to discuss simplified deployment patterns and share some videos that I have previously published. In two previous articles I discussed SAS Viya deployment topologies. See “Creating custom SAS Viya topologies – realizing the workload placement plan” and “Creating custom SAS Viya topologies – Part 2 (using custom node pools for the compute pods)”.

In this post I want to discuss two alternatives to the default approach, which employs five node pools: cas, stateless, stateful, compute and connect.

So, will a simplified deployment topology reduce the infrastructure costs?

I could drive you all crazy by just saying “IT DEPENDS”! But I think this warrants a deeper look.

Many non-functional requirements can affect the infrastructure requirements, for example performance, availability, and security. Of these, the performance and availability requirements are architecturally significant: they can have a significant impact on the infrastructure costs. So, it is important that we understand the requirements.

Understand the requirements

There is an adage in computing that the last nine of availability you implement will be the most expensive IT spend. Therefore, understanding your organization's Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) is key to defining the right approach, getting the right deployment design, and avoiding unnecessary infrastructure costs.
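
To make the "nines" concrete, here is a minimal sketch that converts an availability target into the downtime it allows per year. The function name and the 365-day year are my own assumptions for illustration; they do not come from any SAS tooling.

```python
# Minutes in a 365-day year (leap years are ignored in this sketch).
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the maximum downtime per year that an availability target allows."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Each extra nine cuts the allowed downtime by a factor of ten.
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% availability allows {minutes:.1f} minutes of downtime per year")
```

Three nines allows roughly 8.8 hours of downtime a year, while five nines allows only about 5 minutes, which is why that last nine tends to drive the design and the cost.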

When we think about performance, it is important to understand that not every organization or business process needs blinding performance, a “super car” (a McLaren or Ferrari). The reality is that organizations are after “acceptable performance within the constraints”. The biggest constraint is usually the budget: cost $$$.

So, it is critical that you understand the availability and performance requirements to define the right deployment topology. I'm sorry if I'm preaching to the converted! 🙂

Comparing SAS Viya 3.x and SAS Viya 4 deployments

Before we get into the details of the SAS Viya topologies, let’s take a moment to set a baseline for the discussion. Below is a simple analogy between SAS Viya 3.x nodes (servers) and Kubernetes node pools.

A node pool can be compared to a single machine (physical or virtual) or multi-machine host group in a Bare OS SAS Viya 3.x deployment.

In a SAS Viya 3.x Bare OS deployment you could use one machine for the SAS Cloud Analytic Services (CAS) server and one machine for the rest of the SAS Viya services: a two-machine (server) deployment. Larger deployments would have multiple servers. For example, you might have 5 machines for MPP CAS, 2 machines for the programming run-time, and 3 machines to provide high availability for the infrastructure servers and microservices, giving a total of 10 machines.

It is similar with the node pools; you can have just 2 node pools (CAS and general) or use the 5 node pool option to provide different node types for each type of SAS Viya workload.

However, the key difference between a node pool and the machines in a Viya 3 deployment is that a node pool is a scalable template for a VM instance. You define a node template (instance type, label, taint, with or without GPU, etc.) and then make it scalable by defining a minimum and maximum number of nodes (which are VM instances) in the node pool.

So, even if you start with a 2 node pool topology, it can still be scaled in terms of the number of nodes (if needed). But remember, all nodes within a node pool have the same specification and attributes (instance type, storage mounts, and Kubernetes labels and taints).
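
The idea of a node pool as "one template plus scaling bounds" can be sketched as a small data structure. This is a hypothetical model for illustration only, not the schema of any cloud provider or of the SAS IaC projects. The instance type is a placeholder, and the label and taint strings are my assumption of how a CAS pool would be marked.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    """Hypothetical model: every node in the pool shares one template."""
    name: str
    instance_type: str
    min_nodes: int
    max_nodes: int
    labels: dict = field(default_factory=dict)
    taints: tuple = ()

    def clamp(self, desired_nodes: int) -> int:
        """Return the node count the pool can actually reach for a desired count."""
        return max(self.min_nodes, min(desired_nodes, self.max_nodes))


cas_pool = NodePool(
    name="cas",
    instance_type="example-16cpu-128gb",  # placeholder, not a real instance type
    min_nodes=1,
    max_nodes=5,
    labels={"workload.sas.com/class": "cas"},            # assumed label
    taints=("workload.sas.com/class=cas:NoSchedule",),   # assumed taint
)

# Asking for 8 nodes is capped at the pool maximum; asking for 0 is raised to the minimum.
print(cas_pool.clamp(8), cas_pool.clamp(0))
```

The point of the sketch is that scaling changes only the node count. To get a different instance type, label, or taint you need a different node pool.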

Using simplified topologies

Coming back to the question: will a simplified deployment topology reduce the infrastructure costs?

It can, but you can’t just think about the node pools in isolation, as one of the key benefits of using multiple node pools is that the compute instance types can be optimized to the processing needs and workload. So, the question could be rephrased as:

“How many node pools should I have?”
But the question could also be “How many nodes do I need in the node pool?”
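
As a rough way to approach the second question, the sketch below estimates a lower bound on the number of nodes from the CPU requested by the pods and the CPU each node can offer. The function and all the numbers are hypothetical. A real sizing exercise would also need to consider memory, storage, and how the pods actually pack onto nodes.

```python
import math


def nodes_needed(pod_cpu_requests, allocatable_cpu_per_node):
    """Lower bound on node count from total CPU requests.

    Ignores memory and bin-packing, so the real number can be higher.
    """
    total_cpu = sum(pod_cpu_requests)
    return math.ceil(total_cpu / allocatable_cpu_per_node)


# Hypothetical CPU requests (in cores) for the pods landing on one node pool.
requests = [4, 4, 2, 2, 1, 1]

# Hypothetical allocatable CPU on an 8-core node after system reservations.
print(nodes_needed(requests, allocatable_cpu_per_node=7.5))
```

Fewer node pools does not automatically mean fewer nodes: the same pods still need the same resources, wherever they land.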

In the video I discuss two deployment patterns:

Using two node pools for the SAS Viya components (pods), and
Using three node pools for the SAS Viya components (pods).

In both patterns there is still the default or system node pool for non-Viya services, for example the ingress controller, cert-manager, or the monitoring and logging components. The examples show the use of an SMP CAS server, but the same patterns could also be used for an MPP CAS server.

Pattern 1: Using two node pools

This deployment pattern has a ‘General’ node pool, which is for everything other than CAS, and a CAS node pool.

This pattern would be a good choice for smaller environments or environments with a small programming (SAS Compute Server) workload, or where there isn’t the need to dedicate node(s) to the programming workload. This is shown in the image below.

Pattern 2: Using three node pools

The three node pool deployment pattern provides dedicated resources for both the CAS and programming workloads. The three node pools are:

Services node pool – for the Stateless and Stateful services (application pods)
Compute node pool, and
CAS node pool.

This is shown in the image below.
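
The two patterns can also be summarized as a mapping from the five default workload classes to node pools. The sketch below is only an illustration of that mapping. The pool names "general" and "services" are labels I have chosen to match the descriptions above, and the connect class is left out of the three node pool mapping because its placement is not spelled out in the text.

```python
# The five workload classes used by the default topology.
WORKLOAD_CLASSES = ("cas", "stateless", "stateful", "compute", "connect")

# Default approach: one node pool per workload class.
DEFAULT_PATTERN = {cls: cls for cls in WORKLOAD_CLASSES}

# Pattern 1: CAS gets its own pool, everything else shares a general pool.
TWO_POOL_PATTERN = {
    "cas": "cas",
    "stateless": "general",
    "stateful": "general",
    "compute": "general",
    "connect": "general",
}

# Pattern 2: dedicated pools for CAS and compute, a services pool for the rest.
# The connect class is intentionally omitted (placement not stated above).
THREE_POOL_PATTERN = {
    "cas": "cas",
    "compute": "compute",
    "stateless": "services",
    "stateful": "services",
}


def pools_used(pattern):
    """Return the sorted list of distinct node pools a pattern requires."""
    return sorted(set(pattern.values()))


for name, pattern in (
    ("default", DEFAULT_PATTERN),
    ("two pools", TWO_POOL_PATTERN),
    ("three pools", THREE_POOL_PATTERN),
):
    print(name, pools_used(pattern))
```

In every case the default or system node pool for the non-Viya services still exists alongside the pools listed here.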

The Video…

As this has ended up being longer than I had intended, I guess it’s about time I let you watch the video.

Conclusion

In this article and the video, I have focused on node pools and nodes, but there are many things that can affect the infrastructure requirements and hence the costs, such as the availability and performance requirements.

When running in the cloud there are many storage options, and the choice of storage can also have a significant impact on the infrastructure costs.

While the intent of this is to raise awareness of the many different deployment options and considerations, I have probably only scratched the surface of this topic, and you may have many more questions.

Finally, the SAS Viya 4 Infrastructure as Code (IaC) projects are available on GitHub, see the links below:

SAS Viya 4 Infrastructure as Code (IaC) for Microsoft Azure

SAS Viya 4 Infrastructure as Code (IaC) for Amazon Web Services (AWS)

SAS Viya 4 Infrastructure as Code (IaC) for Google Cloud Platform (GCP)

I hope this is useful and thanks for reading.

Find more articles from SAS Global Enablement and Learning here.

Mahesh_R · 03-24-2022

Hi Michael,

I just read through all three posts at once...very interesting and informative. Thank you so much for all the details.

Could you please help me with the use case below?

I'm working on a test deployment of Viya (SAS Visual Data Science) on Azure and was wondering which specific instance and storage types/sizes I should select to test both deployment topologies you mentioned (TWO and THREE Viya node pools) and meet the MINIMUM REQUIREMENTS for a FUNCTIONAL VIYA DEPLOYMENT with which I can test some basic use cases. This is purely for functional testing and not to support any kind of workload. My goal is to keep the cost as low as possible while having a successful and functional deployment to work with.

Thanks again,

Mahesh.