Running SAS Viya in Upstream Open source Kubernetes – part 2

1 Like

Since June 2022 (stable 2022.1.2), customers can deploy SAS Viya in their own "on-premise" Opensource Kubernetes cluster and benefit from the standard SAS support. Using the SAS Viya 4 Infrastructure as Code (IaC) for Open Source Kubernetes they can automate the provisioning of their Kubernetes cluster.

In this second part of the “Viya in Open source Kubernetes blog”, we want to review the requirements for the IaC; and see how to configure and execute it for a “bare-metal” deployment.

IaC for Open Source Kubernetes requirements

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

The IaC tool is provided by SAS as a GitHub project (sassoftware/viya4-iac-k8s) to automate the installation of an open-source Kubernetes cluster either in a vSphere environment or on a bunch of "bare-os" machines. But it brings its own specific requirements which are not to be confused with the general "Viya 4 on Upstream Open Source Kubernetes platform" requirements themselves.

The use of the IaC tool is NOT mandatory. A customer organization can provide an "on-prem" Kubernetes cluster by its own means. If the environment complies to the documented Upstream Open Source platform requirements such as Calico, containerd, etc…then that’s enough for this customer to benefit from the standard support.

Now, if the customer wants to make their life easier and use the SAS provided IaC tool (viya4-iac-K8s) to automate the Kubernetes installation and ensure that they build a SAS supported open-source Kubernetes environment, there are some extra requirements !

As an example, the IaC tool currently only supports Ubuntu OS based host machines. However a customer who would deploy the upstream open-source K8s using their own infrastructure tooling could use an alternative linux OS for their host machines.

Main requirements (OS, tooling, etc…)

The "SAS Viya 4 Infrastructure as Code (IaC) for Open Source Kubernetes", as of today is only supported on host machines running Ubuntu Linux LTS 20.04. (work is currently in progress to add the support for Ubuntu LTS 22.04)

In addition, the machines need to have a default user account with password-less sudo capabilities. If you have some experience with Viya 3.5 or Ansible in general, this requirement will be familiar to you.

To run the IaC script you also need some tools (Ansible, terraform, docker). As described in the table below (from the first post), the exact requirements depend on the Deployment method (bash or docker) and deployment type ("bare-metal" vs "vSphere").

Another important requirement is that all the machines in the collection must have the same date and time. While not explicitly listed in the project repository this requirement is very important... if one of your node is 2 minutes behind the others, the installation will not succeed. When trying to connect to your API server you’ll see this kind of error message :

Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2022-09-05T09:11:23Z is before 2022-09-05T09:12:15Z

It might seem obvious that all servers are at the same time, however in a VMWare environment, on boot the VM picks up the time from the BIOS (which would be from the VMWare hosts) and real life experience taught us, that the time could drift on specific VMWare hosts...resulting in different date/time on the collection’s machines.

Topology

In terms of required machines, the project documentation also has the following requirements :

If you read these requirements as they are and do the maths, it adds up to a minimum of 11 machines (12 if you are planning to use an external PostgreSQL server on a dedicated machine).

Here is diagram describing this kind of topology with a 4 nodes CAS MPP and 2 compute nodes.

rp_4_Viya-4-Upstream-Opensource-Kubernetes-2048x1113.png

Upstream open-source topology (full)

That’s a lot of machines...so it might be disconcerting to a customer who is just looking at building a small test or POC environment.

Actually there are ways to successfully stand up the Kubernetes cluster with a smaller number of machines.

First, the 3 machines for the Control Plane are only required if you want an HA configuration of the Kubernetes Cluster. If you are planning to build a development, disposable test or POC environment with the IaC tool, choosing 1 machine for your Control Plane is quite sufficient.

Then, it is possible to assign multiple “roles” to the same machines. However in this case you’ll need to be careful, depending on the topology that you define, the tool could fail or you could end up with a deployment where pods can’t be scheduled.

Here is an example of an alternative topology which has a smaller footprint (with the associated diagram).

sasnode01 : jumphost
sasnode02 : K8s Control Plane
sasnode03 : K8s node for System + NFS
sasnode04 : K8s node for CAS (SMP)
sasnode05 : K8s node for Compute
sasnode06 : K8s node for Stateful
sasnode07 : K8s node for Stateless

Upstream open-source topology (small)

Such an optimization of the number of host machines is not recommended for a production environment where a proper design and topology must be established. For example co-locating the NFS server on a K8s node is generally not a good idea as they might be in conflict for the storage resources.

Network

The main network requirements are listed in the IaC README file:

The first two items are very standard : we need our target machines (Cluster nodes, jumphost, NFS server and external PostgreSQL) to be on the same “routable” network and each machine needs to be able to talk to the others using a fix IP address.

The last item for the “floating IP addresses” is a little bit more unusual so let’s take a step back and explain why these requirements are there 🙂

When you deploy Kubernetes in the Public Cloud, the required load-balancer components are automatically and dynamically provisioned by the Cloud provider with the associated external IP addresses. Those external IP addresses can then be used to connect to the Kubernetes API or to access to the applications running inside the cluster.

In our "on-prem" bare-metal deployment (where we cannot leverage this kind of Public Cloud integration), we use kube-vip and a pool of pre-defined Virtual IP addresses (also known as “Floating” IP addresses) to provide the same capability.

The VIP addresses act just like the load balancer components a Cloud provider has in place when you are requesting an external IP for your LoadBalancer service.

In the IaC ansible "vars" file, we must provide information about the Virtual IP addresses, that is used for:

The Kubernetes Control Plane access (whether there is one or 3 CP nodes)
The Ingress Controller (NGINX) load-balancer service acting as the entry point for external HTTP/s request to access the web apps and services running in the cluster.
Other load-balancers services that could be required (for CAS or SAS/CONNECT access on non-HTTP/s ports)

At the end, the environment will look like in this diagram.

rp_7_Viya-4-Upstream-Opensource-Kubernetes-Upstream-K8s-overall-arch-1-2048x1381.png

Understanding the IaC tool

Welcome back Ansible !

The good news is that the tool is based on Ansible and Ansible is not something completely new to us 😊.

Ansible has been around for long time and is a very popular tool to automate computers configuration and management. But Ansible is also what SAS consultants have been using for years to deploy the previous version of the Viya platform (Viya 3.5).

If you know Ansible, then you know that it works with 2 things : the "inventory" file and the "vars" file.

You can find samples of both files in the viya4-iac-k8s repository, but basically the inventory file is where you implement the topology and the vars file is where you provide the configuration values.

Inventory file

The schema below shows how to implement the “small” topology discussed before in the inventory file (with 7 machines).

Note that the repository provided sample assumes that your deployment is using an external PostgreSQL server(s) for the SAS infrastructure Data Server.

So, if you want to deploy Viya with the OOTB internal “sas-crunchy” PostgreSQL server, then you won’t need the corresponding “postgres” host groups and you must comment them in your inventory file (as in the screenshot above).

Note that a GitHub issue has been opened to request an inventory sample with an internal PostgreSQL server configuration.

Ansible vars file

A sample file is also provided for the ansible-vars.yaml.

In addition, the CONFIG-VARS page of the repository contains the variables description and default values.

Most of the variables are easy to understand and fulfill.

However there are some variables that needs a little bit more effort and thinking…One of the trickiest part of the tool configuration is probably to find the values for the following variables.

The variables values are used to meet the network requirements described above.

The kubernetes_vip_ip is an IP address for the entry point of the Kubernetes cluster itself. As part of the deployment, kube-vip creates a load-balancer component with the “user provided” IP addresses to balance K8s API requests across the Control plane nodes.

The kubernetes_vip_cloud_provider_range is the range of IP's that can be assigned to new LoadBalancer services created inside your cluster. The range value is not expressed with a CIDR notation, but with the "firstIP-lastIP" format, for example : "10.96.18.5-10.96.18.8". Typically a range of 3 or 4 IP addresses should be enough : one for the web application, one for CAS, one for CONNECT, one for a specific service).

The kubernetes_vip_interface is the machine's network interface corresponding to our Virtual IP address (something like "eth0" or "ens32"). But note that in the latest IaC version, this parameter has become optional.

The kubernetes_vip_loadbalanced_dns is the DNS alias that corresponds to the IP address used to access the K8s Cluster. For an “on-prem” deployment, the corresponding A and CNAMES entries are typically created in the corporate DNS.

So how do you find the value to use for the VIP_IP and the VIP_CLOUD_PROVIDER_RANGE ?

The value is "user defined" and should not be part of any DHCP range or be assigned to any physical machine. Typically, you will ask your customer to provide a set of IP addresses on the VLAN that are not already associated to any machines and use the values of this IP addresses in your ansible-vars.yaml file.

As an example, in our RACE hands-on, we search for IP addresses that are available outside of the reserved range of IP addresses used for the RACE.EXNET hosts.

IMPORTANT: The DNS name used for the kubernetes_vip_cloud_provider_range value must exist and resolve to the IP address used for the kubernetes_vip_ip vaue before you run the “oss-k8.sh install” command

In addition, as explained in the first part of my post, for the Calico setup, we also need specify distinct CIDR IP ranges for pods and services.

But as you can see in the CONFIG-VARS page there are default values that you can use for them (without having anything specific to do) or adapt depending on the customer specific requirements.

Conclusion

It is already a pretty long and complex post with a lot of technical concepts to digest 😊, so I’ll stop here for today.

In our 3^rd and last post of this series, we’ll go a little bit more in the technical details, discuss how to troubleshoot the IaC issues using ansible, but we’ll also have a look at what’s going on during the Kubernetes software installation, then we’ll share some first “hands-on” experience feedback from our utilization of the tool.