Since the SAS Viya 2021.1 LTS release, two new cloud platforms are supported for SAS Viya deployments: Google Cloud Platform and Amazon Web Services (in addition to Azure).
In this post, I’d like to share some of our experience and findings with deployments of SAS Viya in Google Kubernetes Engine (aka “GKE”) and highlight some of the specifics of the Google platform.
Those specifics can affect either the cluster infrastructure setup or the SAS Viya deployment itself.
In this first part of the article, we will look at some GKE characteristics that make it “special” and different from Azure AKS, which was the first supported cloud managed Kubernetes platform for SAS Viya.
Before digging into the details, let’s have a look at this nice diagram from the SAS documentation that summarizes what you get in GCP after a SAS Viya deployment.
We can find the usual building blocks as with the other cloud providers: the jump server, the cloud services (such as the PostgreSQL database, the registry, and the file services), and the usual Kubernetes components like the node pools and the ingress.
So, from this diagram, GCP just seems like another cloud provider, simply with different names for the various GCP-specific services (Cloud SQL, Google Container Registry, Google Filestore, etc.).
We will see through both parts of this article that, while this diagram is a great starting point, it does not really show several unique characteristics of the Google platform that can impact your SAS Viya platform deployment and management.
In Azure, you have the concept of a “Resource Group”, which allows you to group various cloud resources together (network and storage components, VMs, etc.).
It is very handy for cost tracking and cleanup automation, as you can operate directly at the Resource Group level without having to deal with individual components.
Unfortunately, this concept does not exist in GCP (or in AWS, for that matter), so at the end of the GKE cluster provisioning, your resources are spread across various categories of services.
Not having a single view of all the resources created during cluster provisioning makes some operations more complicated: for example, cleaning up the Kubernetes cluster and all its associated resources.
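One common workaround is to apply a shared label to everything the provisioning tool creates and then query resources by that label. As a minimal sketch (the project ID and the resource-group:viya4 label below are hypothetical, and this assumes the Cloud Asset Inventory API is enabled in your project), you could approximate a “resource group view” like this:

```
# Poor man's "resource group": list every resource in the project
# that carries our (hypothetical) shared label, via Cloud Asset Inventory
gcloud asset search-all-resources \
  --scope="projects/my-viya-project" \
  --query="labels.resource-group:viya4"
```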
It is important to know and verify the Kubernetes version of the cluster where you are deploying, as you want to make sure that you are installing and running SAS Viya in a supported environment.
For example, when SAS Viya 4 was only supported in Azure, we saw that the initial SAS Viya stable versions did not work in Kubernetes 1.19 because of the container runtime change introduced with that version (from Docker to containerd). If you are using the SAS viya4-iac-gcp GitHub tool to provision your GKE cluster, you might have noticed that there are two options when you set the Kubernetes version (which is different from setting a version with the other cloud providers).
You can use either the “version-based” or the “channel-based” option.
If you go with the “version-based” option, you need to find out which versions are available in the GKE cluster zone or region (a gcloud command is provided in the IaC project documentation and illustrated below).
Depending on when you run the command, you will not see the same results, as the pace of Kubernetes releases is fast.
However, you might not see the most recent Kubernetes versions: they are only available if you use the “Rapid” channel.
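For reference, this is roughly what that lookup looks like (a minimal sketch; the us-east1 region is just an example, so adjust it to your own location):

```
# Show the Kubernetes versions GKE currently offers in a region,
# including the default version and the versions per release channel
gcloud container get-server-config --region=us-east1
```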
The other option is to pick a “channel”. As explained in the GKE documentation: “When you enroll a new cluster in a release channel, Google automatically manages the version and upgrade cadence for the cluster and its node pools.”
There are three available channels (Rapid, Regular, and Stable); by default, new clusters created in GKE are enrolled in the Regular release channel. Each channel offers a trade-off between feature availability and update churn.
The table below comes from the Google documentation.
One important thing to note is that, by default, a GKE cluster is automatically upgraded to newly available Kubernetes versions.
The “when” and “how” are probably your next questions, and the answers are in the Google documentation. But no matter whether you choose the version-based or the channel-based option, GKE will always try to auto-upgrade to a more recent Kubernetes version when your version gets too old.
You can define cluster maintenance windows, or upgrade manually before it is too late, but be aware that by default, if you do nothing, one day your GKE cluster might go down for a version upgrade, which will also lead to a SAS Viya platform outage.
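As a hedged sketch of how to constrain when those automatic upgrades happen (the cluster name viya4-gke and the timing below are made up for the example):

```
# Restrict GKE automatic upgrades to a recurring 4-hour window on Saturdays
gcloud container clusters update viya4-gke \
  --region=us-east1 \
  --maintenance-window-start="2023-01-07T02:00:00Z" \
  --maintenance-window-end="2023-01-07T06:00:00Z" \
  --maintenance-window-recurrence="FREQ=WEEKLY;BYDAY=SA"
```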
Each public cloud provider (such as Amazon, Azure, or Google) provides virtual infrastructure resources and services in multiple locations and has its own way to organize them in terms of zones, regions, etc.
It is important to understand how the cloud provider organizes and places the Kubernetes “control plane” (which runs the Kubernetes API server) and the node pools’ nodes, and how this can impact the availability and performance of the SAS Viya platform.
In GCP, you have multiple zones attached to many regions across the globe. To summarize, when you create a GKE cluster you define a “location type”, which can be:
- Zonal (single-zone or multi-zonal): a single replica of the control plane runs in a single zone; in a multi-zonal cluster, the nodes run in multiple zones.
- Regional: multiple replicas of the control plane run in multiple zones of a given region, and the nodes run in the same zones as the control plane.
It is represented in the image below (source: Cloud Academy).
The idea is that the choice of the location type is a trade-off between the customer’s budget and HA requirements.
The more zones you spread your GKE nodes across, the higher your availability, and the higher your bill. “Single-zone” is the cheapest option, but it provides the lowest level of high availability: if the zone goes down, your SAS Viya platform will not be available.
If you use the viya4-iac-gcp GitHub tool default settings, a regional cluster will always be created (with multiple replicas of the control plane in all the zones of the chosen region), but with the worker nodes running in a single zone.
For the location variable, you can choose either to use a specific zone for your node pools (by specifying a zone as the location) or to automatically get the first zone of the region (by specifying a region as the location).
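For context, here is roughly how the two location types map to the gcloud CLI (a minimal sketch; the viya4-gke cluster name is hypothetical and most options are omitted):

```
# Zonal cluster: a single control plane and the nodes in one zone
gcloud container clusters create viya4-gke --zone=us-east1-b

# Regional cluster: control plane replicated across the region's zones,
# with the worker nodes restricted to a single zone via --node-locations
gcloud container clusters create viya4-gke \
  --region=us-east1 \
  --node-locations=us-east1-b
```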
This documentation explains the SAS viya4-iac-gcp tool.
The last item we will discuss in this post is a unique Google feature called node “auto-provisioning”.
While you probably already know the node pool autoscaling concept (you define a minimum and maximum number of nodes in your node pool, so the cloud provider can add or remove nodes as the pod resource requests increase or decrease), auto-provisioning is quite different.
As explained in the official Google documentation: “Without node auto-provisioning, GKE considers starting new nodes only from the set of user created node pools. With node auto-provisioning, new node pools can be created and deleted automatically.”
So, with this configuration, instead of adding or removing nodes in an existing node pool (within the defined minimum and maximum node range), GKE can directly and dynamically create, extend, or remove node pools depending on the pods’ current workload.
It means that you also let GKE decide which kind of instance should be provisioned for those dynamic node pools. You are therefore limited in the node options: for example, you cannot use instances with local SSDs for ephemeral storage, and you cannot control which instance will run what.
When you use the auto-provisioning feature, the only thing you can define is a minimum and a maximum amount of vCPU and memory that can be used by the whole cluster.
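As a hedged illustration of what that looks like with the gcloud CLI (the cluster name and region are hypothetical; the limits mirror the IaC tool defaults discussed below):

```
# Enable node auto-provisioning with cluster-wide resource limits;
# GKE picks the machine types of the node pools it creates
# (memory limits are in GB, so 10000 is roughly 10 TB)
gcloud container clusters update viya4-gke \
  --region=us-east1 \
  --enable-autoprovisioning \
  --min-cpu=1 --max-cpu=500 \
  --min-memory=1 --max-memory=10000
```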
So, when auto-provisioning is enabled, you might see additional node pools (with “nap” and the instance type in their names) and their associated nodes being created when the pods can’t be scheduled (as in the screenshot below).
In the viya4-iac-gcp tool, this configuration is driven by a variable called “enable_cluster_autoscaling”.
It is disabled by default, but when it is enabled, you can set the maximum number of cores and the maximum amount of memory that the node auto-provisioning can use (the defaults are 500 vCPUs and 10 TB of RAM).
The minimums are always 1 vCPU and 1 GB of RAM (which leads to node pools with small instances being created when auto-provisioning is enabled).
Something to note is that, since the IaC tool will always also create the pre-defined node pools, both systems are in place when enable_cluster_autoscaling is set to “true”, which makes it quite difficult to predict how the cluster scaling will behave.
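You can see the two systems side by side by listing the cluster’s node pools; the auto-provisioned pools stand out by their “nap” naming (a minimal sketch, reusing the hypothetical cluster name from above):

```
# List all node pools: the pre-defined ones from the IaC tool
# plus any "nap-..." pools created by node auto-provisioning
gcloud container node-pools list --cluster=viya4-gke --region=us-east1
```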
For now, our recommendation for a SAS Viya deployment is to avoid the auto-provisioning feature and to leverage only the standard pre-defined autoscaling node pools instead, as this gives better control over where the SAS Viya applications are allocated.
As we have seen in this post, the Google Cloud Platform infrastructure has several unique characteristics and capabilities.
As a SAS consultant, you are not expected to be a GKE or GCP expert. However, having a basic knowledge of these specifics of the Google platform can be helpful and avoid bad surprises when you are working on an architecture design or an installation of SAS Viya in GKE.
In the second part of the series, we will look at some other specifics of the Google platform that can affect not only the infrastructure provisioning aspects but also the SAS Viya deployment itself.
Thanks for reading!