We've been doing our bit in preparing for SAS Viya 2020.1 and beyond. There's been a lot to learn about Kubernetes in general, and specifically about configuring Viya 4 and the abstraction layers it relies on. One item that sent us down a rabbit hole was modifying the /etc/hosts file to add custom hostname aliases.
If you're already a Kubernetes expert administrator, then this post won't help you. However if, like me, you're putting on your first pair of k8s boots and stumbling out into the real world, then hopefully you'll gain a bit more insight into how SAS Viya works within Kubernetes.
Alternate titles for this post:
Credit to my colleague Erwan Granger for that last one. 😜
I suspect most of you already know why, but let's make it clear. Your host OS is probably configured to refer to the /etc/hosts file first when attempting to resolve a hostname to an IP address. If nothing matches in /etc/hosts, then the network DNS is tried. That's not a universally guaranteed setup, but it's typical. This approach provides flexibility to override and/or supplement the network DNS for hostname resolution on a host-by-host basis.
By default, the /etc/hosts file usually just provides resolution for "localhost" to point to IP address 127.0.0.1 (and IPv6 ::1). But we can add more lines to the file, each specifying an IP address along with one or more hostname aliases (referred to from here on as "host aliases").
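For illustration, a typical /etc/hosts with a couple of added entries might look like this (the 10.96.x.x addresses and host aliases here are made-up examples used throughout this post):

```
127.0.0.1   localhost
::1         localhost
10.96.1.1   my-a-host
10.96.2.2   my-b-host.customer.com   my-b-host
```

Each line maps one IP address to one or more aliases; the first alias is conventionally the canonical name.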
We want to add lines like this when hostnames we want to refer to aren't known by the DNS. This can happen for many possible reasons, including:
The challenge with using /etc/hosts is that you are responsible for keeping it maintained and up-to-date on every host where you rely on it. That can be a large and tedious task if we aren't careful. That's why DNS exists as a central, single-point-of-contact for network resolution.
Of course, the /etc/hosts file isn't meant to be a panacea to resolve your network routing issues. But sometimes it is the exact right place to put what you need.
Normally on a machine where you have sufficient admin privileges in the OS, you'd simply edit the /etc/hosts file to add new lines for your desired host aliases.
In Kubernetes, we refer to the host machines as "nodes". So for a Viya 4 deployment, you might think you should just log on to the nodes' OS directly, modify /etc/hosts, quickly validate that you can ping the host alias as desired, slap your hands together signifying a job well done, then head off to the pub (hey, it's 5 o'clock somewhere).
The problem with this approach is that while the nodes will see this change, the Kubernetes pods and their constituent containers won't. The containers have their own /etc/hosts file. You need to make the change there.
Now how many pods is Viya 4 currently running? It could be a number ranging from 120 to over 350 depending on the product mix. Remember what I said about how tedious it'd be to manually maintain /etc/hosts across a lot of machines? Yeah. Boom.
We must think like large-scale enterprise hosting admins. When you're running Kubernetes, that's exactly what you are, even for just a single deployment of Viya 4.
The various services for Viya 4 are divvied up to run in containers. There are so many containers that we rely on Kubernetes to provide a framework for their administration and operation. That means that the containers for Viya are placed into Kubernetes pods. And then Kubernetes deploys those pods to the various nodes (host machines) which are provisioned for the Viya workloads.
Having to declare unique manifests for every individual pod is too fine-grained for most use cases. Many pods share a common set of configuration options, so Kubernetes provides the concept of resource "kinds" (like Deployment and ReplicaSet) to define pods with shared characteristics. Some default kinds already fit well with Viya services, and for others, we've defined our own kinds.
So now we need a way to direct Kubernetes to modify the /etc/hosts file for these various kinds of pod deployments.
Every pod is defined with a manifest, which is a plain-text file that describes the pod's attributes (which container to run, storage to attach, network interface to bind, etc.). And we can use that to define new lines for /etc/hosts if we want to. But again, with hundreds of pods in a Viya 4 deployment, this isn't where we want to be.
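For context, hostAliases is the standard Kubernetes pod-spec field for this. A minimal, hypothetical pod manifest using it directly (not a Viya manifest, just a sketch of the mechanism):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hostalias-demo
spec:
  hostAliases:                # extra lines appended to this pod's /etc/hosts
    - ip: "10.96.1.1"
      hostnames:
        - "my-a-host"
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
```

This works fine for one pod; the rest of this post is about doing it at Viya scale without touching hundreds of manifests by hand.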
Enter the tool known as Kustomize which is the native configuration tool for Kubernetes. The general idea is that you can use (and re-use) standard templates for your pods as a starting point and then extend them further with overlay templates. From a SAS perspective, this allows us to ship a standard set of software manifests for Viya to customers which we can configure and extend using site-specific overlay templates.
Kustomize relies on YAML to describe things… and so let's look closer at what we want to do.
Let's take a look at the YAML syntax needed to add new lines to the /etc/hosts file of select pods in our Viya deployment.
First of all, we'll create a new plain-text file on your Kubernetes control host, placed in a sub-directory with a name that's easy to reference. Something like: /path/to/project/<Viya Namespace>/site-config/network/etc-hosts_addendum.yaml.
Specifically, we define a Kustomize patch transformer which is used to replace and/or remove content from the original manifest definitions. Here we go defining a hostAlias patch:
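A sketch of what that patch transformer can look like. Note the patch path is an assumption here: for the CASDeployment kind, the pod template typically sits under spec.controllerTemplate, but verify the path against the manifests in your own deployment assets:

```yaml
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-hostaliases       # must be unique among your patch transformers
patch: |-
  - op: replace
    path: /spec/controllerTemplate/spec/hostAliases
    value:
      - ip: "10.96.1.1"
        hostnames:
          - "my-a-host"
      - ip: "10.96.2.2"
        hostnames:
          - "my-b-host.customer.com"
          - "my-b-host"
target:
  kind: CASDeployment
```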
This patch transformer will insert (or replace) the hostAlias definition in the site.yaml file for pods that match "target: kind: CASDeployment". That's right - we're using YAML in this file to create more YAML in a different file. When followed through to completion, the desired result will be that each container in those pods will have the following lines added to their /etc/hosts files:
10.96.1.1   my-a-host
10.96.2.2   my-b-host.customer.com   my-b-host
Depending on the spec of the target, there may be some error checking (like IP addresses must be all numeric) or none at all. Ultimately, it's on you to ensure that those IP addresses and host aliases are correct and intended for the target pods.
One directive of note in the YAML above: each patch transformer's metadata.name must be unique. If you register two transformers with the same name, kustomize build fails with an error like:

Error: accumulateFile "accumulating resources from 'site-config/network/etc-hosts_addendum.yaml': may not add resource with an already registered id: ~G_builtin_PatchTransformer|~X|etc-hosts-deployment", loader.New "Error loading site-config/network/etc-hosts_addendum.yaml with git: url lacks host: site-config/network/etc-hosts_addendum.yaml, dir: got file 'etc-hosts_addendum.yaml', but '/usr/csuser/clouddrive/project/deploy/lab/site-config/network/etc-hosts_addendum.yaml' must be a directory to be a root, get: invalid source string: site-config/network/etc-hosts_addendum.yaml"
We can add more patch transformer definitions to this same file to drive similar changes to the /etc/hosts files in other Viya pods. To make the same change on all Viya 4 pods, we need to apply several patch transformers, one per target kind.
Of course, if you only need the new host aliases for certain kinds of pods - such as defining a remote DBMS for CAS to work with using SAS/ACCESS data connectors - then only provide patch transformers that are really needed to do the job. Keeping these changes small in scope helps reduce the chance of unexpected implications later.
So we've created this new YAML file, but that's not enough. We need to tell Kustomize where to find it when building the site.yaml file for Viya 4 deployment.
We do that by editing the base kustomization.yaml file for our project in /path/to/<Viya Namespace>. Look for the "transformers:" section and add a reference to our etc-hosts_addendum.yaml file at the end:
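Assuming the file layout above, the relevant portion of the base kustomization.yaml would look something like this (your existing transformer entries will differ):

```yaml
transformers:
  # ...existing transformer entries...
  - site-config/network/etc-hosts_addendum.yaml
```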
Now we're ready for Kustomize to take this change we want and build the site.yaml file for the Viya deployment. So from the same directory as the base kustomization.yaml file, go ahead and build the updated site.yaml:
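A typical invocation looks like this; the output file name and project path follow your site's convention, so adjust accordingly:

```sh
cd /path/to/<Viya Namespace>
kustomize build -o site.yaml
```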
If it runs successfully, there won't be any output, but you will see that the timestamp of the resulting site.yaml file will be updated as its content has changed.
At this point, we can peek inside the site.yaml file. In my small Viya 4 deployment, I have 126 host alias definitions for the various Viya pods. Yours might have many more. For pods where no new host aliases are applied, you'll see it defined as an empty sequence: "hostAliases: []".
If everything checks out and looks right so far, then we're ready to make these changes take effect. This is done by directing Kubernetes to apply the site.yaml file to the cluster:
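For example, substituting your own namespace:

```sh
kubectl apply -n <Viya Namespace> -f site.yaml
```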
Assuming all is well, this command will generate several status lines of text as it processes site.yaml and then finish. But it is possible to have created a site.yaml file which is syntactically correct in terms of the YAML layout, but which kubectl might still have a problem with.
The problems kubectl might complain about at this point should refer to not understanding the intent of the new YAML we added… where the YAML is parseable, but the result that Kubernetes is trying to apply to hostAliases might not make sense. Think of it as someone calling you by your name in reverse order. The pieces are all there, but you don't get the desired appellation.
For "normal" pods - which comprise most of a Viya deployment - successfully applying the site.yaml is all you need to do to put the new host aliases into effect. But some pods aren't "normal", requiring operators to refer to pod templates to direct the instantiation of pods with the change. CAS in particular is one of these.
To get CAS to pick up the host aliases, we need to effectively stop and restart it.
Because Kubernetes is configured to maintain a specific number of running pods for CAS, deleting the current pods causes Kubernetes to start up new ones to replace them. It's the classic Ship of Theseus approach to software management, a thought experiment from ancient Greek philosophy recorded by Plutarch nearly 2,000 years ago.
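One way to trigger that is to delete the CAS server pods and let the operator recreate them. The label selector below is an assumption for illustration; confirm the actual labels on your CAS pods (kubectl get pods --show-labels) before running it:

```sh
kubectl -n <Viya Namespace> delete pods -l casoperator.sas.com/server=default
```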
You're probably wondering if these changes really worked. Or you should be. With this many levels of redirection and abstraction, anything is possible. We created a new YAML file which defined override parameters for Kustomize to reference when assembling the final site.yaml file. Then site.yaml was applied to our cluster by kubectl. However, we then had to direct Kubernetes to restart the CAS-specific pods, because they're managed by an operator and aren't immediately affected by applying site.yaml the way other Viya pods are.
Still with me? We're in the home stretch. Just hold on a little longer.
The test is pretty simple: run a command inside a container to check the /etc/hosts file for the new aliases. Something like:
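For example, with placeholders for your namespace and a pod name:

```sh
kubectl -n <Viya Namespace> exec <pod-name> -- cat /etc/hosts
```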
and for pods with multiple containers:
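Here you specify the container explicitly with -c:

```sh
kubectl -n <Viya Namespace> exec <pod-name> -c <container-name> -- cat /etc/hosts
```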
If you find that CAS doesn't show the new host aliases in /etc/hosts, remember to direct Kubernetes to restart CAS by deleting its pods.
Now those two commands are good for spot-checking, but for a fully comprehensive validation, you'll need to get the full list of pods for your deployment and then loop through them all to test.
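A rough sketch of such a loop, assuming every pod should show the my-a-host alias (adjust the namespace and the alias to match your patch):

```sh
#!/bin/bash
# Check every pod in the namespace for the new host alias in /etc/hosts.
NS="<Viya Namespace>"
for pod in $(kubectl -n "$NS" get pods -o name); do
  if kubectl -n "$NS" exec "$pod" -- grep -q "my-a-host" /etc/hosts 2>/dev/null; then
    echo "PASS: $pod"
  else
    echo "FAIL: $pod"
  fi
done
```

Pods with multiple containers will report against their default container; add -c for finer checks.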
After applying the changes from site.yaml to the cluster, it might take several minutes before all pods will respond with a "PASS" when running this validation script. Re-run it to see if the number of PASSes increases with each run.
I hope you found this exercise interesting. For me, it helped to clarify the various layers of a SAS Viya configuration and which utilities are responsible to get it done. We've shared this topic with the Viya dev teams for their input... and they acknowledge they might be able to provide a simpler approach in the future. In the meantime, when you don't have the ability to make a change to your site's DNS, then use the approach described here to make the desired changes to the /etc/hosts files when needed.
Special thanks to my GEL colleague Erwan Granger, Advisory Technical Architect, as well as David Page, Distinguished Software Developer in DevOps Engineering R&D, for their collaboration and review of this post. Any mistakes are mine, not theirs.