While the majority of a SAS administrator's interactions with the SAS Viya infrastructure happen at the Kubernetes level (with tools like kubectl or Lens) or through SAS proprietary tools (such as SAS Environment Manager or the sas-viya CLI), there are situations where access to the underlying node host is required: troubleshooting, maintenance, system log collection, etc.
As an example, SAS Technical Support and field consultants have already encountered scenarios where getting onto the Kubernetes nodes and querying the journalctl, kubelet, or kernel message logs was necessary (for example, to determine whether CAS pods were being killed by the OOM killer).
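For instance, once on a node, a couple of commands along these lines (a sketch; exact options may vary by distribution) can reveal OOM-killer activity and recent kubelet events:
# look for OOM-killer traces in the kernel journal
journalctl -k --no-pager | grep -i -E 'out of memory|oom-kill'
# review recent kubelet log entries
journalctl -u kubelet --no-pager | tail -n 50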
"How do I login to my Viya nodes ?" has become a frequently asked question, the purpose of this blog is to provide some guidance about the best way to do it.
In the remainder of this blog we will discuss various methods to log in to the Viya nodes. But first of all, let's make things clear:
SAS does not officially support the solutions presented below in any form. Logging onto your cluster nodes is strongly discouraged. Development and debugging of one's pods should be done via logging and other means (APIs, etc.).
The most obvious way that comes to mind is to connect via SSH to the cluster's underlying Linux nodes.
If your Kubernetes cluster is running in the cloud, another potential option is to leverage the cloud provider's CLI; for example, the Azure CLI has the az vm run-command invoke command to submit individual commands or run a script.
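As a hedged illustration, for AKS nodes (which live in VM scale sets, so the vmss variant of the command applies), it might look like the following; the resource group and scale set names are placeholders to adapt to your own deployment:
# run a one-off shell command on instance 0 of an AKS node pool scale set
az vmss run-command invoke \
  --resource-group MC_myResourceGroup_myAKSCluster_eastus \
  --name aks-cas-17945816-vmss \
  --instance-id 0 \
  --command-id RunShellScript \
  --scripts "journalctl -u kubelet --no-pager | tail -n 20"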
However, these techniques are generally not very efficient: specific network rules or policies could prevent this type of SSH access from the outside, and the cloud CLI option is cumbersome, not always possible, and differs from one cloud provider to another.
Cloud providers (Azure, AWS, GCP, etc.) give you a managed Kubernetes service (AKS, EKS, GKE) and usually do not want you to log directly into the underlying VMs, bypassing the natural way of interacting with the service. That is why there are often barriers preventing you from getting direct access to the underlying cloud VMs.
In Azure, if you are using the IaC tool (viya4-iac-azure) to provision your AKS infrastructure and have opted for the creation of a jump host VM, then you can use the private SSH key (required for the IaC build) to connect via SSH to the jump host VM in Azure and, from there, access the AKS worker nodes (the corresponding public key is distributed and "injected" into the AKS nodes as part of the IaC execution).
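A sketch of that connection might look like the following; the jumpuser and azureuser account names and the key path are assumptions based on viya4-iac-azure defaults, so verify them for your own deployment:
# first hop: the jump host, forwarding the SSH agent (-A) so the same key
# can be used for the second hop (load it first with ssh-add)
ssh -A -i ~/.ssh/viya4-iac-key jumpuser@<jump-host-public-ip>
# second hop, from the jump host: an AKS worker node on its private IP
ssh azureuser@<node-private-ip>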
But it remains a pretty cumbersome "two hops" process, and this solution does not necessarily work on other Kubernetes platforms (it does not work on AWS EKS, for example).
However, there is a method that should work for any type of Kubernetes platform and does not require any specific host account password or SSH private key.
This generic solution is what we call the "container-based" way, because the technique is to start a new pod and exec into its container to get access to the underlying host.
We will show and explain two methods: with an open-source utility called node-shell, or directly using the kubectl command. Whichever method you choose, the requirements are the same:
- The KUBECONFIG file being used to access the cluster must have admin rights to the cluster.
- You must be able to reach the cluster with kubectl from a network perspective, either via VPN, direct connection, or other means.
The first method relies on a small utility called node-shell that lets you start a root shell in the node's host OS. If you have been using the Lens application (a great UI administration interface for Kubernetes), you are actually already using it without knowing it 😊. Indeed, Lens has a feature in the Nodes view that lets you open an SSH-like connection to the Kubernetes nodes; behind the scenes, the node-shell utility is used.
The node-shell utility can be pulled and installed either in stand-alone mode, directly with three commands, or through krew.
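For the stand-alone route, the project's README suggests, at the time of writing, something like these three commands (verify against the current kubectl-node-shell repository):
# download the plugin script, make it executable, and put it on the PATH
curl -LO https://github.com/kvaps/kubectl-node-shell/raw/master/kubectl-node_shell
chmod +x ./kubectl-node_shell
sudo mv ./kubectl-node_shell /usr/local/bin/kubectl-node_shell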
Krew is a plugin manager for kubectl; it allows you to install various plugins that further extend the capabilities of the kubectl command.
After installing krew by following the official instructions, it is very easy to install the node-shell plugin (as well as other nice plugins):
kubectl krew install node-shell
From here, we simply list our nodes:
kubectl get nodes
For example, in Azure we would see something like:
NAME STATUS ROLES AGE VERSION
aks-cas-17945816-vmss000000 Ready agent 4m54s v1.23.12
aks-compute-27817762-vmss000000 Ready agent 4m30s v1.23.12
aks-stateful-16007966-vmss000000 Ready agent 4m55s v1.23.12
aks-stateless-37479317-vmss000000 Ready agent 4m58s v1.23.12
aks-system-48059844-vmss000000 Ready agent 11m v1.23.12
Then, from that output, we simply target a node. For this example, we'll say the node we want to connect to is aks-cas-17945816-vmss000000:
kubectl node-shell aks-cas-17945816-vmss000000
At this point you should see a prompt from that node, just as if you had SSHed into it:
spawning "nsenter-2njj8b" on "aks-cas-17945816-vmss000000"
If you don't see a command prompt, try pressing enter.
root@aks-cas-17945816-vmss000000:/# id
uid=0(root) gid=0(root) groups=0(root),1(daemon),2(bin),3(sys),4(adm),6(disk),10(uucp),11,20(dialout),26(tape),27(sudo)
root@aks-cas-17945816-vmss000000:/# ls
NOTICE.txt bin boot dev etc home initrd.img initrd.img.old lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var vmlinuz vmlinuz.old
root@aks-cas-17945816-vmss000000:/#
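For the curious, what node-shell spawns behind the scenes is essentially a privileged pod in the host PID namespace that uses nsenter to join the namespaces of PID 1 (the node's init process). A minimal hand-written equivalent could look like this sketch (not the exact manifest node-shell generates):
apiVersion: v1
kind: Pod
metadata:
  name: nsenter-demo
spec:
  nodeName: aks-cas-17945816-vmss000000  # pin the pod to the target node
  hostPID: true                          # share the host PID namespace
  restartPolicy: Never
  containers:
  - name: shell
    image: alpine
    # enter the mount/UTS/IPC/network/PID namespaces of PID 1 and open a shell
    command: ["nsenter", "--target", "1", "--mount", "--uts", "--ipc", "--net", "--pid", "--", "sh", "-l"]
    securityContext:
      privileged: true                   # required to enter host namespaces
    stdin: true
    tty: true
Apply it with kubectl apply -f nsenter-demo.yaml, attach with kubectl attach -it nsenter-demo, and delete the pod when done.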
If you don't want (or are not allowed) to install anything on your client/jump host machine in addition to kubectl, you can use a native kubectl command called kubectl debug.
This feature is provided out of the box and appears in the official Kubernetes documentation. Here is the syntax:
kubectl debug node/<node name> -it --image=<containerized OS>
All you need to provide is the name of the node you want to connect to and the image of the container used for debugging (the image must at least include a shell).
For example, in Azure, it would be something like:
[cloud-user@pdcesx03094 ~]$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
aks-cas-17945816-vmss000000 Ready agent 3m37s v1.23.12
aks-compute-27817762-vmss000000 Ready agent 3m13s v1.23.12
aks-stateful-16007966-vmss000000 Ready agent 3m38s v1.23.12
aks-stateless-37479317-vmss000000 Ready agent 3m41s v1.23.12
aks-system-48059844-vmss000000 Ready agent 10m v1.23.12
[cloud-user@pdcesx03094 ~]$ kubectl debug node/aks-cas-17945816-vmss000000 -it --image=busybox
Creating debugging pod node-debugger-aks-cas-17945816-vmss000000-m9xm8 with container debugger on node aks-cas-17945816-vmss000000.
If you don't see a command prompt, try pressing enter.
/#
Note that in this example we've been using busybox, which is a minimal Linux system that does not necessarily include the system debugging tools you might need to troubleshoot an infrastructure-level issue.
Instead, you could use a more complete image that includes additional system debugging tools, such as ubuntu, but it would pull a larger image and run a heavier container in the cluster.
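For instance, the same command with the stock ubuntu image gives you apt and a much richer toolbox, at the cost of a bigger pull:
kubectl debug node/aks-cas-17945816-vmss000000 -it --image=ubuntu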
This way of getting shell access to the host is the simplest and the most generic; I was able to test it successfully on every supported Kubernetes platform.
(Screenshots: examples in GCP, AWS, open-source Kubernetes, and Red Hat OpenShift.)
Finally, please note that there are a few important things to know about the kubectl debug command. Paraphrasing the Kubernetes documentation: the debugging pod runs in the host IPC, network, and PID namespaces; the node's root filesystem is mounted at /host; and the debugging pod is not deleted automatically when you exit, so you have to clean it up yourself.
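Two practical consequences of the above, using the pod name from the earlier example output:
# inside the debug container, chroot into the node's filesystem to get a
# session that behaves as if opened on the node itself
chroot /host /bin/bash
# back on the client, the debug pod lingers after you exit: delete it yourself
kubectl delete pod node-debugger-aks-cas-17945816-vmss000000-m9xm8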
As we've seen, the two techniques (node-shell and kubectl debug) work no matter what the cloud provider is, how the network has been secured, or whether host account public keys have been "cloud-init" loaded or not.
However, while required in some situations, this type of direct access should remain an exception and should be performed very carefully, only by the SAS or Kubernetes administrator (not every developer!), for ad hoc debugging or troubleshooting.
As rightfully noted by SAS R&D:
"This kind of SSH access could be abused to do additional node setup, but that is a bad practice since nodes are ephemeral and can come and go with autoscaling, etc. Node modifications should be performed via DaemonSets or similar. While abuse cannot be ruled out, there are still legitimate use cases for accessing nodes directly: to explore node configurations, to debug issues (e.g., to investigate why a DaemonSet does not work as intended), etc."
That's all, thanks for reading!