Accessing CAS in Kubernetes from JupyterHub/Python

Data scientists and Python developers who access CAS from Python via JupyterHub are familiar with the steps required to launch a CAS session in a SAS Viya 3.x deployment. Typically, they only need to specify the hostname and port and provide credentials in some manner.

Now that SAS Viya is deployed with Kubernetes and there are new “layers” in the deployment, there are different methods for accessing CAS from Python. If you are architecting the integration of Python or R in JupyterHub with CAS in Kubernetes, there are some considerations that may warrant your attention. In this article, we will look at several variations in which Python can communicate with CAS.

Revisiting SAS Viya 3.x

Before we dig into configuring and accessing CAS in Kubernetes, let’s briefly revisit accessing SAS Viya 3.x CAS via SWAT from Python.

If you recall, the process was quite simple from a user perspective. As long as the default port 5570 was accessible from the Python session and the CAS controller hostname resolved properly, the developer or data scientist simply had to reference the CAS controller host and the port. Of course, you also had to specify credentials to authenticate, typically via an authinfo file.

If you are an installer, administrator or architect, you had to ensure the CAS controller host and configured port were accessible from the Python session (i.e., not blocked by a firewall). In addition, assuming CAS encryption was enabled, it was necessary to acquire the CAS certificate and make it available to all Python sessions that were expected to connect to CAS. You also had to ensure the SAS SWAT package was installed, along with a few other packages.
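As a rough sketch of the SAS Viya 3.x connection described above, the following collects the arguments for `swat.CAS()`. The hostname and certificate path are placeholders rather than values from a real deployment, and the SWAT import is deferred so the helper can be inspected without the package installed.

```python
import os


def viya3_connection_args(hostname, port=5570, authinfo="~/.authinfo", ca_cert=None):
    """Collect the arguments for swat.CAS() against a SAS Viya 3.x CAS controller."""
    if ca_cert is not None:
        # SWAT reads this variable to locate the CA certificate for encrypted CAS.
        os.environ["CAS_CLIENT_SSL_CA_LIST"] = ca_cert
    return {
        "hostname": hostname,
        "port": port,
        "authinfo": os.path.expanduser(authinfo),
    }


def connect(args):
    """Open the CAS session; swat is imported here so the helper above has no dependencies."""
    import swat  # requires: pip install swat

    return swat.CAS(**args)


# Hypothetical host name and certificate path; replace with your own.
args = viya3_connection_args("viya.example.com", ca_cert="/opt/certs/trustedcerts.pem")
print(args["hostname"], args["port"])
```

Calling `connect(args)` from a notebook would then open the session, provided the authinfo file holds valid credentials for that host.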

In the following diagram, you can see this environment has a dedicated host for open source programming interfaces and a single-machine deployment of SAS Viya. Each of the programming interfaces establishes a CAS session using the SAS Viya hostname and port 5570.

Accessing CAS in Kubernetes

Now that SAS Viya is deployed in Kubernetes, the way CAS services are exposed has changed. One of the key choices is whether JupyterHub will be deployed in pods within the Kubernetes cluster or outside of the cluster.

Access via a Load Balancer

First, let’s take a quick look at access via a load balancer. In the following diagram, we have a Kubernetes cluster with two SAS Viya namespaces, and a load balancer has been defined for binary communication with CAS in the Prod namespace. The CAS client connects to the advertised load balancer IP address and port (reassigned to 5570), and the load balancer rules direct the traffic to the configured CAS controller. A similar load balancer configuration could also be defined for HTTP traffic using port 8777, but the preferred method of connection is binary using port 5570.
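One way to find the address a client should use is to read it from the Service definition. The sketch below parses the JSON that `kubectl get service ... -o json` returns for a Service of type LoadBalancer. The sample document is illustrative only (the IP address is from a documentation range), and the service name in the comment is an assumption based on the sas-cas-server-default-bin service mentioned later in this article.

```python
import json


def load_balancer_endpoint(service_json):
    """Return (address, port) for a Kubernetes Service of type LoadBalancer."""
    svc = json.loads(service_json)
    ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress", [])
    if not ingress:
        raise ValueError("load balancer has no external address yet")
    # Cloud providers publish either an IP address or a DNS hostname.
    address = ingress[0].get("ip") or ingress[0].get("hostname")
    port = svc["spec"]["ports"][0]["port"]
    return address, port


# Illustrative output of:
#   kubectl -n <namespace> get service sas-cas-server-default-bin -o json
sample = """
{"spec": {"type": "LoadBalancer",
          "ports": [{"port": 5570, "targetPort": 5570}]},
 "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
"""

host, port = load_balancer_endpoint(sample)
print(host, port)
```

The resulting host and port are what the CAS client would pass to its connection call. A DNS alias for the address gives developers a consistent name if the load balancer is ever recreated.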

Access via Ingress URL

Yes, you can access CAS through the Ingress URL. Although this method may not be common, it is possible. Connecting in this manner uses the HTTP protocol, which is not as efficient as the binary connection provided by a load balancer. HTTP has another drawback: it does not provide data message handlers, so the programmer must supply extra data formatting.

The following example shows the Python statements required to connect to CAS using the Ingress URL. Note that we needed to provide the path to the certificate in SSL_CERT_FILE and supply port 443 to ensure encrypted communication. It is possible to define this certificate so that it is available to the launched Jupyter notebook session.
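A minimal sketch of such a connection follows. The Ingress host, the URL path `cas-shared-default-http` and the certificate location are assumptions for illustration and should be checked against your deployment. Passing a full URL to `swat.CAS()` is also assumed to select the HTTP interface.

```python
import os


def ingress_cas_url(ingress_host, path="cas-shared-default-http", port=443):
    """Build the HTTPS URL for CAS behind the Ingress (the path is an assumed default)."""
    return f"https://{ingress_host}:{port}/{path.strip('/')}/"


def connect_via_ingress(ingress_host, ca_file, username, password):
    """Connect to CAS over HTTPS; swat is imported lazily so the URL helper stands alone."""
    import swat  # requires: pip install swat

    # Point the HTTP client at the CA certificate that signed the Ingress certificate.
    os.environ["SSL_CERT_FILE"] = ca_file
    return swat.CAS(ingress_cas_url(ingress_host), username=username, password=password)


# Hypothetical Ingress host name.
url = ingress_cas_url("viya.example.com")
print(url)
```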

Access from JupyterHub within the K8S Cluster - Option 1

The load balancer option will be a common option for customers who have an existing JupyterHub environment and want to access a new deployment in Azure.

However, another option for accessing CAS is to deploy the Python development environment, in this case JupyterHub, in a namespace within the same cluster as SAS Viya. This option has a couple of advantages. First, it keeps all network traffic local, which can help minimize the cost of network traffic between JupyterHub and SAS Viya. More importantly, co-locating JupyterHub and SAS Viya in the same cluster is likely to improve performance.

In the following diagram, we see JupyterHub in its own namespace, and we can use the internal hostname to connect to any CAS instance within the cluster. This eliminates the need to define an alias, as in the load balancer scenario, to provide a consistent name for the developer. The Jupyter pods can be assigned to a node pool defined specifically for non-Viya processing, or you can let Kubernetes decide which nodes the pods run on.

A brief digression on hostnames

To determine the hostname defined for each CAS controller, you can run the kubectl command against a specific namespace and extract an environment variable from the controller pod. Of course, if the environment has an additional CAS instance, or the default instance has been renamed, the name of the queried pod will need to match.
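The lookup can be scripted from Python as sketched below. This assumes kubectl is on the PATH and authorized for the namespace, that the controller pod follows the `sas-cas-server-<instance>-controller` naming pattern, and that the `env` utility exists in the container. The sample output at the end is hypothetical and only demonstrates the parsing step.

```python
import subprocess


def env_command(namespace, instance="default"):
    """Build the kubectl call that lists the environment of a CAS controller pod."""
    pod = f"sas-cas-server-{instance}-controller"
    return ["kubectl", "-n", namespace, "exec", pod, "--", "env"]


def parse_env_value(env_output, name="CASCONTROLLERHOST"):
    """Return the value of one variable from `env` output, or None if it is absent."""
    for line in env_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key == name:
            return value
    return None


def controller_host(namespace, instance="default"):
    """Run kubectl and extract the CAS controller hostname."""
    result = subprocess.run(
        env_command(namespace, instance), capture_output=True, text=True, check=True
    )
    return parse_env_value(result.stdout)


# Hypothetical output, used here only to show the parsing.
print(parse_env_value("HOME=/home/cas\nCASCONTROLLERHOST=example-controller-host\n"))
```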

Alternatively, we can capture the IP address by extracting it from the pod.

If you then exec into the CAS controller pod and run nslookup, we see the following. A close look at this output reveals that three entries correspond to the client, binary and http services, if defined. The last entry is the same as what we extracted above using the CASCONTROLLERHOST environment variable, but it includes the standard Kubernetes qualifiers of “svc.cluster.local” at the end of the name. Both map to the same IP address, so either can be specified in the connection.

You will want to avoid using the hostname that is prefixed by the hyphenated IP address, as this will change if the CAS server is bounced.
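If the lookup results are processed in a script, the IP-prefixed names can be filtered out with a simple pattern. The names in this sketch are hypothetical. Only the shape of the prefix (four numbers joined by hyphens, followed by a dot) is taken from the description above.

```python
import re

# Matches names that start with a dash-separated IPv4 address, e.g. "10-0-0-1.something".
_POD_IP_PREFIX = re.compile(r"^\d{1,3}(-\d{1,3}){3}\.")


def stable_hostnames(names):
    """Keep only the names that do not embed the pod IP address."""
    return [name for name in names if not _POD_IP_PREFIX.match(name)]


# Hypothetical nslookup results; real names depend on the deployment.
lookup = [
    "10-0-0-1.example-service.example-ns.svc.cluster.local",
    "example-host.example-ns.svc.cluster.local",
]
print(stable_hostnames(lookup))
```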

Here we see that we used the Kubernetes CAS controller hostname, without the svc.cluster.local qualifiers, to connect to the Stable CAS server.

And likewise, in the same Python session, we can connect to the LTS environment within our cluster using the Kubernetes hostname.

So, if our JupyterHub is in a dedicated namespace, we can use standard hostnames available within the cluster to access CAS. These names do not change unless the configuration is modified to rename them. One thing to keep in mind if going this route is that the Viya namespaces are not secured from within the cluster by default (e.g., with Network Policies).

Access from JupyterHub within the K8S Cluster - Option 2

This option is a variation on the previous configuration. Rather than deploy JupyterHub in its own namespace, you could also deploy it within the SAS Viya namespace. Here we see that the user accesses JupyterHub via nginx and then connects directly to the CAS controller from within the SAS Viya namespace. Although only one SAS Viya namespace is shown here, it is possible to have a second SAS Viya namespace in the cluster and a second JupyterHub instance alongside the SAS Viya deployment. Of course, this adds complexity and demand on resources, but if a customer wanted to isolate JupyterHub development to a specific SAS Viya deployment this could be an option.

With this option, it is not necessary to qualify the namespace when connecting to CAS. You can simply use the standard client ClusterIP service to access CAS since JupyterHub resides within the namespace.

For example, if you create an SMP instance and change the name from “default” to “smp”, you would simply connect to sas-cas-server-smp-client, instead of sas-cas-server-default-client shown below.
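The naming rule can be captured in a small helper, sketched here. The unqualified form applies when JupyterHub shares the SAS Viya namespace. The namespace-qualified forms rely on standard Kubernetes service DNS (`<service>.<namespace>.svc.cluster.local`) and are an assumption for reaching the client service from another namespace. The namespace name "lts" is hypothetical.

```python
def cas_client_host(instance="default", namespace=None, fully_qualified=False):
    """Return the CAS client service name, optionally qualified with a namespace."""
    host = f"sas-cas-server-{instance}-client"
    if namespace:
        # From another namespace the service name must carry the namespace.
        host = f"{host}.{namespace}"
        if fully_qualified:
            host = f"{host}.svc.cluster.local"
    return host


print(cas_client_host())                      # same namespace, default instance
print(cas_client_host("smp"))                 # same namespace, renamed instance
print(cas_client_host(namespace="lts", fully_qualified=True))
```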

Choosing this option would have the same benefits as JupyterHub in its own namespace but would have an added small bonus of being able to directly access the necessary CAS CA certificate.

Access via NodePort

Before we delve into this method, it is important to note that this option is not available in AKS, where only the load balancer is available for binary access to CAS from a Python client outside the cluster.

This method defines a NodePort by configuring the same file as used to configure a LoadBalancer, cas-enable-external-services.yaml. This capability allows the user to specify any of the hostnames in the cluster and the node port assigned during configuration. Here the JupyterHub session defines the hostname for "Node 6" and port 24611 to gain access to the CAS controller. Kubernetes then routes this request via the NodePort rules to the CAS controller.

To determine the NodePort, the following command can be used.

From the service output above we can see that the assigned NodePort is 24611 for the sas-cas-server-default-bin service. If CAS is restarted, it is very likely that the assigned NodePort will change. Using this port for a quick connection check is sufficient, but for a long-term deployment it is advisable to set the port to a fixed value. Remember, port 5570 is the port assigned to the CAS controller and 24611 is assigned by Kubernetes when the NodePort is created.
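Because the assigned port can change, a script may prefer to read it from the Service rather than hard-code it. The following sketch extracts the NodePort from the JSON form of the Service. The sample document is illustrative and reuses only the two port numbers quoted above.

```python
import json


def node_port_for(service_json, service_port=5570):
    """Return the nodePort mapped to the given service port, or None if not found."""
    svc = json.loads(service_json)
    for entry in svc["spec"]["ports"]:
        if entry.get("port") == service_port:
            return entry.get("nodePort")
    return None


# Illustrative output of:
#   kubectl -n <namespace> get service sas-cas-server-default-bin -o json
sample = """
{"spec": {"type": "NodePort",
          "ports": [{"port": 5570, "targetPort": 5570, "nodePort": 24611}]}}
"""

print(node_port_for(sample))
```

The returned value, together with the hostname of any node in the cluster, is what the CAS client would use to connect.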

Here we see that when specifying one of the hostnames in the cluster and the NodePort, 24611, we can successfully connect to CAS.

If you want to change the assigned port, you can patch the service to set the desired port. The following command will set the NodePort so that it matches the target port. This can also be accomplished via the example patch transformer included in the sas-bases directory of the deployment assets.

kubectl patch service sas-cas-server-default-bin -n ${NS} --type='json' --patch='[{"op": "replace", "path": "/spec/ports/0/nodePort", "value":5570}]'
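The same patch can be generated for any port, which is convenient when scripting several environments. This sketch only builds the JSON Patch document and the kubectl argument list. The helper names are made up for illustration. Keep in mind that Kubernetes accepts a NodePort only if it falls inside the cluster's configured range (30000-32767 unless the API server's `--service-node-port-range` setting has been changed), so a value such as 5570 requires a widened range.

```python
import json

# Default NodePort range of a Kubernetes cluster; clusters may be configured differently.
DEFAULT_NODEPORT_RANGE = range(30000, 32768)


def node_port_patch(node_port, port_index=0):
    """Return a JSON Patch document that replaces the nodePort of one service port."""
    return json.dumps(
        [{"op": "replace", "path": f"/spec/ports/{port_index}/nodePort", "value": node_port}]
    )


def patch_command(namespace, node_port, service="sas-cas-server-default-bin"):
    """Build the kubectl argument list equivalent to the command shown above."""
    return [
        "kubectl", "patch", "service", service, "-n", namespace,
        "--type=json", f"--patch={node_port_patch(node_port)}",
    ]


print(node_port_patch(5570))
print(5570 in DEFAULT_NODEPORT_RANGE)
```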

This concludes the variety of ways available to connect to CAS in Kubernetes, so let’s wrap up.

Final Thoughts

In this article, we’ve shown that there are multiple ways to access CAS resources from JupyterHub, depending on where the instance of JupyterHub is deployed and how the user intends to connect. In all examples except one, the access was via a binary connection, although an HTTP connection could also be considered. Typically, a programmer will want to employ the binary connection, as it is more efficient than an HTTP connection.

If the JupyterHub deployment resides within the same cluster as the SAS Viya deployment, then there is no requirement to configure additional services to provide access to CAS resources. For customers with new deployments, the addition of JupyterHub to the environment may be a consideration when planning cloud resources.

Customers who want to leverage an existing JupyterHub deployment with a new SAS Viya deployment in Azure will need to define a LoadBalancer service and ensure firewalls are open for binary access. NodePorts can be used to access CAS resources when deployed outside of Azure (currently used by SAS internally).

Thanks for reading.

iLight · 04-12-2022

Thanks for this interesting article, the command works fine for us!

However, we want to use the example patch transformer included in sas-bases of the deployment assets, but we cannot find it. (We tried several options, each with its own error message.) Is there an example available that shows how to apply this suggestion within our main kustomization.yaml, so that it is built automatically into our first site.yaml?