
Exploring SAS Viya on Google Kubernetes Engine (GKE) – Default CASLIBs

Started ‎06-25-2021 by
Modified ‎06-25-2021 by

SAS Viya (Stable 2020.1.4 or later, LTS 2021.1 or later) can now be deployed on Google Kubernetes Engine (GKE). Running SAS Viya on Kubernetes raises many questions about storage.

 

How do these translate to the Google Cloud Platform world? Where do you put your data files so that SAS Viya can access them? What if you have existing data files on an NFS server that you want to access? In this first part, I will explore how storage is provisioned for the default CAS libraries, how to find out where it is physically stored, how to access it if necessary, and so on.

 

In this architecture diagram, extracted from the documentation, we have an overview of the Google Cloud Services involved in a SAS Viya architecture.

 

nir_post_64_01_images_arch-gke.png


 

 

What will be of interest to us is the part about the network storage:

 

 

nir_post_64_02_nfs_filestore.png

 

So, what are the different pieces of (block) storage involved in a SAS Viya deployment on GKE?

 

  • Persistent Disks (the GCP service for managed disks) used as disks for the Google Compute Engine instances (the virtual machines that make up the GKE cluster)
  • Persistent Disks for handling RWO (ReadWriteOnce access mode) Persistent Volume Claims
  • NFS or Google Filestore (the Network Attached Storage – or NAS – GCP Service) for handling RWX (ReadWriteMany access mode) Persistent Volume Claims

 

Basically, the difference between RWO and RWX is whether a volume can be mounted read-write by several nodes at the same time (RWX) or by a single node only (RWO).

 

CAS typically takes advantage of the RWX access mode, especially when multiple CAS workers are set up. Several RWX Persistent Volume Claims are configured by default and used by the CAS pods, such as cas-default-data, cas-default-permstore and sas-quality-knowledge-base.
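To check which claims in a namespace actually use the RWX access mode, one option is to filter the tabular output of kubectl get pvc. This is only a minimal sketch: rwx_claims is a hypothetical helper name (not part of kubectl or SAS Viya), and it assumes every claim is Bound, because a Pending claim has empty VOLUME and CAPACITY columns that would shift the fields.

```shell
# rwx_claims: hypothetical helper, not part of kubectl or SAS Viya.
# Reads `kubectl get pvc` output on stdin and prints the names of the
# claims whose access modes (5th field of each data row) include RWX.
# Assumption: all claims are Bound, so no column is empty.
rwx_claims() {
  awk 'NR > 1 && $5 ~ /RWX/ { print $1 }'
}

# Usage on the kubectl client (gelgcp is the namespace used in this article):
#   kubectl -n gelgcp get pvc | rwx_claims
```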

 

Back to the diagram: what are the two options it mentions?

 

To handle RWX in GKE, we need NFS-based storage. Two options are available in the SAS Viya Infrastructure as Code GitHub project:

 

  • storage_type=standard: an NFS server set up on a Google Cloud Virtual Machine (Google Compute Engine)
  • storage_type=ha: a Google Filestore instance

You could also use specialized solutions like Cloud Volumes Service based on NetApp.

 

In our example, we’ll use Google Filestore, which is very easy to set up.

 

At the end of a SAS Viya deployment on GKE, we have:

 

  • A “CLIENT” where we ran the SAS Viya IaC Terraform scripts to provision the GKE cluster and from where we deployed SAS Viya using kustomize and kubectl
  • 6 GKE node pools, including one for CAS
  • A Google Filestore instance that has been used to provision RWX Persistent Volume Claims dynamically using an NFS StorageClass
  • A “JUMP SERVER” that can be used to interact with the Filestore instance through NFS

 

NB: My environment has been provisioned with the Viya IaC tools and deployed manually using kustomize and kubectl. The NFS StorageClass has only been used for selected pods.

 

This is depicted in the following diagram:

 

 

nir_post_64_03_architecture.png

 

The Google Filestore instance has an IP address and has a root directory named /volumes.

 

The Jump Server instance has the Filestore /volumes directory mounted in /viya-share.

 

Now, let’s explore the environment with a simple exercise: where is the folder used by the Public CASLIB located?

 

 

1 – Get the path of the Public CASLIB

 

You can open “Manage Data” in SAS Viya to get that information:

 

 

nir_post_64_04_public_contents.png

 

 

The path is /cas/data/caslibs/public.  

 

 

2 – Get the volume mounts of the cas container in the CAS Controller pod

 

You can run this command on the kubectl “CLIENT”:

 

kubectl -n gelgcp get pod sas-cas-server-default-controller \
-o json | jq '.spec.containers[] | select(.name=="cas") | .volumeMounts[]'

 

Here the command outputs some details about the CAS Controller pod. The jq utility filters on the “cas” container and selects its volume mounts.

 

Result:

 

{
  "mountPath": "/cas/permstore",
  "name": "cas-default-permstore-volume"
}
{
  "mountPath": "/cas/data",
  "name": "cas-default-data-volume"
}
{
  "mountPath": "/cas/cache",
  "name": "cas-default-cache-volume"
}
{
  "mountPath": "/cas/config",
  "name": "cas-default-config-volume"
}
{
  "mountPath": "/tmp",
  "name": "cas-tmp-volume",
  "subPath": "tmp"
}
{
  "mountPath": "/cas/license",
  "name": "cas-license-volume"
}
...

 

The volume mount name corresponding to /cas/data (the root path of the Public CASLIB) is cas-default-data-volume.  
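If jq is not available on the client, the same lookup can be sketched with kubectl's built-in jsonpath output, which accepts filter expressions. The function name volume_for_mount is hypothetical and only used here for illustration; the pod, container and mount path are the ones from this article.

```shell
# volume_for_mount: hypothetical helper name.
# Prints the name of the volume mounted at a given path in a given container.
#   $1 namespace, $2 pod, $3 container, $4 mount path
volume_for_mount() {
  ns="$1"; pod="$2"; container="$3"; mount_path="$4"
  # Two jsonpath filters: pick the container first, then the volume mount.
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath="{.spec.containers[?(@.name==\"$container\")].volumeMounts[?(@.mountPath==\"$mount_path\")].name}"
}

# Usage on the kubectl client:
#   volume_for_mount gelgcp sas-cas-server-default-controller cas /cas/data
```

With the pod shown above, this should return the same volume name as the jq query.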

 

 

3 – Get the volumes of the CAS Controller pod

 

You can run this command on the kubectl “CLIENT”:

 

kubectl -n gelgcp get pod sas-cas-server-default-controller -o json | jq '.spec.volumes[]'

 

Here the command outputs some details about the CAS Controller pod. The jq utility selects the volume specifications at the pod level.

 

Result:

 

{
  "name": "cas-default-permstore-volume",
  "persistentVolumeClaim": {
    "claimName": "cas-default-permstore"
  }
}
{
  "name": "cas-default-data-volume",
  "persistentVolumeClaim": {
    "claimName": "cas-default-data"
  }
}
{
  "emptyDir": {},
  "name": "cas-default-cache-volume"
}
{
  "emptyDir": {},
  "name": "cas-default-config-volume"
}
{
  "emptyDir": {},
  "name": "cas-tmp-volume"
}
...

 

The claim name corresponding to the cas-default-data-volume volume mount name is cas-default-data.
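Rather than reading through the whole volume list, the claim behind one volume can be queried directly. The sketch below again relies on a kubectl jsonpath filter; claim_for_volume is a hypothetical helper name, and it prints nothing for volumes that are not backed by a claim (the emptyDir volumes, for example).

```shell
# claim_for_volume: hypothetical helper name.
# Prints the Persistent Volume Claim behind a pod-level volume.
#   $1 namespace, $2 pod, $3 volume name
claim_for_volume() {
  ns="$1"; pod="$2"; volume="$3"
  kubectl -n "$ns" get pod "$pod" \
    -o jsonpath="{.spec.volumes[?(@.name==\"$volume\")].persistentVolumeClaim.claimName}"
}

# Usage on the kubectl client:
#   claim_for_volume gelgcp sas-cas-server-default-controller cas-default-data-volume
```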

 

 

4 – Get information about the claim

 

You can run this command on the kubectl “CLIENT”:

 

kubectl -n gelgcp get pvc cas-default-data

 

Here the command outputs some details about the claim, including the name of the associated volume.

 

Result:

 

NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cas-default-data   Bound    pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e   8Gi        RWX            sas-gke        4h31m

 

The volume name corresponding to the cas-default-data claim name is pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e.  

 

 

5 – Get information about the persistent volume

 

You can run this command on the kubectl “CLIENT”:

 

kubectl get pv pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e -o json | jq '.spec'

 

Here the command outputs some details about the persistent volume. The jq utility selects its specification.

 

Result:

{
  "accessModes": [
    "ReadWriteMany"
  ],
  "capacity": {
    "storage": "8Gi"
  },
  "claimRef": {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "name": "cas-default-data",
    "namespace": "gelgcp",
    "resourceVersion": "11564",
    "uid": "ec3c21b5-82b8-4cf5-8256-81115a3d197e"
  },
  "mountOptions": [
    "noatime",
    "nodiratime",
    "rsize=262144",
    "wsize=262144"
  ],
  "nfs": {
    "path": "/volumes/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e",
    "server": "10.X.Y.106"
  },
  "persistentVolumeReclaimPolicy": "Delete",
  "storageClassName": "sas-gke",
  "volumeMode": "Filesystem"
}

 

Finally, the physical location behind the /cas/data path seen in SAS Viya is /volumes/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e on a server whose IP address is 10.X.Y.106 (this is the Google Filestore instance).
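Steps 4 and 5 can be chained so that a claim name resolves directly to its NFS location. This is a sketch under assumptions: nfs_export_for_claim is a hypothetical helper name, and it only makes sense for NFS-backed volumes such as the ones provisioned here (other volume types have no .spec.nfs section).

```shell
# nfs_export_for_claim: hypothetical helper name.
# Prints "<server>:<path>" for the NFS volume bound to a claim.
#   $1 namespace, $2 claim name
nfs_export_for_claim() {
  ns="$1"; claim="$2"
  # Step 4: the claim records the name of the volume it is bound to.
  pv=$(kubectl -n "$ns" get pvc "$claim" -o jsonpath='{.spec.volumeName}') || return 1
  if [ -z "$pv" ]; then
    echo "claim $claim is not bound to a volume" >&2
    return 1
  fi
  # Step 5: persistent volumes are cluster-scoped, so no namespace is needed.
  kubectl get pv "$pv" -o jsonpath='{.spec.nfs.server}:{.spec.nfs.path}{"\n"}'
}

# Usage on the kubectl client:
#   nfs_export_for_claim gelgcp cas-default-data
```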

 

 

Going further

 

What if you want to access this location, upload some data in it, or download some results from it?

 

You can use the Jump Server for that. Remember that it has the Filestore /volumes directory automatically mounted at /viya-share. The Public CASLIB content is then accessible on the Jump Server as /viya-share/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e/caslibs/public.
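The path translation is mechanical: replace the Filestore root (/volumes) with the Jump Server mount point (/viya-share), then append the part of the CAS path below /cas/data. A minimal sketch, where jump_server_path is a hypothetical helper name and the two directories are the ones used in this environment:

```shell
# jump_server_path: hypothetical helper name.
# Maps the NFS path of a persistent volume to the equivalent Jump Server path.
#   $1 NFS path of the volume (.spec.nfs.path)
#   $2 sub-directory below the container mount point, e.g. caslibs/public
jump_server_path() {
  nfs_path="$1"; sub_dir="$2"
  export_root="/volumes"      # Filestore root directory in this environment
  local_mount="/viya-share"   # where the Jump Server mounts that directory
  case "$nfs_path" in
    "$export_root"/*) ;;
    *) echo "unexpected NFS path: $nfs_path" >&2; return 1 ;;
  esac
  # Strip the export root, then prepend the local mount point.
  printf '%s%s/%s\n' "$local_mount" "${nfs_path#"$export_root"}" "$sub_dir"
}

jump_server_path \
  /volumes/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e \
  caslibs/public
# prints /viya-share/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e/caslibs/public
```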

 

nir_post_64_05_jump_server.gif

 

There are also other ways to interact with that physical location and manage files in it, such as the kubectl cp command or Kubernetes jobs.
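As an illustration of the kubectl cp route, the sketch below copies a local file into the Public CASLIB directory through the cas container. Treat it as an assumption-laden example rather than a tested procedure: upload_to_public_caslib is a hypothetical helper name, kubectl cp needs a tar binary inside the target container, and the user running in the container must be allowed to write to the directory.

```shell
# upload_to_public_caslib: hypothetical helper name.
# Copies a local file into /cas/data/caslibs/public via the CAS Controller pod.
#   $1 path of the local file
# Assumptions: namespace gelgcp, tar available in the cas container,
# and write permission on the target directory.
upload_to_public_caslib() {
  local_file="$1"
  kubectl -n gelgcp cp "$local_file" \
    "sas-cas-server-default-controller:/cas/data/caslibs/public/$(basename "$local_file")" \
    -c cas
}

# Usage on the kubectl client:
#   upload_to_public_caslib ./class.csv
```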

 

 

Conclusion

 

In this article, we briefly covered how the storage behind the default CASLIBs is provisioned in Google Cloud Platform and illustrated how to trace it through the different storage layers involved in Kubernetes.

 

While this article focuses on Google Cloud Platform, the same principles apply to other cloud providers.

 

Thanks for reading.

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎06-25-2021 10:15 AM