SAS Viya (Stable 2020.1.4 or later, LTS 2021.1 or later) can now be deployed on Google Kubernetes Engine (GKE). Running SAS Viya on Kubernetes brought many challenges regarding storage.
How does that translate in the Google Cloud Platform world? Where do you put your data files in order to access them from SAS Viya? What if you have existing data files on an NFS server that you want to access? In this first part, I will explore how storage is provisioned for the default CAS libraries, how to find out where it is physically stored, and how to access it if necessary.
In this architecture diagram, extracted from the documentation, we have an overview of the Google Cloud Services involved in a SAS Viya architecture.
What will be of interest to us is the part about the network storage:
So, what are the different pieces of (block) storage involved in a SAS Viya deployment on GKE?
Basically, the difference between RWO (ReadWriteOnce) and RWX (ReadWriteMany) is whether the underlying disk location can be mounted by multiple hosts at the same time (RWX) or by a single host only (RWO).
CAS is typically a component that will take advantage of the RWX access mode, especially when multiple CAS workers are set up. Several RWX Persistent Volume Claims are configured by default and used by the CAS pods, such as cas-default-data, cas-default-permstore, sas-quality-knowledge-base, etc.
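As an illustration, here is what a minimal RWX claim could look like. This is a sketch, not the exact manifest generated by the SAS Viya deployment assets; the names reuse the ones observed later in this article:

```yaml
# Hypothetical PVC requesting ReadWriteMany access
# (the real cas-default-data claim is generated by the deployment assets).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cas-default-data
  namespace: gelgcp
spec:
  accessModes:
    - ReadWriteMany   # RWX: mountable by multiple nodes at once
  resources:
    requests:
      storage: 8Gi
  storageClassName: sas-gke
```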
Back to the diagram, what are the two options mentioned?
To handle RWX in GKE, we need NFS-based storage. Two options are available in the SAS Viya Infrastructure as Code GitHub project:
You could also use specialized solutions such as NetApp Cloud Volumes Service.
In our example, we'll use Google Filestore, a very easy solution to set up.
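For reference, creating a standalone Filestore instance is a single gcloud command. This is a sketch; the instance name, zone, tier, and capacity below are placeholders I chose, not values from the IaC project:

```shell
# Hypothetical example: create a basic Filestore instance
# exposing a file share named "volumes" (names and zone are illustrative).
gcloud filestore instances create viya-share \
    --zone=us-east1-b \
    --tier=BASIC_HDD \
    --file-share=name=volumes,capacity=1TB \
    --network=name=default
```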
At the end of a SAS Viya deployment on GKE, we have:
NB: My environment has been provisioned with the Viya IaC tools and deployed manually using kustomize and kubectl. The NFS StorageClass has only been used for selected pods.
This is depicted in the following diagram:
The Google Filestore instance has an IP address and has a root directory named /volumes.
The Jump Server instance has the Filestore /volumes directory mounted in /viya-share.
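If you need to reproduce that mount on another machine, it is a standard NFS mount. This is a sketch; the IP address below is the redacted Filestore address from this environment:

```shell
# Mount the Filestore "volumes" share on /viya-share (illustrative).
sudo mkdir -p /viya-share
sudo mount -t nfs 10.X.Y.106:/volumes /viya-share
```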
Now, let's explore the environment with a simple exercise: where is the folder used by the Public CASLIB located?
1 – Get the path of the Public CASLIB
You can go to SAS Viya "Manage Data" to get that information:
The path is /cas/data/caslibs/public.
2 – Get the volume mounts of the cas container in the CAS Controller pod
You can run this command on the kubectl “CLIENT”:
kubectl -n gelgcp get pod sas-cas-server-default-controller \
-o json | jq '.spec.containers[] | select(.name=="cas") | .volumeMounts[]'
Here the command outputs details about the CAS Controller pod. The jq utility filters on the "cas" container and selects the volume mounts defined at the container level.
Result:
{
  "mountPath": "/cas/permstore",
  "name": "cas-default-permstore-volume"
}
{
  "mountPath": "/cas/data",
  "name": "cas-default-data-volume"
}
{
  "mountPath": "/cas/cache",
  "name": "cas-default-cache-volume"
}
{
  "mountPath": "/cas/config",
  "name": "cas-default-config-volume"
}
{
  "mountPath": "/tmp",
  "name": "cas-tmp-volume",
  "subPath": "tmp"
}
{
  "mountPath": "/cas/license",
  "name": "cas-license-volume"
}
...
The volume mount name corresponding to /cas/data (the root path of the Public CASLIB) is cas-default-data-volume.
3 – Get the volumes of the CAS Controller pod
You can run this command on the kubectl “CLIENT”:
kubectl -n gelgcp get pod sas-cas-server-default-controller -o json | jq '.spec.volumes[]'
Here the command outputs details about the CAS Controller pod. The jq utility selects the volume specifications defined at the pod level.
Result:
{
  "name": "cas-default-permstore-volume",
  "persistentVolumeClaim": {
    "claimName": "cas-default-permstore"
  }
}
{
  "name": "cas-default-data-volume",
  "persistentVolumeClaim": {
    "claimName": "cas-default-data"
  }
}
{
  "emptyDir": {},
  "name": "cas-default-cache-volume"
}
{
  "emptyDir": {},
  "name": "cas-default-config-volume"
}
{
  "emptyDir": {},
  "name": "cas-tmp-volume"
}
...
The claim name corresponding to the cas-default-data-volume volume mount name is cas-default-data.
4 – Get information about the claim
You can run this command on the kubectl “CLIENT”:
kubectl -n gelgcp get pvc cas-default-data
Here the command outputs details about the claim, including the name of the associated persistent volume.
Result:
NAME               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
cas-default-data   Bound    pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e   8Gi        RWX            sas-gke        4h31m
The volume name corresponding to the cas-default-data claim name is pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e.
5 – Get information about the persistent volume
You can run this command on the kubectl “CLIENT”:
kubectl get pv pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e -o json | jq '.spec'
Here the command outputs details about the persistent volume. The jq utility selects the spec section.
Result:
{
  "accessModes": [
    "ReadWriteMany"
  ],
  "capacity": {
    "storage": "8Gi"
  },
  "claimRef": {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "name": "cas-default-data",
    "namespace": "gelgcp",
    "resourceVersion": "11564",
    "uid": "ec3c21b5-82b8-4cf5-8256-81115a3d197e"
  },
  "mountOptions": [
    "noatime",
    "nodiratime",
    "rsize=262144",
    "wsize=262144"
  ],
  "nfs": {
    "path": "/volumes/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e",
    "server": "10.X.Y.106"
  },
  "persistentVolumeReclaimPolicy": "Delete",
  "storageClassName": "sas-gke",
  "volumeMode": "Filesystem"
}
Finally, the physical path behind the /cas/data path observed in SAS Viya is /volumes/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e on the server whose IP address is 10.X.Y.106 (the Google Filestore instance).
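The whole walkthrough can also be scripted. This sketch chains steps 2 to 5 into a single sequence, using the same namespace, pod, and volume names as above:

```shell
# Resolve the NFS location backing the CAS /cas/data mount:
# pod volume -> PVC -> PV -> NFS server and path.
NS=gelgcp
POD=sas-cas-server-default-controller

# Step 2/3: find the claim behind the cas-default-data-volume pod volume.
PVC=$(kubectl -n "$NS" get pod "$POD" -o json \
  | jq -r '.spec.volumes[]
           | select(.name=="cas-default-data-volume")
           | .persistentVolumeClaim.claimName')

# Step 4: find the persistent volume bound to that claim.
PV=$(kubectl -n "$NS" get pvc "$PVC" -o jsonpath='{.spec.volumeName}')

# Step 5: print the NFS server and export path of that volume.
kubectl get pv "$PV" -o jsonpath='{.spec.nfs.server}:{.spec.nfs.path}{"\n"}'
```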
Going further
What if you want to access this location, upload some data in it, or download some results from it?
You can use the Jump Server for that. Remember that it has the Filestore /volumes directory automatically mounted in /viya-share. The Public CASLIB contents are then accessible on the Jump Server as /viya-share/gelgcp-cas-default-data-pvc-ec3c21b5-82b8-4cf5-8256-81115a3d197e/caslibs/public.
There are also other ways to interact with that physical location and manage files in it, such as using the kubectl cp command or Kubernetes jobs.
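For example, kubectl cp can copy a file straight into the Public CASLIB folder through the CAS Controller pod. This is a sketch; mydata.csv and results.csv are hypothetical file names:

```shell
# Upload a local file into the Public CASLIB path inside the cas container.
kubectl -n gelgcp cp ./mydata.csv \
    sas-cas-server-default-controller:/cas/data/caslibs/public/mydata.csv -c cas

# Download a result file back from the same location.
kubectl -n gelgcp cp \
    sas-cas-server-default-controller:/cas/data/caslibs/public/results.csv \
    ./results.csv -c cas
```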
Conclusion
In this article, we briefly covered how storage for the default CASLIBs is provisioned in Google Cloud Platform and illustrated how to trace the different storage layers involved in Kubernetes.
While it is focused on Google Cloud Platform, the same principles apply to other cloud providers.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.