
SAS Viya Temporary Storage on Red Hat OpenShift – Part 1


SAS compute sessions and CAS servers require temporary storage for multiple purposes. Choosing a suitable location for SAS Viya temporary storage can be a key tuning step for optimal production deployments. In today’s cloud environments, this means addressing multiple layers of abstraction, from the underlying infrastructure, to the Kubernetes cluster, up to the applications running inside it.

 

In this series of articles, we explore how to provision local temporary storage to a Red Hat OpenShift Container Platform cluster deployed on Azure, and how to leverage it in SAS Viya.

 

Introduction

 

Storage design can differ considerably between infrastructure providers and from cluster to cluster. To deliver a working environment out of the box, SAS Viya components are configured by default to use a lowest common denominator for temporary storage that is guaranteed to always be available. Both SAS compute sessions and CAS servers use a directory provisioned as a Kubernetes emptyDir, which consumes disk space from the root volume of the Kubernetes node where they are running.
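
If you want to confirm what a pod is actually using, a minimal sketch is to look for emptyDir entries in the pod's volume list. The gel-viya namespace is the one we use later in this article, and sas-cas-server-default-controller is the typical name of the CAS controller pod; adjust both to your deployment:

# List the volumes of the CAS controller pod and show the emptyDir-backed ones
# (namespace and pod name are from our environment -- adjust them to yours)
oc -n gel-viya get pod sas-cas-server-default-controller -o yaml | grep -A2 emptyDir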

 

This default configuration is acceptable for testing and evaluation, but not for production workloads. If disk space on the root volume of a node runs low, Kubernetes starts evicting pods and, in extreme cases, the node itself can crash.
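
Before changing anything, it can be useful to check whether any node is already reporting disk pressure. Here is a minimal sketch that prints the standard Kubernetes DiskPressure condition for each node:

# Print each node name together with its DiskPressure condition (True means the node is under pressure)
oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="DiskPressure")].status}{"\n"}{end}'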

 

SAS recommends using fast storage for temporary locations. Often, the best performance is provided by using disks that are locally attached to the node, such as NVMe or SSD disks. Since this is temporary storage, the disks that are used do not need to persist beyond the duration of each pod, nor follow the pod if it is moved to a different node. For this reason, ephemeral storage is ideal.

 

Most cloud providers offer virtual machines that include a temporary disk for ephemeral storage. In this article, we will walk through the steps we used to provision Azure ephemeral disks as a temporary location for the pods in our OpenShift Container Platform cluster.

A note of caution

 

These articles are the result of testing within the environments we use for the “SAS Viya 4 - Deployment on Red Hat OpenShift Container Platform” workshop. For convenience, we deploy on-demand OpenShift clusters on Azure; although that works fine for our objectives, this infrastructure is not included in the official SAS Viya system requirements. So, why discuss this in a public post? Because it can still provide value in multiple situations:

 

  • Although SAS Viya on OpenShift is currently fully supported only on VMware vSphere, it is possible to deploy it on other infrastructure providers, such as Azure, under the “SAS Support for Alternative Kubernetes Distributions” policy.
  • The high-level process and steps are similar across different infrastructures. If your server has a fast disk available, you can use it as described here, independently of how the disk itself was provisioned. Some details may need to be adjusted from the example commands shown here, but everything should be manageable by a Kubernetes administrator.
  • Finally, you may simply read this post and decide that it is useful for a different use case on a similar infrastructure.

 

In summary, we are not stating that your SAS Viya cluster should be designed as described here, nor endorsing this architecture as fully supported.

 

Provisioning temporary storage

 

Let’s start from the beginning. What is the objective? To provide, through OpenShift, dedicated fast storage that can be used for the SAS and CAS working directories (SASWORK and CAS DISK CACHE).

 

Verify the current disk status

 

In our environment, we are using the Azure cloud as the underlying infrastructure. @Hans-Joachim Edert discusses a similar environment in his article Some SASWORK storage options for SAS Viya on Kubernetes; in its “hostPath” section, he lists some Microsoft virtual machine types that can provide the storage that fits our needs. All the virtual machines used by our cluster fall into this category and, according to Microsoft’s documentation, they have an additional internal SSD drive that could be used to host temporary storage. But there is a catch: Azure VMs automatically mount this drive in the OS at the /mnt path and, if we were using AKS, that mount point could be used directly as Kubernetes storage. With OpenShift, things are slightly different.
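
A quick way to check which Azure virtual machine size backs each node (and therefore whether a temporary disk should be present) is to display the well-known instance-type node label:

# Show the Azure VM size of every node next to its name
oc get nodes -L node.kubernetes.io/instance-type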

 

OpenShift uses a dedicated operating system – Red Hat Enterprise Linux CoreOS (RHCOS) – which, by default, does not automatically mount these disks. We know from the documentation that the disks are there, but they are almost invisible. OpenShift provides a great tool for cluster administrators to quickly interact with the underlying RHCOS: the oc debug command. Using this tool, we can operate in a privileged session, just as if we connected to the node via ssh as the root user.  Here is an example of how to use it to verify the disks available on the first worker node using the lsblk command:

 

# Find the 1st worker node's name
WORKER=$(oc get nodes -l "node-role.kubernetes.io/worker" -o name | head -1)
echo ${WORKER}
# Query disk status using the lsblk command in a debug session
oc debug ${WORKER}
  # now we are in the debug container
  chroot /host
  lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT
  # exit the debug container
  exit
  exit

 

As an example, here is the output in our test environment:

 

$ # Find the 1st worker node's name
$ WORKER=$(oc get nodes -l "node-role.kubernetes.io/worker" -o name | head -1)
$ echo ${WORKER}
node/itaedr-r-0097-worker-1
$ # Query disk status using the lsblk command in a debug session on the node
$ oc debug ${WORKER}
Starting pod/itaedr-r-0097-worker-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.255.1.8
If you don't see a command prompt, try pressing enter.
sh-4.4# # now we are in the debug container
sh-4.4# chroot /host
sh-4.4# lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT
NAME     SIZE FSTYPE LABEL             MOUNTPOINT
sda      128G
|-sda1     1M
|-sda2   127M vfat   EFI-SYSTEM
|-sda3   384M ext4   boot              /boot
`-sda4 127.5G xfs    root              /sysroot
sdb      300G
`-sdb1   300G ntfs   Temporary Storage
sdc        1G ext4                     /var/lib/kubelet/pods/08cef6f4-8032-449d-bcea-309355cb383c/volumes/kubernetes.io~azure-disk/pvc-fa675cfb-cd0d-49e6-843e-44adec80241f
sr0
sh-4.4# # exit the debug container
...

 

Let’s try to interpret this output. We should be able to understand what disks are available to this node, and to the pods running there.

 

  • The first disk is sda. It has several partitions mounted in the node OS; in the output above, sda4 is the OS root, which the debug pod sees as /sysroot. The disk size is 128GB, which corresponds to what we requested when we created this Azure virtual machine.
  • The second disk is sdb, and it does not show any mountpoint. It contains a single partition, sdb1; its label, Temporary Storage, tells us that this is the temporary disk provided by Azure. Its size depends on the virtual machine type used to instantiate this node (not all Azure machine types include a temporary disk!). The FSTYPE column shows that this partition, although currently unused, is formatted as ntfs, which is a Windows file system.
  • There can be additional disks; these are created dynamically by OpenShift to satisfy the PVCs of running pods. In our environment the default storage class is managed-premium (kubernetes.io/azure-disk), so, for each PVC, OpenShift creates a dedicated Azure Disk. In the example above, sdc is a 1GB disk mounted in the pod with id 08cef6f4-8032-449d-bcea-309355cb383c for the volume named pvc-fa675cfb-cd0d-49e6-843e-44adec80241f; you can map it back to its claim as shown below.
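
To map one of these dynamically provisioned disks back to its claim, query the Persistent Volume by the name shown in the lsblk output (the PV name below is the one from our example):

# Show which namespace/PVC the dynamically provisioned Azure Disk belongs to
oc get pv pvc-fa675cfb-cd0d-49e6-843e-44adec80241f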

 

👉 On each worker node, the names sda and sdb may be swapped, because Linux does not guarantee the order in which disks are detected at boot time.
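
Because of this, it is worth repeating the disk check on every worker node, not just the first one. A minimal loop, following the same oc debug pattern used above:

# Run lsblk on every worker node to see how the disks are named on each of them
for _NODE in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
  echo "--- ${_NODE}"
  oc debug -q ${_NODE} -- chroot /host lsblk -o NAME,SIZE,FSTYPE,LABEL,MOUNTPOINT
done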

 

[Image: a diagram of the disks available on a node in our test environment]


Prepare the temporary disk

 

As we have seen above, the temporary disk, although currently unused, contains an ntfs partition, which is a Windows file system. In the next steps, we are going to use tools that expect Linux file systems. Since this is an empty, temporary partition, we can safely overwrite it and create a new empty partition with a Linux xfs file system.

 

Again, we can use an OpenShift debug container, this time on all worker nodes, to re-format the temporary partition with the desired xfs file system:

 

# fix the partition on the temporary disk on all worker nodes
AZURE_TMP_DEV=/host/dev/disk/azure/resource
for _NODE in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
  # run inside the debug privileged container
  oc debug -q ${_NODE} -- bash -c "\
    parted ${AZURE_TMP_DEV} --script -- \
      mklabel gpt \
      mkpart xfs 1MiB -2048s ;\
    sleep 15;\
    lsblk ${AZURE_TMP_DEV};\
    mkfs -t xfs -f ${AZURE_TMP_DEV}-part1 \
  "
done

 

The previous commands use a trick to work around the fact that, on some nodes, the temporary disk is sdb, while on other nodes it is sda: the node OS automatically creates a link called /dev/disk/azure/resource that points to the correct disk, and another link called /dev/disk/azure/resource-part1 that points to the first partition on that disk. To use these links from inside the OpenShift debug container, we have to prefix their paths with /host.
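
If you want to double-check where those links resolve on each node, a quick verification loop (same oc debug pattern as before) looks like this:

# Show which physical device /dev/disk/azure/resource points to on every worker node
for _NODE in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
  echo "--- ${_NODE}"
  oc debug -q ${_NODE} -- ls -l /host/dev/disk/azure/
done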

 

Deploy the Local Storage Operator

 

To leverage the Azure temporary disks, it is possible to manually create local Persistent Volumes using standard Kubernetes practices, but that would require the cluster administrator to fully manage the lower-level storage lifecycle, as in the sketch below.
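
For reference, a hand-written local Persistent Volume for a single node would look roughly like the following sketch. This is not what we deploy in this article; it assumes the administrator has already formatted the partition and mounted it on that node. The mount path and capacity are placeholders, the node name is the example worker from the lsblk output above, and the sastmp storage class matches the one we define later with the operator:

# Sketch only: a manually managed local PV for one node (the operator automates this per node)
cat <<'EOF' > manual-local-pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-worker-1
spec:
  capacity:
    storage: 300Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: sastmp
  volumeMode: Filesystem
  local:
    path: /mnt/sastmp
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - itaedr-r-0097-worker-1
EOF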

 

OpenShift simplifies local storage management thanks to the Local Storage Operator.

 

This operator is not installed by default. We can install the Local Storage Operator from the OperatorHub in the web console, following the Red Hat instructions.
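
If you prefer the command line over the web console, the installation can also be sketched with a Namespace, an OperatorGroup, and a Subscription. This is only a sketch: the subscription channel name is an assumption and can differ between OpenShift versions, so check the Red Hat documentation for the exact values:

# Sketch: install the Local Storage Operator from the CLI instead of the web console
# (the "stable" channel is an assumption -- verify it for your OpenShift version)
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-local-storage
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: local-operator-group
  namespace: openshift-local-storage
spec:
  targetNamespaces:
    - openshift-local-storage
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: local-storage-operator
  namespace: openshift-local-storage
spec:
  channel: stable
  installPlanApproval: Automatic
  name: local-storage-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF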

 

[Image: the Local Storage Operator in the OperatorHub of the web console]

 

Provision local storage using the Local Storage Operator

 

We can now use the Local Storage Operator to provision local storage for SAS Viya pods. The operator can create Persistent Volumes by looking for available file systems at the paths specified in a Local Volume custom resource. Here are a couple of YAML examples that leverage the xfs file system we previously created at /dev/disk/azure/resource-part1:

 

  • Create a Local Volume, available on all worker nodes (OpenShift controller nodes are excluded by default).

    apiVersion: "local.storage.openshift.io/v1"
    kind: "LocalVolume"
    metadata:
      name: "lv-sastmp"
      namespace: "openshift-local-storage"
    spec:
      storageClassDevices:
        - storageClassName: "sastmp"
          volumeMode: Filesystem
          fsType: xfs
          devicePaths:
            - /dev/disk/azure/resource-part1
  • Create a Local Volume that uses only a subset of nodes, for example only the CAS nodes. In this case, the YAML definition should include a nodeSelector. As an example, you could filter on the workload.sas.com/class=cas label:

    apiVersion: "local.storage.openshift.io/v1"
    kind: "LocalVolume"
    metadata:
      name: "lv-sastmp-cas"
      namespace: "openshift-local-storage"
    spec:
      nodeSelector:
        nodeSelectorTerms:
        - matchExpressions:
          - key: workload.sas.com/class
            operator: In
            values:
            - cas
      storageClassDevices:
        - storageClassName: "sastmp"
          volumeMode: Filesystem
          fsType: xfs
          devicePaths:
            - /dev/disk/azure/resource-part1

 

You can create the Local Volume resource in the OpenShift cluster by pasting the YAML code in the web console, on the Local Storage Operator page, or by saving the content to a file and then applying it:

oc apply -f localVolume.yaml

 

The operator should automatically create the storage class referenced in the Local Volume, if it did not already exist. Then, it should start a pod to manage the storage on each node. Finally, it should create a Persistent Volume for each matching disk discovered on each node.
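
You can verify each of these steps with a few commands (the storage class name matches the examples above):

# Check that the storage class, the operator pods, and the local Persistent Volumes were created
oc get storageclass sastmp
oc -n openshift-local-storage get pods
oc get pv | grep sastmp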

 

[Image: the disk view after defining a Local Volume that references the ephemeral disk]

 

Test the newly referenced local storage

 

It's always a good practice to test your infrastructure before using it in SAS Viya – or in any production software!

 

Local volumes are accessed by pods through Persistent Volume Claims (PVCs).

 

Let's create a PVC that references our local volumes, and a pod to test the storage, using the following two YAML definitions:

 

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: localpvc-sastmp
spec:
  accessModes:
  - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100Gi
  storageClassName: sastmp
---
kind: Pod
apiVersion: v1
metadata:
  name: gel-test-pod
spec:
  containers:
  - name: gel-test-pod
    image: gcr.io/google_containers/busybox:1.27
    command:
      - "/bin/sh"
    args:
      - "-c"
      - "df -h | grep sd && touch /sastmp/SUCCESS && ls -l /sastmp/SUCCESS && exit 0 || exit 1"
    volumeMounts:
      - name: sastmp
        mountPath: "/sastmp"
  restartPolicy: "Never"
  volumes:
    - name: sastmp
      persistentVolumeClaim:
        claimName: localpvc-sastmp

 

Notice that the pod definition references the PVC we just created (localpvc-sastmp) and mounts it at the /sastmp path inside the container, so that we can run a few commands to test reading from and writing to that location.

 

The final step is to create these resources in the OpenShift cluster. The PVC should be created in the same namespace as the pods that will use it; here we specify our SAS Viya namespace:

 

oc apply -f localPVC.yaml -n gel-viya
oc apply -f localTestPod.yaml -n gel-viya

 

To check the successful allocation of the disk, we can read the log of the test pod:

 

oc logs gel-test-pod -n gel-viya

 

The test commands that we used in the pod definition should output something similar to the following:

 

/dev/sdb1               299.9G      2.1G    297.7G   1% /sastmp
/dev/sda4               127.5G      9.2G    118.3G   7% /etc/hosts
/dev/sda4               127.5G      9.2G    118.3G   7% /dev/termination-log
-rw-r--r--    1 root     root             0 Jan 13 23:31 /sastmp/SUCCESS

 

This shows that the ~300GB temporary disk has been mounted at the /sastmp path as requested, and that the pod was able to successfully write a test file called "SUCCESS" there.
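
Once the test succeeds, the test pod and claim can be deleted so they do not linger in the SAS Viya namespace:

# Clean up the test resources
oc delete pod gel-test-pod -n gel-viya
oc delete pvc localpvc-sastmp -n gel-viya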

 

[Image: a final view showing the ephemeral disk used in the test pod]

 

Conclusion

 

In this first article, we have seen how to leverage the OpenShift Local Storage Operator to make Azure VM ephemeral disks available to the pods running in an OpenShift Container Platform cluster. Now that this is done, the next step is to use them for SAS Viya temporary storage. You can read about it in the next article: SAS Viya Temporary Storage on Red Hat OpenShift – Part 2: CAS DISK CACHE

 

Find more articles from SAS Global Enablement and Learning here.
