
Using CSI drivers for your Viya platform storage, some considerations and lessons learned…


As a Deployment engineer about to start the installation of a new Viya environment in Kubernetes, or as a Technical Architect working on the storage design for a Viya project, you may hear about, or be questioned by your customer about, something called the "CSI drivers"...

 

When searching the official SAS documentation, you might be surprised to find no list of officially supported CSI drivers for Kubernetes! That is because SAS unfortunately cannot test each and every technology associated with Kubernetes or containers in every scenario of a SAS Viya deployment.

 

As my colleague Rob Collum recently said in one of our team meetings: "Some things are tested and supported by SAS and some other things… can possibly work." His remark was spot-on: a lot of technologies used in the cloud or in Kubernetes platforms are likely to work with SAS Viya, but that does not mean they have been officially tested by SAS for every kind of SAS Viya deployment…

 

However, what we can do as SAS professionals is share the experience and the lessons learned in the field, where customers have implemented proven technologies even when those were not explicitly listed as fully tested and supported in the SAS documentation.

 

There are several examples of that: customers deploying Kubernetes platforms under limited support (Tanzu, Rancher, etc.), customers using new network or storage container integration technologies, and so on.

 

In this particular post, we discuss and share some feedback on the usage of the Container Storage Interface (CSI) drivers.

 

What are the CSI drivers?

 

In order to expose external storage to the containers running in the pods, Kubernetes initially came up with an "in-tree" volume plugin system (including things like awsElasticBlockStore, azureDisk or cinder).

 

But the code was part of the core Kubernetes system, which caused friction over time… it forced storage vendors to align with the Kubernetes release cycle, and it created reliability and security issues in the core Kubernetes binaries (code that was difficult for the Kubernetes maintainers to test and maintain).

 

To address this challenge, the CSI (Container Storage Interface) was developed as "a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes".     

 

Simply put, CSI is a standard interface, and a CSI driver is a vendor's implementation of that interface that allows Kubernetes to interact with a specific storage system.

 

Nowadays, most of the original volume plugins (awsElasticBlockStore, azureDisk, or cinder) are deprecated, and the official Kubernetes documentation recommends using the "out-of-tree" volume plugins that implement the CSI.

 

Once a CSI driver is installed in the cluster, new storage classes become available in Kubernetes and can be referenced by the SAS Viya PersistentVolumeClaims, volumeClaimTemplates, or custom resources.
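As a simple illustration (the class name, provisioner, and parameters below are examples rather than SAS recommendations), a CSI-backed storage class and a PVC referencing it could look roughly like this:

```yaml
# Illustrative StorageClass served by a CSI driver (here the Azure Disks CSI driver).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: viya-block-storage          # hypothetical class name
provisioner: disk.csi.azure.com     # the CSI driver acts as the dynamic provisioner
parameters:
  skuName: Premium_LRS              # assumption: premium managed disks
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
# Illustrative PersistentVolumeClaim that requests a volume from that class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-rwo-claim           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteOnce                 # block storage typically provides RWO access
  storageClassName: viya-block-storage
  resources:
    requests:
      storage: 8Gi
```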

 

In the cloud-managed Kubernetes environments (AKS, EKS, GKE, etc.), the cloud CSI drivers may be pre-installed and the associated storage classes already available.

 

Why do we need them?

 

Cloud providers supply CSI drivers for better integration with their managed storage services. In the "on-premises" world, storage system vendors supply CSI drivers for better integration with their SAN or NAS storage systems.

 

Using the NFS subdir provisioner is generally more flexible than using CSI drivers, and it is currently encouraged both in our official documentation and in the SAS "Deployment as Code" GitHub project (aka DAC).

 

However, we see more and more customers expressing concern about using "open-source, community-provided" software in their production environments and requiring the use of officially supported cloud CSI drivers instead.

 

In addition to the cloud vendor support, there can be other benefits, such as encryption in transit (TLS) or at rest, automatic reprovisioning when additional space is required, and advanced integration with the cloud file services (snapshots, backups, etc.).

 

 

CSI drivers: known limitations and requirements

 

[Image: Dilbert comic strip. Source: dilbert.com (Scott Adams)]

The CSI drivers are mentioned in only two specific locations of the SAS official documentation.

 

 


Limitations for Azure Files and EKS CSI drivers

 

While the SAS documentation does not provide a formal list of supported CSI drivers, there is a warning in this section against the usage of specific CSI drivers.

 

Two specific CSI drivers that do not meet the CAS and SPRE pods' requirement for a fully POSIX-compliant file system are listed:

 

  • The Microsoft AKS Container Storage Interface (CSI) driver for Azure Files.
    • When this CSI driver is set to use the SMB protocol, it does not support the standard POSIX file system operations that SAS processes use to unlink or delete files.
  • The Amazon Elastic File System (EFS) CSI driver for EKS.
    • This driver does not allow files to be owned by distinct UIDs and GIDs. Certain SAS processes would not be able to chown files that are stored in these volumes.
  • In addition, it is recommended to avoid using Azure Files as the backing storage for the internal PostgreSQL instance, because SAS testing with this configuration has found multiple issues.

 

These warnings come from real-life experience, where customers have tried to use these CSI drivers in the field and faced issues during file access operations with CAS, the Compute servers, or the internal Crunchy PostgreSQL.

As an example, read about two SAS colleagues' experience with the aws-efs-csi-driver in the extract below from a SAS internal blog:

 

Using the aws-efs-csi-driver

In our IAC for AWS, the nfs-subdir-external-provisioner is used when EFS is the desired persistent storage. However, when customers aren’t using our IAC, the aws provided efs csi driver is commonly used instead. Honestly, it makes sense logically, but there is of course a gotcha when this choice is made.

While EFS claims to be POSIX compliant, it seems the provided csi driver didn’t get the memo. When using the EFS CSI driver, there is no way to alter ownership of files or directories in PVCs. The chown/chgrp command will fail as described here.  To add to the fun, the issue has been marked as closed and not planned…

This often presents itself as:

    rsync: chown …  failed: Operation not permitted (1)

 For now, our best recommendation is to avoid use of the aws-efs-csi-driver and instead use the nfs-subdir-external-provisioner as the IAC does.

 

But as noted in the documentation, and explained in the post’s introduction: "This is not an exhaustive list of storage options that do not fulfill SAS requirements. Many storage options are available across all the supported cloud platforms and Kubernetes distributions. SAS cannot test with all of them."

 

 

Understanding the nuances…

 

Now we have to be careful…these known limitations, listed above, DO NOT MEAN that Azure Files or Amazon EFS storage can never be used with SAS Viya…

 

The devil is in the detail…and yes I know, it is complicated…

 

But for example:

 

  • Azure Files Premium, with NFS protocol support, can be used and could be a great fit for many Viya volumes that require the RWX access mode.
  • Amazon EFS is officially documented as the storage to use in HA deployments of SAS Viya (preferred over the single-VM NFS server option for the RWX Viya volumes) and is a supported option in the SAS-provided IaC/DaC tools. However, it is preferable to mount the EFS storage with a generic NFS provisioner rather than with the EFS CSI driver, as the latter has some limitations.

 

So, when considering the storage options for a Viya environment, it is important to take several things into account:

 

    • The kind of Viya volume we are dealing with:
      • Is it one of the Viya platform volumes dynamically provisioned during the deployment, or a static volume to store user data?
      • Are we looking for shared storage (RWX), or would block storage (matching an RWO access mode request) be a better option for this volume?
    • Full POSIX support (which usually depends on the protocol used to access the file system).
    • How the storage is mounted (a specific CSI driver or a generic NFS provisioner?), and whether there is a minimum storage size for the volumes mounted with the CSI driver.

 

 

Required CSI Driver

 

The other location where the CSI driver is mentioned in the SAS Official documentation is in the "Cluster requirements for AWS" section.

 

This time, instead of a limitation, there is a requirement to have the AWS Elastic Block Store (EBS) CSI driver installed in the cluster.

 

As explained in this blog post from Rob Collum, the reason for this requirement is that, since EKS 1.23, it is not possible to use the default AWS gp2 or gp3 storage classes without a specific installation of the EBS CSI driver inside the cluster.

 

These default storage classes are recommended for the SAS Viya volumes that require RWO access (such as OpenDistro, Redis, RabbitMQ, etc.).
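As a rough sketch (the class name and parameters are illustrative, not a SAS-documented configuration), a gp3 storage class relying on the EBS CSI driver could look like this:

```yaml
# Sketch of a gp3 storage class served by the AWS EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                     # hypothetical class name
provisioner: ebs.csi.aws.com        # requires the EBS CSI driver add-on in the EKS cluster
parameters:
  type: gp3                         # EBS volume type
  encrypted: "true"                 # assumption: encrypt the volumes at rest
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```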

 

 

Recap of the most common cloud storage provisioners and CSI drivers

 

At this point, and to avoid further confusion, let's look at a small table listing the most common cloud vendors' CSI drivers.

 

| Cloud provider | CSI driver | Storage | Storage type | Access mode |
|----------------|------------|---------|--------------|-------------|
| AWS | efs.csi.aws.com | EFS (Elastic File System) | NFS managed service | RWX |
| AWS | ebs.csi.aws.com | EBS (Elastic Block Store) | Managed disk (block storage) | RWO |
| Azure | file.csi.azure.com | Azure Files Premium | NFS managed service | RWX |
| Azure | disk.csi.azure.com | Azure Disks | Managed disk (block storage) | RWO |
| Google | filestore.csi.storage.gke.io | Google Filestore | NFS managed service | RWX |
| Google | pd.csi.storage.gke.io | Google Persistent Disks (standard-rwo) | Managed disk (block storage) | RWO |

 

 

The Azure Files CSI driver "trap"

 

While you can use the Azure Files CSI driver (assuming you use Azure Files Premium with NFS support), there is an additional gotcha that you'll need to know about if you want to use it for your SAS Viya platform's dynamically provisioned volumes...

 

Back in 2023, a SAS colleague explored the use of the Azure Files Premium CSI driver and provided guidance on how to implement it with the NFS protocol and the nconnect mount option.

He wrote an internal blog showing how you can use the Azure Files storage for your SAS Viya dynamically provisioned volumes during the deployment, provisioning them with the Azure Files CSI driver, and how it all works perfectly.
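To give an idea, a storage class along those lines could look like the sketch below (the class name is hypothetical, and the parameters should be validated against the Azure Files CSI driver documentation):

```yaml
# Sketch of an Azure Files Premium storage class using the NFS protocol
# and the nconnect mount option.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-premium-nfs       # hypothetical class name
provisioner: file.csi.azure.com
parameters:
  protocol: nfs                     # NFS (POSIX) rather than SMB
  skuName: Premium_LRS              # NFS shares require a premium (FileStorage) account
mountOptions:
  - nconnect=4                      # open several TCP connections to the share
reclaimPolicy: Delete
```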

 

…HOWEVER…

 

In this case, for each dynamically requested Viya volume, the use of the CSI driver triggers the provisioning of a distinct new Azure file share.

 

Now the problem is that, as of today, the minimum size for an Azure Premium file share is 100 GB.

 

The result is that the provisioned capacity in Azure is way higher than what is actually needed… Some volumes only require 1 GB (or even less), but a 100 GB file share is provisioned in Azure for each of them.

 

The screenshot below compares the required capacity with the provisioned capacity.  

 

 

[Screenshot: the required capacity versus the provisioned capacity of the Azure Files volumes]

So while it is technically possible to use the CSI driver here, the cost is very high and a lot of space is wasted.  This situation was recently faced at a customer site by a SAS colleague in the AP region.

 

A first workaround that the customer tried was to set the shareName attribute in the Azure Files storage class. When this attribute is set, all volumes that use the azurefile-csi storage class consolidate their content in the same Azure Files share (so we don't need to create one file share for each PVC).
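The sketch below shows roughly what that storage class looked like (the class and share names are placeholders); as explained just after, this approach turned out to be unsuitable for SAS Viya.

```yaml
# Sketch of the attempted workaround: the shareName parameter forces every PVC
# of this class into one pre-existing Azure file share.
# As explained below, SAS R&D found this unsuitable for dynamic volume provisioning.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: azurefile-csi-shared        # hypothetical class name
provisioner: file.csi.azure.com
parameters:
  protocol: nfs
  skuName: Premium_LRS
  shareName: viya-shared            # placeholder: all volumes land in this single share
reclaimPolicy: Retain
```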

 

For example, if we have the following PVCs:

 

  • cas-default-data
  • cas-default-permstore
  • sas-cas-backup-data
  • sas-common-backup-data
  • sas-commonfile
  • sas-pyconfig
  • sas-risk-cirrus-core-backup-data
  • sas-risk-cirrus-search-data

 

…then the content of all the corresponding PVs is created at the root level of a single file share, as shown below:

 

[Screenshot: the directories of all the PVs created at the root level of a single file share]

 

… which poses a risk of data contention and potential overwrites due to the lack of directory isolation.

 

This shareName attribute has been tested by SAS R&D, who confirmed that it is not suited for dynamic volume provisioning. Using this setting with Viya could lead to data corruption issues, as the Viya components, which request storage through distinct PVCs, expect distinct volumes to be provisioned.

 

As a result of this experience and testing, a new section, Guidance When Using Microsoft Azure Files, was added to the official documentation.

 

The official workaround, in order to use Azure Files for the dynamic provisioning of the RWX-access Viya platform volumes, is to use an NFS provisioner. The Kubernetes NFS Subdir External Provisioner can be used to create a custom storage class based on an existing Azure Premium Files share. The NFS provisioner creates a distinct subdirectory for each Viya volume under the Azure file share. Note that (at the time of this post's write-up) this NFS subdir external provisioner is used for various NFS-based storage types by the Deployment as Code project (aka "viya4-deployment").
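As an illustration (the storage account, share name, and class name below are placeholders), the Helm values for the NFS Subdir External Provisioner could point at an existing Azure Premium Files NFS share roughly like this:

```yaml
# Sketch of Helm values for the Kubernetes NFS Subdir External Provisioner,
# targeting an existing Azure Premium Files share exposed over NFS.
nfs:
  server: mystorageaccount.file.core.windows.net   # placeholder NFS endpoint of the storage account
  path: /mystorageaccount/viya-share               # placeholder path to the existing file share
storageClass:
  name: sas-azurefile-nfs-subdir    # custom class that the Viya PVCs can reference
  archiveOnDelete: false            # assumption: do not archive data when PVs are deleted
```

Each Viya PVC that references this custom class then gets its own subdirectory under the existing share, instead of triggering the creation of a dedicated 100 GB file share.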

 

 

A better experience with the GKE Filestore CSI Driver

 

This "over-provisioning" issue when using the CSI driver for dynamic provisioning is not specific to Azure; the same problem has been seen in the past with Google Filestore.

 

However, while the GKE Filestore CSI driver in version 1.26 and earlier only allowed for 10 shares with a 100 GB minimum, things have improved with the enterprise-multishare-rwx GKE storage class, where the minimum size has been reduced to 10 GB.

 

While there might be a bit of extra provisioned space, the overhead is minimal.
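For example, a PVC targeting this storage class could be sketched as follows (the claim name is hypothetical):

```yaml
# Sketch of a PVC using the GKE enterprise-multishare-rwx storage class,
# which lets several volumes share a single Filestore enterprise instance.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-rwx-claim           # hypothetical claim name
spec:
  accessModes:
    - ReadWriteMany                 # shared (RWX) access
  storageClassName: enterprise-multishare-rwx
  resources:
    requests:
      storage: 10Gi                 # matches the reduced 10 GB minimum share size
```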

 

 

Conclusion

 

While using CSI drivers can cause issues, it is also something that more and more customers are expected to require… as it often provides better integration with the managed storage services and extends the Kubernetes capabilities (encryption, snapshots, DR, etc.).

 

But remember that there is no list of SAS officially supported CSI drivers... what we have in the documentation today is more a list of CSI drivers that SHOULD NOT be used for specific volumes. The cost efficiency also depends on the kind of volumes for which you would use the CSI driver. For example, the Azure Files CSI driver, which can be used with the premium SKU (NFS support), is probably not cost-efficient for all the Viya platform dynamically created volumes, but it may be cost-efficient for the SASDATA or CASDATA static volumes...

 

Each volume comes with its own specific requirements (access mode, capacity, latency, security, POSIX compliance, data protection, etc.). Having multiple storage classes is a recommended practice, as it allows you to associate the right kind of storage technology and device with each persistent volume.

 

The topic of storage in Kubernetes is complex and the technologies are evolving rapidly…Some of the issues or limitations discussed in this post may disappear in the future, but other issues could arise when untested scenarios are implemented…

 

Here are two examples:

 

  • It has been recently discovered that using Azure NetApp Files Ultra (NFS-based storage) for the internal Crunchy PostgreSQL was causing issues during the upgrade operations to the new PostgreSQL 16 version. The current recommendation is to use block storage for the internal Crunchy PostgreSQL volumes.
  • The Kubernetes sig-storage group recommends moving from the NFS subdir external provisioner (the project is not maintained) to the NFS CSI driver (see the sketch after this list). SAS is currently evaluating alternatives for the viya4-deployment GitHub project.
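For reference, a minimal sketch of a storage class based on that NFS CSI driver is shown below; it assumes the csi-driver-nfs is installed in the cluster, and the server and share values are placeholders.

```yaml
# Sketch of a storage class served by the Kubernetes NFS CSI driver (nfs.csi.k8s.io).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-csi                     # hypothetical class name
provisioner: nfs.csi.k8s.io         # requires the csi-driver-nfs to be installed
parameters:
  server: nfs-server.example.com    # placeholder NFS server address
  share: /export/viya               # placeholder exported path
mountOptions:
  - nfsvers=4.1
reclaimPolicy: Delete
```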

 

Finally, while it would be impossible to test, support, and list every possible CSI driver implementation with the SAS Viya platform, it is assumed that CSI drivers can be used for most of the SAS Viya volumes in general. However, if new problems or limitations with specific CSI drivers are discovered in the field and reported, they can be reproduced and assessed by SAS and potentially documented to increase awareness across the SAS technical community.

 

That’s it for today!

Thanks for reading!!!

 

Find more articles from SAS Global Enablement and Learning here.
