Additional RWX volumes for SPRE in SAS Viya – from Project Mountpoint

My colleague Ryan King has brought us a blast from the past in his last couple of posts here (1, 2). In those posts, he reintroduces the SCAPROC procedure, explains how it works, and describes the many benefits we can realize from using it in SAS Viya.

While testing in support of those posts, he realized that the jobs generated by SCAPROC need a shared directory where they can pass interim data from one step to the next. In the context of the SAS 9.4 Grid Manager products, we referred to this shared space as "gridwork".

As it turns out, providing gridwork for SAS Viya follows a pretty common storage pattern. To help clarify this pattern for the field, we've added a new guide in Project Mountpoint: Additional RWX volumes for SPRE.

Keep reading here for a quick dive into adding a new RWX storage volume to the SPRE pods in your SAS Viya environment.

SCAPROC means the SPRE needs gridwork

I liked Ryan's take on SCAPROC so much that I included it as one of the Top 10 Performance Tuning Tips for the SAS Viya platform (github.com) presented at SAS Innovate 2026.

The SCAPROC procedure can take a large, monolithic, serial program and split the tasks it performs into parallel jobs. This can reduce total run time significantly. But running SCAPROC alone is not enough; we need to think architecturally about what's happening. In particular, how does one parallel job read the data produced by an earlier job in the flow? Where is that interim data stored so that these jobs, which could be running on different Kubernetes nodes, can find it?

We cannot use SASWORK, at least not as it's defined for SAS Viya by default. As an emptyDir volume, it is local to a single pod on a single node and is deleted when that pod is removed. We could set up a shared volume (like NFS) for SASWORK universally, but that's typically not preferred because network traffic, latency, and contention make it slow. Most SASWORK use is scratch space for a single SAS session. In other words, it's not something we should put on a shared file system unless absolutely necessary.
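To make the contrast concrete, here is a minimal, illustrative excerpt of a pod spec's volumes list. The volume names are hypothetical. The claimName assumes a PVC like the gridwork PVC defined later in this post. An emptyDir volume lives and dies with one pod on one node. A persistentVolumeClaim volume is backed by storage that exists independently of any pod.

```yaml
# Illustrative pod spec excerpt; volume names are hypothetical.
spec:
  volumes:
    # emptyDir: created on the node when the pod starts and deleted when the
    # pod is removed. Only containers in this one pod can see its contents.
    - name: scratch-example
      emptyDir: {}
    # persistentVolumeClaim: backed by a persistent volume whose lifecycle is
    # independent of the pod, so later pods can mount the same data.
    - name: shared-example
      persistentVolumeClaim:
        claimName: spre-rwx-gridwork-pvc
```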

We need a "gridwork" shared directory. Gridwork was originally defined for SAS 9.4 as a "shared directory that the job uses to store the program, output, and job information". It's not formally defined for submitting jobs to the SPRE in SAS Viya, but we can reuse the idea of a shared space where discrete SAS programs can hand off intermediate data sets to each other.

Planning for gridwork

Let's consider the architectural aspects of gridwork and how we can achieve that for SAS Viya running in a Kubernetes environment.

Is gridwork persistent or ephemeral?

The space we allocate for a shared volume to act as gridwork must be persistent - that is, it will continue to exist independently of the pod lifecycles that interact with it. That makes sense - Job 1 in my parallel job flow generates an output data set and then shuts down. Later, Job 7 (possibly running on a different Kubernetes host) in my flow will need to refer to Job 1's data set for its own processing.

Is gridwork RWO or RWX?

For a typical SAS Viya platform deployment, the "Compute" node pool is labeled (and possibly tainted) for running SPRE jobs. Because that node pool usually contains more than one node, we should expect the parallel jobs generated by SCAPROC to run on different Kubernetes hosts. To implement gridwork, then, we need a shared file system that multiple nodes of the Kubernetes cluster can mount simultaneously. This is RWX (ReadWriteMany) access.
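As an illustration, here is a trimmed sketch of two nodes in a compute node pool. It assumes the workload.sas.com/class=compute label and taint that SAS Viya deployments commonly apply to compute nodes. The node names are hypothetical. Every node in the pool carries the same label and taint, and SPRE pods are configured to target that label and tolerate that taint. As a result, the scheduler may place any given job on any of those nodes.

```yaml
# Trimmed, illustrative Node excerpts (node names are hypothetical).
apiVersion: v1
kind: Node
metadata:
  name: compute-node-a
  labels:
    workload.sas.com/class: compute   # marks the node as part of the compute pool
spec:
  taints:
    - key: workload.sas.com/class     # keeps non-compute workloads off this node
      value: compute
      effect: NoSchedule
---
apiVersion: v1
kind: Node
metadata:
  name: compute-node-b
  labels:
    workload.sas.com/class: compute
spec:
  taints:
    - key: workload.sas.com/class
      value: compute
      effect: NoSchedule
```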

It's up to your site IT team to determine which RWX shared storage provider technology to use here. Keep in mind that accessing SAS data sets for analytic purposes employs long, sequential reads and writes to disk. SAS typically recommends storage I/O that can deliver 100+ MB/sec/core to ensure reasonable performance.
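Here is a quick worked example of that guideline. Assume, hypothetically, a compute node pool of 4 nodes with 16 cores each, all reading and writing gridwork at the same time. Because gridwork is shared, the storage must sustain the aggregate rate across nodes, not just one node's rate. Treat the result as a worst-case figure to discuss with your storage team.

```latex
\begin{aligned}
\text{per node}  &= 16\ \text{cores} \times 100\ \text{MB/s/core} = 1{,}600\ \text{MB/s} \\
\text{aggregate} &= 4\ \text{nodes} \times 1{,}600\ \text{MB/s} = 6{,}400\ \text{MB/s} \approx 6.4\ \text{GB/s}
\end{aligned}
```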

Should we assume a persistent volume for gridwork already exists?

I don't think so. Suppose we limit gridwork to interim, short-lived data from parallel job flows, and not long-term reference tables that would be better served from a data mart or elsewhere. Then there's no reason to manually stand up the persistent volume in advance. Instead, we can use a PVC that refers to a storage class, which allows Kubernetes to dynamically provision the persistent volume when it's first needed. The PV then continues to exist for as long as the PVC does. Whether it also survives deletion of the PVC depends on the storage class's reclaim policy.
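For reference, a dynamically provisioning RWX storage class might look like the following sketch. It assumes the open source NFS CSI driver (provisioner nfs.csi.k8s.io). The server name and export path are hypothetical. It is not necessarily how the Starter Storage Classes guide or your site IT team defines viya-shared-sc. Setting reclaimPolicy to Retain keeps the provisioned volume and its data even if the PVC is later deleted.

```yaml
# Sketch of an RWX storage class; server and share values are hypothetical.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: viya-shared-sc
provisioner: nfs.csi.k8s.io        # NFS CSI driver; substitute your site's driver
parameters:
  server: nfs-server.example.com   # hypothetical NFS server
  share: /export/viya              # hypothetical exported path
reclaimPolicy: Retain              # keep the PV and its data if the PVC is deleted
volumeBindingMode: Immediate       # provision as soon as the PVC is created
allowVolumeExpansion: true         # allow growing the PVC later, if the driver supports it
mountOptions:
  - nfsvers=4.1
```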

These answers and assumptions are not necessarily universal. There could be other use-cases to consider at your site. But they give us a good starting point to build from.

It's YAML time

Let's create the YAML files we need to reference from the $deploy/kustomization.yaml that's used to configure the deployment of SAS Viya. We need to do two things: 1) create a PVC resource for gridwork that refers to a suitable RWX storage class, and 2) patch the SPRE podTemplates to add a volume that uses that PVC, plus a volumeMount so the SPRE pods can access gridwork.
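To tie the two files together, the relevant parts of $deploy/kustomization.yaml might end up looking like the excerpt below. The file names under site-config/storage are hypothetical. The other entries stand in for whatever your deployment already lists. SAS deployment documentation generally asks that site-config transformers appear before sas-bases/overlays/required/transformers.yaml, as in this sketch.

```yaml
# Excerpt of $deploy/kustomization.yaml; site-config file names are hypothetical.
resources:
  - sas-bases/base
  # ...other resources already in your deployment...
  - site-config/storage/spre-rwx-gridwork-pvc.yaml          # creates the gridwork PVC
transformers:
  # ...other site-config transformers...
  - site-config/storage/spre-rwx-gridwork-transformer.yaml  # patches the SPRE podTemplates
  - sas-bases/overlays/required/transformers.yaml
```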

New resource definition to create PVC

The following YAML file can be added to your $deploy/site-config/storage directory. Add its path/filename to your $deploy/kustomization.yaml in the "resources:" section.

---
# GRIDWORK: Define PVC for new RWX volume for use as "gridwork" volume in SPRE pods podTemplates
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spre-rwx-gridwork-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: viya-shared-sc  # Specify your RWX storage class
  resources:
    requests:
      storage: 1Ti  # Adjust size based on your gridwork requirements

This creates a PVC that requests a 1 TiB (tebibyte) persistent volume from the storage class defined for the CSI driver that supports the site's RWX shared file system. The "viya-shared-sc" example storage class shown here is from the 3 Starter Storage Classes guide, but you should expect your site IT team to provide the actual storage class for you.

Note that you could simply "kubectl apply" this file directly, but I prefer to include it with my kustomization files to make clear that this resource and the one below are intended to work together.

PatchTransformer to add volumeMount

The following YAML file can be added to your $deploy/site-config/storage directory. Add its path/filename to your $deploy/kustomization.yaml in the "transformers:" section.

---
# GRIDWORK: Add new RWX volume to SPRE server podTemplates
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: add-spre-rwx-gridwork-pv
patch: |-
  - op: add
    path: /template/spec/volumes/-
    value:
      name: spre-rwx-gridwork-volume
      persistentVolumeClaim:
        claimName: spre-rwx-gridwork-pvc
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: spre-rwx-gridwork-volume
      mountPath: /data/gridwork
target:
    kind: PodTemplate
    annotationSelector: "sas.com/kustomize-base=sas-programming-environment"

This patch performs two "add" operations. The first adds a volume to the pod spec that references the PVC. The second mounts that volume at /data/gridwork in the first container of the pod.
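For illustration, the relevant parts of one SPRE podTemplate would look roughly like this after the patch is applied. The PodTemplate and container names shown are assumptions based on a typical deployment. Existing volumes and mounts are elided. The shared name (spre-rwx-gridwork-volume) is what links the volumeMount back to the persistentVolumeClaim.

```yaml
# Illustrative excerpt of a patched SPRE PodTemplate.
# PodTemplate and container names are assumptions; existing entries are elided.
apiVersion: v1
kind: PodTemplate
metadata:
  name: sas-compute-job-config
template:
  spec:
    containers:
      - name: sas-programming-environment   # containers/0 in the patch path
        volumeMounts:
          # ...existing mounts...
          - name: spre-rwx-gridwork-volume  # added by the second "add" op
            mountPath: /data/gridwork
    volumes:
      # ...existing volumes...
      - name: spre-rwx-gridwork-volume      # added by the first "add" op
        persistentVolumeClaim:
          claimName: spre-rwx-gridwork-pvc
```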

When you re-run "kustomize build" followed by "kubectl apply", the PVC is created in the cluster. SPRE pods launched from the updated podTemplates then mount the new, shared 1 TiB gridwork volume, which SAS program code can access at the "/data/gridwork" path.

Put gridwork to good use

From a SAS programmer's perspective, using the new gridwork volume is straightforward. The following SAS program is deliberately simple. Consider it a placeholder for something much longer, with multiple steps that SCAPROC can split to run as parallel jobs.


/* SCAPROC GLOBAL BEGIN */;
libname gridwork "/data/gridwork";
/* SCAPROC GLOBAL END */; 
 
/* local fileref */
filename myinput "/data/gridwork/input_file.csv";
 
/* Import CSV using the fileref (unquoted so SAS resolves it as the fileref) - save as SAS data set to gridwork */
proc import datafile=myinput
            out=gridwork.myoutput 
            dbms=csv 
            replace;
run;
 
/* ... and many more steps to go ... */

When you feed the program above into the SCAPROC procedure, it includes any statements between those "GLOBAL" comments in every remote session (or grid job). That way, every job defines the gridwork library, and the rest of your SAS program code can refer to it when handing off data across parallel jobs.

Additional considerations

Gridwork as we've defined it is a shared volume where SAS programs running on different Kubernetes nodes can write and read data. This is not SASWORK: the files in gridwork are not automatically deleted after the SPRE session terminates. That responsibility belongs entirely to the SAS programmers using the space. They should be encouraged to follow good programming practices, including deleting temporary files from gridwork when they're no longer needed. The volume will also need regular monitoring and periodic cleanup so that it doesn't fill up unexpectedly.
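As one possible safety net for that periodic cleanup, a Kubernetes CronJob could mount the same PVC and purge old files on a schedule. This is a hypothetical sketch, not part of the Project Mountpoint guide. The job name, schedule, 7-day age threshold, and busybox image (whose find must support -delete) are all assumptions. The CronJob must run in the same namespace as the PVC. Its container also needs file-system permissions to delete files that other users' SPRE sessions wrote.

```yaml
# Hypothetical cleanup CronJob for gridwork; names, schedule, and threshold are assumptions.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: gridwork-cleanup
spec:
  schedule: "0 2 * * *"          # every day at 02:00
  concurrencyPolicy: Forbid      # never run two cleanups at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: cleanup
              image: busybox:1.36
              command: ["sh", "-c"]
              args:
                # Delete regular files not modified in more than 7 days, listing each one.
                - find /data/gridwork -type f -mtime +7 -print -delete
              volumeMounts:
                - name: gridwork
                  mountPath: /data/gridwork
          volumes:
            - name: gridwork
              persistentVolumeClaim:
                claimName: spre-rwx-gridwork-pvc
```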

The storage pattern

We've focused on gridwork here. However, a similar approach could be implemented for a data mart volume, user home directories (also see Gerry Nelson’s recent post), database driver files for use by SAS/ACCESS (see Nicolas Robert’s JDBC post), and so on. Each of those is a variation on this theme, so they share many of the same implementation details (but probably not all of them). For example, a data mart would likely already exist, so you wouldn't employ a storage class in the PVC definition.
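For the data mart case, a sketch of binding to a pre-existing volume might look like the following. It assumes an existing NFS export. The PV and PVC names, server, path, and size are hypothetical. Setting storageClassName to an empty string on both objects opts out of dynamic provisioning, and volumeName pins the claim to that specific PV. A PatchTransformer like the gridwork one would then mount the claim in the SPRE pods.

```yaml
# Hypothetical static PV for an existing data mart export.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datamart-pv
spec:
  capacity:
    storage: 5Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # never delete existing data mart files
  storageClassName: ""                    # static volume, no storage class
  nfs:
    server: nfs-server.example.com        # hypothetical NFS server
    path: /export/datamart                # hypothetical existing export
---
# PVC that binds only to the PV above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: spre-rwx-datamart-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""                    # empty string disables dynamic provisioning
  volumeName: datamart-pv                 # bind to this specific PV
  resources:
    requests:
      storage: 5Ti
```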

Project Mountpoint provides the tools and guidance to help navigate these storage considerations for SAS Viya. Provisioning and configuring storage correctly should be a high priority. The performance of the flagship analytic engines provided with the SAS Viya platform (like SPRE and CAS) is often constrained by storage I/O. Storage should therefore be treated as critical infrastructure that warrants sufficient architectural planning.

Need more information?

To learn more about this topic and other aspects of the SAS Viya platform, visit learn.sas.com to view the SAS Architecture and Security Learning Subscription and the SAS Deployment Learning Subscription.

Find more articles from SAS Global Enablement and Learning here.