In the first part, we explored how block storage is provisioned for the default CAS libraries, where it is stored physically, and how to access it if necessary. In this second part, we will explore how to customize SAS Viya on Google Cloud Platform to integrate an additional, NFS-based storage service.
The goal is to add a custom Google Filestore instance to the SAS Viya environment, for example to manage customer data outside of the cloud resources already provisioned through Terraform scripts, as depicted below. The node pools and pods that will make use of it are those of CAS and SAS Compute, the two data processing engines.
So, let’s jump into the step-by-step process of leveraging cloud-native, NFS-based storage in SAS Viya on GCP for managing existing data.
1 – Provision the GCP NAS (Network Attached Storage) Service: Google Filestore
Many options are available when it comes to creating a Google Filestore instance. You can choose between the Basic (cost-optimized) and High Scale (performance-focused) instance tiers, which rely on Hard Disk Drives (HDD) or Solid State Drives (SSD). We will create a simple one from a client machine where the Google Cloud CLI has been configured properly:
gcloud beta filestore instances create geldata --zone=us-east1-b \
--tier=basic-hdd --file-share=name="vol",capacity=1TB \
--network=name="myviya-gcpdm-vpc"
You can get information about your recently created Filestore instance:
$ gcloud beta filestore instances describe geldata --zone=us-east1-b
createTime: '2021-07-01T18:31:14.685628763Z'
fileShares:
- capacityGb: '1024'
name: vol
name: projects/sas-gelsandbox/locations/us-east1-b/instances/geldata
networks:
- ipAddresses:
- 10.X.Y.42
network: myviya-gcpdm-vpc
reservedIpRange: 10.X.Y.80/29
state: READY
tier: BASIC_HDD
PS: this step assumes that the Google Cloud SDK (gcloud) has been installed beforehand on the client machine.
2 – Set up the Google Filestore instance properly
In this step, we want to initialize the Google Filestore instance with some folders and, optionally, manage permissions. Here, for simplicity, we will just add a single folder and open up its permissions.
To do that, we will use the Jump Server provided when we provision GCP resources for Viya using the viya4-iac-gcp Infrastructure as Code GitHub project, which is well suited for this kind of administrative task.
Here are the steps involved:
First, we need to get the Jump Server IP address from where we ran the Terraform script (IaC). The Terraform state will help us find the right information:
# Get the Jump Server IP address
cd /sasviya/scripts/projects/viya4-iac-gcp/
export JUMPIP=$(terraform output -raw -state=./myviya-terraform.tfstate jump_public_ip)
echo JUMPIP=$JUMPIP
We could have used gcloud to get the Jump Server IP address:
export JUMPIP=$(gcloud compute instances list | grep myviya-gcpdm-jump-server \
| awk '{print $5}')
echo JUMPIP=$JUMPIP
Then we can connect to the Jump Server:
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP}
We use the SSH key provided during provisioning, and the user is “jumpuser”, defined in the Terraform variables file. Now that we are on the Jump Server, we can mount the Filestore share and initialize it:
NFSIP=$(gcloud filestore instances list | grep geldata | awk '{print $6}')
echo NFSIP=$NFSIP
sudo mkdir -p /mnt/geldata
sudo mount $NFSIP:/vol /mnt/geldata
sudo mkdir /mnt/geldata/data
sudo chmod 777 /mnt/geldata/data
# Exit ssh
exit
PS: this step assumes that the Google Cloud SDK (gcloud) has been installed beforehand on the Jump Server. This piece of code mounts the Filestore instance on the Jump Server, creates a single folder named “data”, and changes its permissions to 777.
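Note that this mount will not survive a Jump Server reboot. If you need it to be persistent, an /etc/fstab entry along these lines would work (a sketch; 10.X.Y.42 stands for the Filestore IP address shown earlier):

```
# /etc/fstab entry (sketch; 10.X.Y.42 is the Filestore instance IP address)
10.X.Y.42:/vol   /mnt/geldata   nfs   defaults   0 0
```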
3 – Optional – Load existing data on the Google Filestore instance
If you are in a migration project, your customer might want to know how to quickly leverage existing data. While we are not tackling the whole migration process in this article, let’s illustrate one aspect of it: make existing data available to SAS Viya.
So, the Google Filestore instance that we have set up can be the data file server that we need for accessing data from SAS and CAS.
In our context, two options (among others) can be used to upload data to the Google Filestore instance:
From the client:
# Check the contents of the target
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP} \
"ls -al /mnt/geldata/data/"
# Copy the data from the Cloud Login Server to Filestore through the Jump Server
scp -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key -r /sasviya/data/* \
jumpuser@${JUMPIP}:/mnt/geldata/data/.
# Check the contents of the target
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP} \
"ls -al /mnt/geldata/data/"
Using GCS as a staging area:
# ssh to the Jump Server
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP}
# Assuming the data to be copied is already available in a GCS bucket
# Copy the bucket contents to Filestore through the NFS mount
gsutil -m cp -r gs://myviya-gcpdm/data/* /mnt/geldata/data/.
# Check the contents of the target
ls -al /mnt/geldata/data/
# Exit ssh
exit
PS: this step assumes that the Google Cloud SDK (gsutil) has been installed beforehand on the Jump Server. All right, we now have a Google Filestore instance populated with some existing data.
4 – Prepare the Kubernetes configuration files to mount the Filestore instance in the CAS and SAS Compute pods
We are ready to start modifying the Viya deployment to take into account our new data file server. First, get the Google Filestore instance IP address on the client:
NFSIP=$(gcloud filestore instances list | grep geldata | awk '{print $6}')
echo $NFSIP
Then create three overlay files (their locations under site-config are listed in step 5). Note that $NFSIP in the patches below must be replaced with the actual Filestore IP address:
---
apiVersion: builtin
kind: PatchTransformer
metadata:
name: cas-add-filestore-mount
patch: |-
- op: add
path: /spec/controllerTemplate/spec/volumes/-
value:
name: sas-filestore
nfs:
path: /vol/data
server: $NFSIP
- op: add
path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
value:
name: sas-filestore
mountPath: /filestore/data
target:
group: viya.sas.com
kind: CASDeployment
name: .*
version: v1alpha1
---
apiVersion: builtin
kind: PatchTransformer
metadata:
name: compute-add-filestore-mount
patch: |-
- op: add
path: /template/spec/volumes/-
value:
name: sas-filestore
nfs:
path: /vol/data
server: $NFSIP
- op: add
path: /template/spec/containers/0/volumeMounts/-
value:
name: sas-filestore
mountPath: /filestore/data
target:
name: sas-compute-job-config
version: v1
kind: PodTemplate
---
apiVersion: builtin
kind: PatchTransformer
metadata:
name: cas-add-filestore-allowlist
patch: |-
- op: add
path: /spec/appendCASAllowlistPaths/-
value:
/filestore/data
target:
group: viya.sas.com
kind: CASDeployment
name: .*
version: v1alpha1
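Since the $NFSIP token must resolve to a real address before the manifest is built, one way to substitute it is with sed. The sketch below uses a stand-in file and an illustrative IP address rather than the real overlays:

```shell
# Sketch: substitute the $NFSIP placeholder with the real Filestore IP
# address before building the manifest. The file written below is a
# stand-in for the overlay created above; 10.0.0.42 is illustrative.
NFSIP=10.0.0.42   # in practice, retrieved with gcloud as shown earlier
mkdir -p site-config/cas
printf 'nfs:\n  path: /vol/data\n  server: $NFSIP\n' \
    > site-config/cas/cas-add-filestore-mount.yaml
# Replace the literal $NFSIP token with the value of the shell variable
sed -i "s/\$NFSIP/${NFSIP}/" site-config/cas/cas-add-filestore-mount.yaml
grep 'server:' site-config/cas/cas-add-filestore-mount.yaml
```

The same substitution applies to the compute overlay; the allowlist overlay does not reference the IP address.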
Notice that the first patch adds the NFS volume and its mount to the pod template of every CASDeployment (target name: .*), the second one does the same in the sas-compute-job-config PodTemplate used to launch SAS Compute Server pods, and the third one appends /filestore/data to the CAS allowlist paths so that CAS is allowed to access this location.
5 – Reference the overlays in the kustomization.yaml file
Add the following lines to the transformers block of the kustomization.yaml file:
- site-config/cas/cas-add-filestore-mount.yaml
- site-config/sas-compute-server/compute-add-filestore-mount.yaml
- site-config/cas/cas-add-filestore-allowlist.yaml
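For reference, the transformers block would then look something like this (a sketch; your kustomization.yaml will typically contain other entries, elided here):

```yaml
transformers:
  # ... existing transformer entries ...
  - site-config/cas/cas-add-filestore-mount.yaml
  - site-config/sas-compute-server/compute-add-filestore-mount.yaml
  - site-config/cas/cas-add-filestore-allowlist.yaml
```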
6 – Build and apply site.yaml
To apply the overlays, we need to build the Kubernetes manifest:
kustomize build -o site.yaml
And apply it:
kubectl -n gelgcp apply -f site.yaml
7 – Restart CAS
For the changes to take effect, we need to restart CAS:
kubectl -n gelgcp delete pods -l casoperator.sas.com/server=default
8 – Test the access to the Filestore instance from SAS Viya
It is time to test access to the Google Filestore instance from SAS and see if we can read existing data, create new tables, and share data between the SAS Compute Server and CAS:
78 /* SAS access to Filestore */
79
80 libname sasfs "/filestore/data" ;
NOTE: Libref SASFS was successfully assigned as follows:
Engine: V9
Physical Name: /filestore/data
81
82 proc datasets lib=sasfs ;
Directory
Libref SASFS
Engine V9
Physical Name /filestore/data
Filename /filestore/data
Inode Number 61079553
Access Permission rwxrwxrwx
Owner Name root
File Size 4KB
File Size (bytes) 4096
Member
# Name Type File Size Last Modified
1 HMEQ DATA 704KB 07/02/2021 15:28:10
83 quit ;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
84
85 data sasfs.sas_prdsale ;
86 set sashelp.prdsale ;
87 run ;
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set SASFS.SAS_PRDSALE has 1440 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
88
89 proc datasets lib=sasfs ;
Directory
Libref SASFS
Engine V9
Physical Name /filestore/data
Filename /filestore/data
Inode Number 61079553
Access Permission rwxrwxrwx
Owner Name root
File Size 4KB
File Size (bytes) 4096
Member
# Name Type File Size Last Modified
1 HMEQ DATA 704KB 07/02/2021 15:28:10
2 SAS_PRDSALE DATA 256KB 07/02/2021 19:01:35
90 quit ;
NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds
91
92 /* CAS access to Filestore */
93
94 cas mysession ;
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The
UUID is 3b31c471-97b0-0a49-9d6b-06dec184f040. The user is Alex and the active caslib is CASUSER(Alex).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 4 workers.
95
96 caslib filestore type=path path="/filestore/data" libref=casfs ;
NOTE: 'FILESTORE' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'FILESTORE'.
NOTE: CASLIB FILESTORE for session MYSESSION will be mapped to SAS Library CASFS.
NOTE: Action to ADD caslib FILESTORE completed for session MYSESSION.
97
98 proc casutil ;
NOTE: The UUID '3b31c471-97b0-0a49-9d6b-06dec184f040' is connected using session MYSESSION.
99
99 ! list files ;
Caslib Information
Library FILESTORE
Source Type PATH
Path /filestore/data/
Session local Yes
Active Yes
Personal No
Hidden No
Transient No
CAS File Information
Encryption Last Modified
Name Permission Owner Group Method File Size (UTC)
hmeq.sas7bdat -rw-rw-r-- 704.0KB 02JUL2021:15:28:11
customers.sashdat -rw-rw-r-- NONE 15.9KB 02JUL2021:15:28:11
sas_prdsale.sas7bdat -rw-r--r-- 256.0KB 02JUL2021:19:01:35
contact_list.csv -rw-rw-r-- 3.7KB 02JUL2021:15:28:11
NOTE: Cloud Analytic Services processed the combined requests in 0.016016 seconds.
100 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.06 seconds
cpu time 0.03 seconds
101
102 data casfs.cas_prdsale ;
103 set sashelp.prdsale ;
104 run ;
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set CASFS.CAS_PRDSALE has 1440 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.08 seconds
cpu time 0.06 seconds
105
106 proc casutil ;
NOTE: The UUID '3b31c471-97b0-0a49-9d6b-06dec184f040' is connected using session MYSESSION.
107
107! save casdata="cas_prdsale" casout="cas_prdsale.sashdat" replace ;
NOTE: Cloud Analytic Services saved the file cas_prdsale.sashdat in caslib FILESTORE.
NOTE: The Cloud Analytic Services server processed the request in 0.042732 seconds.
108
108! list files ;
Caslib Information
Library FILESTORE
Source Type PATH
Path /filestore/data/
Session local Yes
Active Yes
Personal No
Hidden No
Transient No
CAS File Information
Encryption Last Modified
Name Permission Owner Group Method File Size (UTC)
hmeq.sas7bdat -rw-rw-r-- 704.0KB 02JUL2021:15:28:11
customers.sashdat -rw-rw-r-- NONE 15.9KB 02JUL2021:15:28:11
sas_prdsale.sas7bdat -rw-r--r-- 256.0KB 02JUL2021:19:01:35
contact_list.csv -rw-rw-r-- 3.7KB 02JUL2021:15:28:11
cas_prdsale.sashdat -rwxr-xr-x sas sas NONE 182.4KB 02JUL2021:19:01:36
NOTE: Cloud Analytic Services processed the combined requests in 0.019751 seconds.
109 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.13 seconds
cpu time 0.08 seconds
110
111 cas mysession terminate ;
NOTE: Libref CASFS has been deassigned.
NOTE: Deletion of the session MYSESSION was successful.
NOTE: The default CAS session MYSESSION identified by SAS option SESSREF= was terminated. Use the OPTIONS statement to set the
SESSREF= option to an active session.
NOTE: Request to TERMINATE completed for session MYSESSION.
Finally, we can inspect the CAS controller pod definition from Kubernetes to confirm that the Filestore volume and its mount are in place:
kubectl -n gelgcp get pod sas-cas-server-default-controller \
-o json | jq '.spec.containers[] | select(.name=="cas") | .volumeMounts[]'
{
"mountPath": "/filestore/data",
"name": "sas-filestore"
}
kubectl -n gelgcp get pod sas-cas-server-default-controller \
-o json | jq '.spec.volumes[]'
{
"name": "sas-filestore",
"nfs": {
"path": "/vol/data",
"server": "10.X.Y.42"
}
}
Find more articles from SAS Global Enablement and Learning here.