
Exploring SAS Viya on Google Kubernetes Engine (GKE) – Custom Path-Based CASLIBs or Libraries


In the first part, we explored how block storage is provisioned for the default CAS libraries, where it is physically stored, and how to access it if necessary. In this second part, we will explore how to customize SAS Viya on Google Cloud Platform in order to:

 

  • Add a GCP NFS-based storage to the SAS Viya deployment
  • Make existing data available to SAS and CAS in case of a migration

 

The goal is to add a custom Google Filestore instance to the SAS Viya environment, for example to manage customer data, outside of the cloud resources already provisioned through Terraform scripts, as depicted below. The node pools and pods that will make use of it are CAS and Compute, the two data processing engines:

 

[Image: target architecture with the additional Google Filestore instance (nir_post_66_01_arch_filestore.png)]


 

So, let’s jump into the step-by-step process of leveraging cloud-native NFS-based storage in SAS Viya on GCP to manage existing data.

 

1 – Provision the GCP NAS (Network Attached Storage) Service: Google Filestore

 

Many options are available when it comes to creating a Google Filestore instance. You can choose between the Basic (cost-optimized) and High Scale (performance-focused) instance tiers, which use either Hard Disk Drives (HDD) or Solid State Drives (SSD). We will create a simple one from a client machine where the Google Cloud CLI has been configured properly:

 

gcloud beta filestore instances create geldata --zone=us-east1-b \
  --tier=basic-hdd --file-share=name="vol",capacity=1TB \
  --network=name="myviya-gcpdm-vpc"
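
If performance matters more than cost, the same command can target an SSD-backed tier instead. This is just a sketch, with a hypothetical instance name and the same zone and network as above; note that minimum capacities differ by tier, so 1TB may not be accepted for SSD:

# "geldata-ssd" is a hypothetical name; SSD tiers require a larger minimum capacity
gcloud beta filestore instances create geldata-ssd --zone=us-east1-b \
  --tier=basic-ssd --file-share=name="vol",capacity=2560GB \
  --network=name="myviya-gcpdm-vpc"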

 

You can get information about your recently created Filestore instance:

 

$ gcloud beta filestore instances describe geldata --zone=us-east1-b
createTime: '2021-07-01T18:31:14.685628763Z'
fileShares:
- capacityGb: '1024'
  name: vol
name: projects/sas-gelsandbox/locations/us-east1-b/instances/geldata
networks:
- ipAddresses:
  - 10.X.Y.42
  network: myviya-gcpdm-vpc
  reservedIpRange: 10.X.Y.80/29
state: READY
tier: BASIC_HDD
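
If you only need the instance IP address (which we will reuse later as the NFS server address), a --format expression can extract it directly. A small sketch based on the structure of the output above:

# Print just the first IP address of the Filestore instance
gcloud beta filestore instances describe geldata --zone=us-east1-b \
   --format='value(networks[0].ipAddresses[0])'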

 

PS: this step assumes that the Google Cloud SDK (gcloud) has been installed beforehand on the client machine.  
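
If gcloud has not been configured yet, a minimal setup sketch could look like the following (the project ID is the one visible in the describe output above; adjust it and the zone to your environment):

# Authenticate and point gcloud at the right project and zone
gcloud auth login
gcloud config set project sas-gelsandbox
gcloud config set compute/zone us-east1-b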

 

2 – Set up the Google Filestore instance properly

 

In this step, we want to initialize the Google Filestore instance with some folders and, optionally, manage permissions. For simplicity, here we will just add a single folder and grant full permissions on it.

 

To do that, we will use the Jump Server provided when we provision GCP resources for Viya using the viya4-iac-gcp Infrastructure as Code (IaC) GitHub project. It has the advantage of being:

 

  • Easy to connect to using the SSH key specified at provisioning time with the IaC project
  • In the same VPC network as the other GCP resources

 

Here are the pieces involved:

 

[Image: the Jump Server and the Filestore instance in the same VPC (nir_post_66_02_arch_with_jump.png)]

 

First, we need to get the Jump Server IP address from the machine where we ran the Terraform script (IaC). The Terraform state will help us find the right information:

 

# Get the Jump Server IP address
cd /sasviya/scripts/projects/viya4-iac-gcp/
export JUMPIP=$(terraform output -raw -state=./myviya-terraform.tfstate jump_public_ip)
echo JUMPIP=$JUMPIP

 

We could have used gcloud to get the Jump Server IP address:

 

export JUMPIP=$(gcloud compute instances list | grep myviya-gcpdm-jump-server \
   | awk '{print $5}')
echo JUMPIP=$JUMPIP

 

Then we can connect to the Jump Server:

 

ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP}

 

We use the SSH key provided during provisioning, and the user is “jumpuser”, as defined in the Terraform variables file. Now that we are on the Jump Server, we can:

 

  • Get the Google Filestore IP address
  • Mount the Google Filestore instance on the Jump Server
  • Prepare it

 

NFSIP=`gcloud filestore instances list | grep geldata | awk '{print $6}'`
echo NFSIP=$NFSIP

sudo mkdir -p /mnt/geldata
sudo mount $NFSIP:/vol /mnt/geldata
sudo mkdir /mnt/geldata/data
sudo chmod 777 /mnt/geldata/data

# Exit ssh
exit

 

PS: this step assumes that the Google Cloud SDK (gcloud) has been installed beforehand on the Jump Server. This piece of code mounts the Filestore instance on the Jump Server, creates a single folder named “data”, and changes its permissions to 777.
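
If you want to double-check the setup, the following commands, run on the Jump Server before the exit, confirm that the NFS share is mounted and that the folder has the expected permissions:

# On the Jump Server
df -h /mnt/geldata           # the Filestore share should appear as an NFS mount
ls -ld /mnt/geldata/data     # should show drwxrwxrwx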

 

3 – Optional – Load existing data on the Google Filestore instance

 

If you are in a migration project, your customer might want to know how to quickly leverage existing data. While we are not tackling the whole migration process in this article, let’s illustrate one aspect of it: making existing data available to SAS Viya.

 

So, the Google Filestore instance that we have set up can be the data file server that we need for accessing data from SAS and CAS.

 

In our context, two options (among others) can be used to upload data to the Google Filestore instance:

  • Through the client: data is available from the client and pushed to Filestore through the Jump Server (scp+nfs)
  • Through Google Cloud Storage: data is moved from the customer site to GCS (using gsutil) and then from GCS to Filestore through the Jump Server

From the client:

# Check the contents of the target
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP} \
   "ls -al /mnt/geldata/data/"

# Copy the data from the Cloud Login Server to Filestore through the Jump Server
scp -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key -r /sasviya/data/* \
   jumpuser@${JUMPIP}:/mnt/geldata/data/.

# Check the contents of the target
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP} \
   "ls -al /mnt/geldata/data/"

 

Using GCS as a staging area:

# ssh to the Jump Server
ssh -o "StrictHostKeyChecking=no" -i ~/.ssh/gelgcpdm-gke-key jumpuser@${JUMPIP}

# Assuming the data to be copied is already available in a GCS bucket
# Copy the bucket contents to Filestore through the NFS mount
gsutil -m cp -r gs://myviya-gcpdm/data/* /mnt/geldata/data/.

# Check the contents of the target
ls -al /mnt/geldata/data/

# Exit ssh
exit
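
For completeness, the initial transfer from the customer site to GCS could itself be done with gsutil. A minimal sketch, run from an on-premises machine where gsutil is configured; /local/sasdata is a hypothetical source folder:

# Push on-premises data to the GCS staging bucket (/local/sasdata is hypothetical)
gsutil -m cp -r /local/sasdata/* gs://myviya-gcpdm/data/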

 

PS: this step assumes that the Google Cloud SDK (gsutil) has been installed beforehand on the Jump Server. All right, we now have a Google Filestore instance populated with some existing data.  

 

4 – Prepare the Kubernetes configuration files to mount the Filestore instance in the CAS and SAS Compute pods

 

We are ready to start modifying the Viya deployment to take our new data file server into account. First, get the Google Filestore instance IP address on the client:

 

NFSIP=`gcloud filestore instances list | grep geldata | awk '{print $6}'`
echo $NFSIP

 

Then create 3 overlays:

    • One for the mount specifications of the CAS pod (cas-add-filestore-mount.yaml)
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-filestore-mount
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: sas-filestore
      nfs:
        path: /vol/data
        server: $NFSIP
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: sas-filestore
      mountPath: /filestore/data
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
    • One for the mount specifications of the SAS Compute pod (compute-add-filestore-mount.yaml)
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: compute-add-filestore-mount
patch: |-
  - op: add
    path: /template/spec/volumes/-
    value:
      name: sas-filestore
      nfs:
        path: /vol/data
        server: $NFSIP
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: sas-filestore
      mountPath: /filestore/data
target:
  name: sas-compute-job-config
  version: v1
  kind: PodTemplate
    • One for adding the mount point to the CAS AllowList (cas-add-filestore-allowlist.yaml)
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-filestore-allowlist
patch: |-
  - op: add
    path: /spec/appendCASAllowlistPaths/-
    value:
      /filestore/data
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1

 

Notice that:

  • The path to mount is /vol/data (“vol” was defined during the Google Filestore creation in step 1, and “data” was configured in step 2)
  • The $NFSIP variable must be resolved while creating those files (see the sketch after this list)
  • The target path on the pods will be /filestore/data
  • This path will be added in the CAS AllowList (so the path will be accessible by CAS)
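
One way to resolve $NFSIP while writing the overlay files is to generate them with a shell here-document, so the variable is expanded at creation time. Here is a sketch for the first file, assuming the site-config locations referenced in step 5; the same pattern applies to the other two files:

# $NFSIP is expanded by the shell when the file is written
cat > site-config/cas/cas-add-filestore-mount.yaml <<EOF
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-filestore-mount
patch: |-
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: sas-filestore
      nfs:
        path: /vol/data
        server: $NFSIP
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: sas-filestore
      mountPath: /filestore/data
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
EOF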

 

5 – Reference the overlays in the kustomization.yaml file

 

Add the following lines to the transformers block of the kustomization.yaml file:

 

  - site-config/cas/cas-add-filestore-mount.yaml
  - site-config/sas-compute-server/compute-add-filestore-mount.yaml
  - site-config/cas/cas-add-filestore-allowlist.yaml
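
In context, the transformers block of kustomization.yaml would then look roughly like the sketch below, with your existing entries represented by a placeholder comment:

transformers:
  # ...existing transformer entries from your deployment...
  - site-config/cas/cas-add-filestore-mount.yaml
  - site-config/sas-compute-server/compute-add-filestore-mount.yaml
  - site-config/cas/cas-add-filestore-allowlist.yaml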

 

6 – Build and apply site.yaml

 

To apply the overlays, we need to build the Kubernetes manifest:

 

kustomize build -o site.yaml

 

And apply it:

 

kubectl -n gelgcp apply -f site.yaml

 

7 – Restart CAS

 

To take changes into account, we need to restart CAS:

 

kubectl -n gelgcp delete pods -l casoperator.sas.com/server=default
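
The CAS pods are recreated automatically; a quick way to watch them come back before moving on (sketch):

# Watch the CAS server pods until they are Running again (Ctrl+C to stop)
kubectl -n gelgcp get pods -l casoperator.sas.com/server=default -w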

 

8 – Test the access to the Filestore instance from SAS Viya

 

It is time to test access to the Google Filestore instance from SAS and see whether we can access existing data, create new tables, and share data between the SAS Compute Server and CAS:
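
For convenience, here is the test program behind the log below, gathered in one block (the paths and caslib names are the ones configured in the previous steps):

/* SAS access to Filestore */
libname sasfs "/filestore/data" ;
proc datasets lib=sasfs ;
quit ;
data sasfs.sas_prdsale ;
   set sashelp.prdsale ;
run ;
proc datasets lib=sasfs ;
quit ;

/* CAS access to Filestore */
cas mysession ;
caslib filestore type=path path="/filestore/data" libref=casfs ;
proc casutil ;
   list files ;
quit ;
data casfs.cas_prdsale ;
   set sashelp.prdsale ;
run ;
proc casutil ;
   save casdata="cas_prdsale" casout="cas_prdsale.sashdat" replace ;
   list files ;
quit ;
cas mysession terminate ;

The resulting log: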

 

78   /* SAS access to Filestore */
79   
80   libname sasfs "/filestore/data" ;
NOTE: Libref SASFS was successfully assigned as follows: 
      Engine:        V9 
      Physical Name: /filestore/data
81   
82   proc datasets lib=sasfs ;
                                                             Directory
                                                Libref             SASFS           
                                                Engine             V9              
                                                Physical Name      /filestore/data 
                                                Filename           /filestore/data 
                                                Inode Number       61079553        
                                                Access Permission  rwxrwxrwx       
                                                Owner Name         root            
                                                File Size          4KB             
                                                File Size (bytes)  4096            
                                              Member
                                     #  Name  Type       File Size  Last Modified
                                     1  HMEQ  DATA           704KB  07/02/2021 15:28:10        
83   quit ;
NOTE: PROCEDURE DATASETS used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds
      
84   
85   data sasfs.sas_prdsale ;
86   set sashelp.prdsale ;
87   run ;
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set SASFS.SAS_PRDSALE has 1440 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.02 seconds
      
88   
89   proc datasets lib=sasfs ;
                                                             Directory
                                                Libref             SASFS           
                                                Engine             V9              
                                                Physical Name      /filestore/data 
                                                Filename           /filestore/data 
                                                Inode Number       61079553        
                                                Access Permission  rwxrwxrwx       
                                                Owner Name         root            
                                                File Size          4KB             
                                                             Directory
                                                File Size (bytes)  4096            
                                                 Member
                                 #  Name         Type       File Size  Last Modified
                                 1  HMEQ         DATA           704KB  07/02/2021 15:28:10        
                                 2  SAS_PRDSALE  DATA           256KB  07/02/2021 19:01:35        
90   quit ;
NOTE: PROCEDURE DATASETS used (Total process time):
      real time           0.03 seconds
      cpu time            0.04 seconds
      
91   
92   /* CAS access to Filestore */
93   
94   cas mysession ;
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The 
      UUID is 3b31c471-97b0-0a49-9d6b-06dec184f040. The user is Alex and the active caslib is CASUSER(Alex).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 4 workers.
95   
96   caslib filestore type=path path="/filestore/data" libref=casfs ;
NOTE: 'FILESTORE' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'FILESTORE'.
NOTE: CASLIB FILESTORE for session MYSESSION will be mapped to SAS Library CASFS.
NOTE: Action to ADD caslib FILESTORE completed for session MYSESSION.
97   
98   proc casutil ;
NOTE: The UUID '3b31c471-97b0-0a49-9d6b-06dec184f040' is connected using session MYSESSION.
99   
99 !  list files ;
                                                        Caslib Information
                                             Library                  FILESTORE       
                                             Source Type              PATH            
                                             Path                     /filestore/data/
                                             Session local            Yes             
                                             Active                   Yes             
                                             Personal                 No              
                                             Hidden                   No              
                                             Transient                No              
                                                       CAS File Information
 
                                                                      Encryption                  Last Modified
              Name                    Permission    Owner    Group    Method         File Size    (UTC)
              hmeq.sas7bdat           -rw-rw-r--                                       704.0KB     02JUL2021:15:28:11
              customers.sashdat       -rw-rw-r--                      NONE              15.9KB     02JUL2021:15:28:11
              sas_prdsale.sas7bdat    -rw-r--r--                                       256.0KB     02JUL2021:19:01:35
              contact_list.csv        -rw-rw-r--                                         3.7KB     02JUL2021:15:28:11
NOTE: Cloud Analytic Services processed the combined requests in 0.016016 seconds.
100  quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
      real time           0.06 seconds
      cpu time            0.03 seconds
      
101  
102  data casfs.cas_prdsale ;
103  set sashelp.prdsale ;
104  run ;
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set CASFS.CAS_PRDSALE has 1440 observations and 10 variables.
NOTE: DATA statement used (Total process time):
      real time           0.08 seconds
      cpu time            0.06 seconds
      
105  
106  proc casutil ;
NOTE: The UUID '3b31c471-97b0-0a49-9d6b-06dec184f040' is connected using session MYSESSION.
107  
107!  save casdata="cas_prdsale" casout="cas_prdsale.sashdat" replace ;
NOTE: Cloud Analytic Services saved the file cas_prdsale.sashdat in caslib FILESTORE.
NOTE: The Cloud Analytic Services server processed the request in 0.042732 seconds.
108  
108!  list files ;
                                                        Caslib Information
                                             Library                  FILESTORE       
                                             Source Type              PATH            
                                             Path                     /filestore/data/
                                             Session local            Yes             
                                             Active                   Yes             
                                             Personal                 No              
                                             Hidden                   No              
                                             Transient                No              
                                                       CAS File Information
 
                                                                      Encryption                  Last Modified
              Name                    Permission    Owner    Group    Method         File Size    (UTC)
              hmeq.sas7bdat           -rw-rw-r--                                       704.0KB     02JUL2021:15:28:11
              customers.sashdat       -rw-rw-r--                      NONE              15.9KB     02JUL2021:15:28:11
              sas_prdsale.sas7bdat    -rw-r--r--                                       256.0KB     02JUL2021:19:01:35
              contact_list.csv        -rw-rw-r--                                         3.7KB     02JUL2021:15:28:11
              cas_prdsale.sashdat     -rwxr-xr-x    sas      sas      NONE             182.4KB     02JUL2021:19:01:36
NOTE: Cloud Analytic Services processed the combined requests in 0.019751 seconds.
109  quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
      real time           0.13 seconds
      cpu time            0.08 seconds
      
110  
111  cas mysession terminate ;
NOTE: Libref CASFS has been deassigned.
NOTE: Deletion of the session MYSESSION was successful.
NOTE: The default CAS session MYSESSION identified by SAS option SESSREF= was terminated. Use the OPTIONS statement to set the 
      SESSREF= option to an active session.
NOTE: Request to TERMINATE completed for session MYSESSION.
 
9 – Review what has been done
 
As in the previous post, we can use kubectl commands to review what we have done so far. Let’s list the CAS container’s volume mounts:
kubectl -n gelgcp get pod sas-cas-server-default-controller \
   -o json | jq '.spec.containers[] | select(.name=="cas") | .volumeMounts[]'
 
And identify that the /filestore/data path is managed by the sas-filestore volume mount:
 
{
  "mountPath": "/filestore/data",
  "name": "sas-filestore"
}
 
Then, we can list the CAS pod’s volumes:
 
kubectl -n gelgcp get pod sas-cas-server-default-controller \
   -o json | jq '.spec.volumes[]'
 
And see the details of the NFS mount for the sas-filestore volume mount:
 
{
  "name": "sas-filestore",
  "nfs": {
    "path": "/vol/data",
    "server": "10.X.Y.42"
  }
}
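 
A similar check can be done on the Compute side by inspecting the pod template that compute sessions are launched from (a sketch using the sas-compute-job-config target from step 4):
 
kubectl -n gelgcp get podtemplate sas-compute-job-config \
   -o json | jq '.template.spec.volumes[] | select(.name=="sas-filestore")'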
 
Here we go. We have configured a cloud-native data file server, independent of the SAS deployment, and made it available to both SAS data processing engines (Compute and CAS). It can be used for any path-based data storage needed by SAS and can easily be pre-populated with existing data in a typical on-premises to cloud migration.
 
Thanks for reading.

 

Find more articles from SAS Global Enablement and Learning here.

