SAS Communities Library

Using Azure Blob Storage for sas7bdat files with SAS Viya – part 2

Started 12-08-2023
Modified 12-08-2023

In the first part of this series, I explained that with an Azure Storage Account (ADLS2) that has the NFS 3.0 protocol and Hierarchical namespace enabled, and with the Azure Blob Storage CSI driver enabled on the AKS cluster, you can mount Blob Storage to AKS Pods. You can create a Persistent Volume (PV) and Persistent Volume Claim (PVC) against Azure Blob Storage using the azureblob-nfs-premium storage class and mount it to the SAS Compute Pod and CAS Pods to access the sas7bdat data files.

 

In this second part of the series, I cover the Azure Storage Account and AKS configuration needed to NFS-mount Azure Blob Storage to the Compute and CAS Pods.

 

 

Prerequisites

 

  • An Azure Storage Account with Hierarchical namespace enabled.
  • Azure Storage Account with NFS 3.0 Protocol enabled.
  • AKS cluster with Azure Blob Storage CSI driver enabled.
  • AKS cluster that supports the NFS 3.0 protocol or later.
  • Storage Account network access granted to the AKS VNet and Subnet.
  • Azure CLI version 2.48 or later.
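
The Azure CLI version requirement can be checked before starting. The following is a minimal sketch, assuming GNU coreutils (for sort -V) is available. The function name version_ge is hypothetical, and the az version query shown in the comment is an assumption to verify against your installed CLI.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical usage against the installed CLI (not executed here):
#   cli_version=$(az version --query '"azure-cli"' -o tsv)
#   version_ge "$cli_version" "2.48.0" || echo "Azure CLI 2.48 or later is required"
```

A version sort is used rather than a plain string comparison because, compared as text, 2.9.0 would wrongly rank above 2.48.0.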

 

The following picture shows the SAS Compute and CAS Pods mounting Azure Blob Storage.

 

01_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_1.png

 

 

Storage Account configuration

 

In this process, you can use either a Standard or Premium ADLS2 Storage Account. The Premium Storage Account has better performance. When creating the Azure Storage Account, enable the Hierarchical namespace and the NFS 3.0 protocol. Also add the AKS VNet and Subnet to the Storage Account's selected networks so that it accepts network traffic from the AKS cluster. Finally, set the ACL on the Blob Storage folder to grant read/write permission to the required user groups and users.

 

The Azure Storage Account (ADLS2) with Hierarchical namespace, NFS 3.0 protocol, and selected network (AKS VNet and Subnet) access.

 

Border_02_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_2.png

 

The Azure Storage Account (ADLS2) network access for the AKS VNet and Subnet.

 

Border_03_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_3.png

 

The ACL for specific Blob folders with read and write permission. Grant the required user groups and users read/write permission on the Blob Storage folder.

 

Border_04_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_4.png

 

 

AKS Cluster configuration

 

The AKS cluster must support the NFS 3.0 protocol or later and have the Azure Blob CSI driver installed.

 

If required, you can enable the Azure Blob CSI driver and its storage classes with the following commands. Running the update command requires Azure CLI version 2.48 or later.

 

Code:

az account set --subscription "PSGEL286 SAS Viya 4: Data Management on Azure Cloud"

RG_NAME=mydata_rg
AKSCLUSTER=p03039-aks
echo $RG_NAME
echo $AKSCLUSTER

az aks update  -g $RG_NAME  -n $AKSCLUSTER   --enable-blob-driver --load-balancer-managed-outbound-ip-count 2

 

Verify that the AKS cluster has the Azure Blob CSI driver and its storage classes. Notice the azureblob-nfs-premium and azureblob-fuse-premium storage classes in the list.

 

Code:

kubectl get sc

 

Log:

[cloud-user@pdcesx11142 scripts]$ kubectl get sc
NAME                     PROVISIONER                                     RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   A
azureblob-fuse-premium   blob.csi.azure.com                              Delete          Immediate              true                   1
azureblob-nfs-premium    blob.csi.azure.com                              Delete          Immediate              true                   1
azurefile                file.csi.azure.com                              Delete          Immediate              true                   1
…
…….
…………..
[cloud-user@pdcesx11142 scripts]$

 

 

Create PV and PVC against Azure Blob Storage

 

With the Azure Blob CSI driver and the azureblob-nfs-premium storage class available on the AKS cluster, you can create a Persistent Volume (PV) and Persistent Volume Claim (PVC) against the Azure Blob Storage using the azureblob-nfs-premium storage class.

 

If required, you can also create a customized storage class that uses the blob.csi.azure.com provisioner.

 

The following commands create a .yaml file for the Blob PV for the NFS mount. When creating multiple PVs, give each one a unique volumeHandle value (unique-volumeid101 in this example).

 

Code:

export NS=<mynamespace> 
export STRG_RG=<mystrgrg>   
export STRG_ACC=<mystrgacct>  
export STRG_FILE=<myfsdata> 

cat > ~/project/deploy/${NS}/site-config/data-access/pv-blob-nfs.yaml <<-EOF

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: blob.csi.azure.com
  name: pv-azblob
spec:
  capacity:
    storage: 5G
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain  # If set as "Delete" container would be removed after pvc deletion
  storageClassName: azureblob-nfs-premium
  csi:
    driver: blob.csi.azure.com
    readOnly: false
    # make sure volumeid is unique for every identical storage blob container in the cluster
    # character `#` is reserved for internal use and cannot be used in volumehandle
    volumeHandle: unique-volumeid101
    volumeAttributes:
      resourceGroup: ${STRG_RG}
      storageAccount: ${STRG_ACC}
      containerName: ${STRG_FILE}
      protocol: nfs

EOF
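
Because the EOF delimiter above is unquoted, the shell substitutes ${STRG_RG}, ${STRG_ACC}, and ${STRG_FILE} while writing the file, and an unset variable is written as an empty value without any warning. The following is a minimal sketch of a guard to run before the heredoc. The function name require_vars is hypothetical and not part of the original procedure.

```shell
# require_vars NAME...: report every listed variable that is unset or empty.
# Returns non-zero if any is missing, so the caller can stop before writing YAML.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name (POSIX sh).
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before the heredoc that writes pv-blob-nfs.yaml:
#   require_vars NS STRG_RG STRG_ACC STRG_FILE || exit 1
```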

 

The following commands create a .yaml file for the Blob PVC for the NFS mount.

 

Code:

export NS=<mynamespace> 

cat > ~/project/deploy/${NS}/site-config/data-access/pvc-blob-nfs.yaml <<-EOF

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-azblob
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
  volumeName: pv-azblob
  storageClassName: azureblob-nfs-premium

EOF
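
If the data-access directory ends up holding several PV definitions, the unique volumeHandle requirement can be checked before anything is applied. The following is a rough sketch, assuming each PV lives in its own .yaml file in one directory. The function name duplicate_volume_handles is hypothetical.

```shell
# duplicate_volume_handles DIR: print each volumeHandle value that appears
# more than once across the .yaml files in DIR; succeed only if none repeat.
duplicate_volume_handles() {
  # Match only real "volumeHandle:" keys, not comment lines that mention them.
  dups=$(grep -h '^[[:space:]]*volumeHandle:' "$1"/*.yaml 2>/dev/null \
    | awk '{print $2}' | sort | uniq -d)
  if [ -n "$dups" ]; then
    printf '%s\n' "$dups"
    return 1
  fi
  return 0
}

# Usage against the directory used in this article:
#   duplicate_volume_handles ~/project/deploy/${NS}/site-config/data-access || exit 1
```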

 

The following commands create the Blob PV and PVC for the NFS mount by applying the .yaml files to the AKS cluster in the specified namespace.

 

Code:

export NS=<mynamespace>  

kubectl config set-context --current --namespace=${NS}

kubectl apply -f ~/project/deploy/${NS}/site-config/data-access/pv-blob-nfs.yaml
kubectl apply -f ~/project/deploy/${NS}/site-config/data-access/pvc-blob-nfs.yaml

kubectl -n ${NS} get pv pv-azblob
kubectl -n ${NS} get pvc pvc-azblob

 

Log:

[cloud-user@pdcesx03039 data-access]$ kubectl -n ${NS} get pv pv-azblob
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS            REASON   AGE
pv-azblob   5G         RWX            Retain           Bound    gelenv/pvc-azblob   azureblob-nfs-premium            3h28m
[cloud-user@pdcesx03039 data-access]$


[cloud-user@pdcesx03039 data-access]$ kubectl -n ${NS} get pvc pvc-azblob
NAME         STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS            AGE
pvc-azblob   Bound    pv-azblob   5G         RWX            azureblob-nfs-premium   3h28m
[cloud-user@pdcesx03039 data-access]$
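
The log shows the claim with a capacity of 5G even though the PVC requested 3Gi. A claim bound to a pre-created PV reports that volume's capacity, and binding only requires the PV capacity to be at least the request. The two values use different units (G is 10^9 bytes, Gi is 2^30 bytes), so the comparison is easier to see in bytes. The following is a small sketch, assuming numfmt from GNU coreutils is available. The function name to_bytes is hypothetical.

```shell
# to_bytes QUANTITY: convert a size such as 5G or 3Gi to bytes.
# numfmt --from=auto reads G as 10^9 and Gi as 2^30, which matches how
# Kubernetes reads these two suffixes.
to_bytes() {
  numfmt --from=auto "$1"
}

pv_bytes=$(to_bytes 5G)     # capacity declared in pv-blob-nfs.yaml
pvc_bytes=$(to_bytes 3Gi)   # request declared in pvc-blob-nfs.yaml

if [ "$pv_bytes" -ge "$pvc_bytes" ]; then
  echo "PV capacity ($pv_bytes bytes) covers the PVC request ($pvc_bytes bytes)"
fi
```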

 

 

Mount the Azure Blob PVC to SAS Compute and CAS Pods

 

With the Azure Blob PV and PVC in place on the AKS cluster, you can mount the PVC to the Compute and CAS Pods to store and access sas7bdat files.

 

The following commands create a .yaml file that mounts the Blob PVC to the Compute Pod.

 

Code:

export NS=<mynamespace> 

cat > ~/project/deploy/${NS}/site-config/data-access/blobdata-mounts-job.yaml <<-EOF

# General example for adding mounts to pods started by the SAS launcher (such as SAS Compute)
# PatchTransformer
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: blobdata-mounts-job
patch: |-
  ## Azure Blob container - kubernetes will mount these for you
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: blob01
      mountPath: "/mnt/blob01"
  - op: add
    path: /template/spec/volumes/-
    value:
     name: blob01
     persistentVolumeClaim:
      claimName: pvc-azblob
target:
  kind: PodTemplate
  annotationSelector: sas.com/sas-access-config=true
  labelSelector: "sas.com/template-intent=sas-launcher"

EOF

 

The following commands create a .yaml file that mounts the Blob PVC to the CAS Pods.

 

Code:

export NS=<mynamespace>  

cat > ~/project/deploy/${NS}/site-config/data-access/blobdata-mounts-cas.yaml <<-EOF

# General example for adding mounts to CAS workers
# PatchTransformer
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: blobdata-mounts-cas
patch: |-
  ## Azure Blob container - kubernetes will mount these for you
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: blob01
      mountPath: "/mnt/blob01"
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
     name: blob01
     persistentVolumeClaim:
      claimName: pvc-azblob
target:
  kind: CASDeployment
  annotationSelector: sas.com/sas-access-config=true

EOF

 

The following commands create a .yaml file that adds the mounted path to the CAS allowlist so that CAS can access it.

 

Code:

export NS=<mynamespace>  

cat > ~/project/deploy/${NS}/site-config/data-access/cas-add-allowlist-paths.yaml << EOF
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-allowlist-paths
patch: |-
  - op: add
    path: /spec/appendCASAllowlistPaths/-
    value:
      /mnt/blob01
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
EOF

 

Update the kustomization.yaml to include the additional .yaml files under the “transformers” section.

 

Code:

transformers:
  ……….
  ……..
  - site-config/data-access/blobdata-mounts-cas.yaml   ## To mount Azure BLOB PVC to CAS
  - site-config/data-access/blobdata-mounts-job.yaml   ## To mount Azure BLOB PVC to Compute Server
  - site-config/data-access/cas-add-allowlist-paths.yaml   ## To Allow Path in CAS
  ……
  …..
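
A mistyped path in the transformers list causes the build to fail, so it can help to confirm that the three new files exist first. The following is a minimal sketch with a hypothetical helper name, check_files. It only tests that each file is present and does not validate the YAML itself.

```shell
# check_files BASE PATH...: report each PATH that is not a regular file under BASE.
check_files() {
  base=$1
  shift
  status=0
  for rel in "$@"; do
    if [ ! -f "$base/$rel" ]; then
      echo "not found: $base/$rel" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage from the directory that holds kustomization.yaml:
#   check_files . \
#     site-config/data-access/blobdata-mounts-cas.yaml \
#     site-config/data-access/blobdata-mounts-job.yaml \
#     site-config/data-access/cas-add-allowlist-paths.yaml || exit 1
```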

 

Build and apply the manifest to the AKS cluster, then recycle the CAS and SAS Compute pods.

 

Code:

export NS=<mynamespace>  

cd ~/project/deploy/${NS}/
kustomize build -o site.yaml
kubectl -n ${NS} apply -f site.yaml

kubectl -n ${NS} delete pods -l casoperator.sas.com/server=default
kubectl -n ${NS} delete pod --selector='app=sas-compute'
kubectl -n ${NS} delete pod --selector='app=sas-launcher'

 

When the CAS Pods are running, validate that the Blob Storage is mounted to them.

 

Code:

##validate the disk is correctly mounted

kubectl exec sas-cas-server-default-controller -c sas-cas-server  -- touch /mnt/blob01/sample_data/test2.txt

kubectl exec sas-cas-server-default-controller -- ls -l /mnt/blob01/sample_data

 

Log:

[cloud-user@pdcesx03039 gelenv]$ kubectl exec sas-cas-server-default-controller -c sas-cas-server  -- touch /mnt/blob01/data/test2.txt
[cloud-user@pdcesx03039 gelenv]$


[cloud-user@pdcesx03039 gelenv]$ kubectl exec sas-cas-server-default-controller -- ls /mnt/blob01/sample_data
Defaulted container "sas-cas-server" out of: sas-cas-server, sas-backup-agent, sas-consul-agent, sas-certframe (init), sas-config-init (init)

test2.txt
[cloud-user@pdcesx03039 gelenv]$
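
The validation commands only succeed once the CAS controller pod has restarted, which can take a while after the pods are deleted. The following is a sketch of a polling helper that repeats a command until it succeeds. The function name retry_until and the attempt and delay values in the usage comment are hypothetical choices, not settings from the original procedure.

```shell
# retry_until ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds,
# sleeping DELAY seconds between tries; fail after ATTEMPTS tries.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Usage, reusing the pod name and mount path from this article:
#   retry_until 30 10 kubectl -n "${NS}" exec sas-cas-server-default-controller \
#     -c sas-cas-server -- ls /mnt/blob01
```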

 

If the SAS Compute Server is in a lockdown state, update the Compute service:autoexec_code configuration instance from the SAS Environment Manager application to include the required path.

 

05_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_5.png

 

 

SAS Compute Server Access to Azure Blob Storage PVC

 

With the Azure Blob PVC mounted to the SAS Compute Server, the server can read and write sas7bdat files in Azure Blob Storage (ADLS2). Azure Blob Storage is available to the SAS Compute Server through the NFS 3.0 protocol and the Azure Blob CSI driver. The following code saves a sas7bdat file to Azure Blob Storage and reads it back using a path-based LIBNAME statement.

 

Code:

/* # LIBNAME Statement for Azure Blob Share Location */

libname azshrlib "/mnt/blob01/sample_data" ;

data azshrlib.fish_sas ;
set sashelp.fish ;
run;

Proc SQL outobs=20;
select * from azshrlib.fish_sas ;
run;quit;

 

Log:

80   /* # LIBNAME Statement for Azure Blob Share Location */
81
82   libname azshrlib "/mnt/blob01/sample_data" ;
NOTE: Libref AZSHRLIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: /mnt/blob01/sample_data
83



84   data azshrlib.fish_sas ;
85   set sashelp.fish ;
86   run;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set AZSHRLIB.FISH_SAS has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.46 seconds
      cpu time            0.00 seconds

87


88   Proc SQL outobs=20;
89   select * from azshrlib.fish_sas ;
WARNING: Statement terminated early due to OUTOBS=20 option.
90   run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 11.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.06 seconds
      cpu time            0.04 seconds

91

 

 

CAS Load/Save from Azure Blob Storage PVC

 

With the Azure Blob PVC mounted to CAS, CAS can load data from and save data to sas7bdat files located in Azure Blob Storage (ADLS2). Azure Blob Storage is available to CAS through the NFS 3.0 protocol and the Azure Blob CSI driver. CAS can access the sas7bdat data files saved by the SAS Compute Server and vice versa.

 

The following code loads and saves a sas7bdat file in Azure Blob Storage using a path-based CASLIB statement.

 

Code:

CAS mySession  SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

CASLIB azlib DATASOURCE=(SRCTYPE="PATH") path="/mnt/blob01/sample_data" ;

/* Save a CAS table into a sas7bdat  file. */
proc casutil incaslib="azlib" outcaslib="azlib";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars.sas7bdat"   replace;
list files;
quit;

/* Load CAS from sas7bdat data files*/
proc casutil incaslib="azlib" outcaslib="azlib";
load casdata="cars.sas7bdat" casout="cars_7bdat" replace;
load casdata="fish_sas.sas7bdat" casout="fish_7bdat" replace;
list tables;
quit;

CAS mySession  TERMINATE;

 

Log:

…….
…..
82   CASLIB azlib DATASOURCE=(SRCTYPE="PATH") path="/mnt/blob01/sample_data" ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZLIB' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZLIB'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE:       real time               0.039090 seconds
NOTE:       cpu time                0.017151 seconds (43.88%)
NOTE:       total nodes             3 (12 cores)
NOTE:       total memory            94.03G
NOTE:       memory                  1.41M (0.00%)
NOTE: Action to ADD caslib AZLIB completed for session MYSESSION.
83


84   /* Save a CAS table into a sas7bdat  file. */
85   proc casutil incaslib="azlib" outcaslib="azlib";
NOTE: The UUID 'dde2b86d-583b-e842-8736-8fb4390b584a' is connected using session MYSESSION.
….
….

88   save casdata="cars" casout="cars.sas7bdat"   replace;
NOTE: Executing action 'table.save'.
NOTE: Cloud Analytic Services saved the file cars.sas7bdat in caslib AZLIB.
NOTE: Action 'table.save' used (Total process time):
NOTE:       real time               0.216190 seconds
NOTE:       cpu time                0.041189 seconds (19.05%)
NOTE:       total nodes             3 (12 cores)
NOTE:       total memory            94.03G
NOTE:       memory                  4.52M (0.00%)
NOTE: The Cloud Analytic Services server processed the request in 0.21619 seconds.

….
…….
92   /* Load CAS from sas7bdat table */
93   proc casutil incaslib="azlib" outcaslib="azlib";
NOTE: The UUID 'dde2b86d-583b-e842-8736-8fb4390b584a' is connected using session MYSESSION.

95   load casdata="cars.sas7bdat" casout="cars_7bdat" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file cars.sas7bdat available as table CARS_7BDAT in caslib azlib.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               0.071273 seconds
NOTE:       cpu time                0.039901 seconds (55.98%)
NOTE:       total nodes             3 (12 cores)
NOTE:       total memory            94.03G
NOTE:       memory                  7.08M (0.01%)
NOTE:       bytes moved             68.08K
NOTE: The Cloud Analytic Services server processed the request in 0.071273 seconds.


100  load casdata="fish_sas.sas7bdat" casout="fish_7bdat" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file fish_sas.sas7bdat available as table FISH_7BDAT in caslib azlib.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               0.052144 seconds
NOTE:       cpu time                0.042362 seconds (81.24%)
NOTE:       total nodes             3 (12 cores)
NOTE:       total memory            94.03G
NOTE:       memory                  7.06M (0.01%)
NOTE:       bytes moved             11.14K
NOTE: The Cloud Analytic Services server processed the request in 0.052144 seconds.
…….
……………
…………….

 

 

Azure Blob Storage with sas7bdat files

 

Border_06_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_6.png

 

 

Performance

 

To be practical, mounted Blob Storage will not perform like a local disk or a local NFS drive, so be prepared for slower performance. Data file access from Azure Blob Storage to the SAS Compute Server and CAS depends on several factors: the capacity of the VMs used in the AKS node pool, whether a Standard or Premium Storage Account is used, and the network interface and I/O throughput of the VMs. A high-end Azure VM in the AKS node pool combined with a Premium Storage Account would perform better.

 

The following test results are from Standard and Premium Storage Accounts, with a Standard_E4s_v3 VM for CAS and a Standard_D8s_v3 VM for the SAS Compute Server.

 

SAS Compute Server time to save sas7bdat data to ADLS2 Blob Storage

 

07_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_7.png

 

SAS Compute Server time to read sas7bdat data (Proc Freq) from ADLS2 Blob Storage

 

08_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_8.png

 

CAS time to load sas7bdat data file from ADLS2 Blob Storage

 

09_UK_SAS7bdat_Files_At_ASDLS2_Blob_Storage_p2_9.png

 

 

Conclusion

 

Considering the features and limitations of Azure Blob Storage and of the VMs in the AKS node pool, you can mount Blob Storage to the SAS Compute Server and CAS pods to access sas7bdat files. This could be a useful option for migrating existing sas7bdat files from an on-premises environment to cloud-based object storage (Blob). It enables SAS users to read and write sas7bdat files in Blob Storage from the SAS Compute Server and CAS. A sas7bdat data file created by either the SAS Compute Server or CAS can be read by the other.

 

Important Link:

Using Azure Blob Storage for sas7bdat files with SAS Viya – Part 1    

 

 

