Using Azure Blob Storage for sas7bdat files with SAS Viya – part 2
In the first part of this series, I explained that with an Azure Storage Account (ADLS2) that has the Hierarchical namespace and NFS 3.0 protocol enabled, and with the Azure Blob Storage CSI driver enabled on the AKS cluster, you can mount the Blob Storage to AKS pods. You can create a Persistent Volume (PV) and Persistent Volume Claim (PVC) against Azure Blob Storage using the azureblob-nfs-premium storage class and mount it to the SAS Compute pod and CAS pods to access sas7bdat data files.
In this second part of the series, I walk through the Azure Storage Account and AKS configuration required to NFS-mount Azure Blob Storage to the Compute and CAS pods.
Prerequisites
- An Azure Storage Account (ADLS2) with Hierarchical namespace enabled.
- An Azure Storage Account with the NFS 3.0 protocol enabled.
- An AKS cluster with the Azure Blob Storage CSI driver enabled.
- An AKS cluster that supports the NFS 3.0 protocol or later.
- Storage Account network access granted to the AKS VNet and subnet.
- Azure CLI version 2.48 or later.
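Before proceeding, you can verify the Storage Account and AKS prerequisites from the Azure CLI. The following is a minimal sketch; the resource group, Storage Account, and cluster names are placeholders that you must substitute with your own.
Code:
# Check that the Hierarchical namespace (ADLS2) and NFS 3.0 protocol are
# enabled on the Storage Account; both values should report "true".
az storage account show \
  --resource-group mydata_rg \
  --name mystorageacct \
  --query "{hnsEnabled:isHnsEnabled, nfsV3Enabled:enableNfsV3}" \
  --output table

# Check that the Azure Blob CSI driver is enabled on the AKS cluster.
az aks show \
  --resource-group mydata_rg \
  --name p03039-aks \
  --query "storageProfile.blobCsiDriver.enabled"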
The following picture illustrates the SAS Compute and CAS pods mounted to Azure Blob Storage.
Storage Account configuration
In this process, you can use either a Standard or a Premium ADLS2 Storage Account; the Premium Storage Account offers better performance. When creating the Azure Storage Account, enable the Hierarchical namespace and the NFS 3.0 protocol. Also, place it on the AKS VNet and subnet so that network traffic from the AKS cluster is allowed. Finally, set the ACL on the Blob Storage folder with read/write permission for the required user groups and users.
The Azure Storage Account (ADLS2) with Hierarchical namespace, NFS 3.0 protocol, and selected network (AKS VNet and subnet) access.
The Azure Storage Account (ADLS2) network access to the AKS VNet and subnet.
The ACL for specific Blob folders with read and write permission. Provision the required read/write permissions for user groups and users on the Blob Storage folder.
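If you are creating the Storage Account from scratch, the following Azure CLI sketch shows one way to request the required features at creation time; the Storage Account, VNet, and subnet names are illustrative assumptions. Note that the NFS 3.0 protocol requires secure transfer (HTTPS only) to be disabled on the account.
Code:
# Create a Premium block blob (ADLS2) Storage Account with the Hierarchical
# namespace and NFS 3.0 protocol enabled, restricted to the AKS VNet/subnet.
az storage account create \
  --resource-group mydata_rg \
  --name mystorageacct \
  --sku Premium_LRS \
  --kind BlockBlobStorage \
  --enable-hierarchical-namespace true \
  --enable-nfs-v3 true \
  --https-only false \
  --default-action Deny \
  --vnet-name myaks-vnet \
  --subnet myaks-subnet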
AKS Cluster configuration
The AKS cluster must support the NFS 3.0 protocol or later and have the Azure Blob CSI driver installed.
If required, you can enable the Azure Blob CSI driver and its storage classes with the following update statement. Running it requires Azure CLI version 2.48 or later.
Code:
# Select the subscription and identify the AKS cluster
az account set --subscription "PSGEL286 SAS Viya 4: Data Management on Azure Cloud"
RG_NAME=mydata_rg
AKSCLUSTER=p03039-aks
echo $RG_NAME
echo $AKSCLUSTER
# Enable the Azure Blob CSI driver on the cluster
az aks update -g $RG_NAME -n $AKSCLUSTER --enable-blob-driver --load-balancer-managed-outbound-ip-count 2
Verify the Azure Blob CSI driver and storage classes on the AKS cluster. Notice the azureblob-nfs-premium and azureblob-fuse-premium storage classes in the list.
Code:
kubectl get sc
Log:
[cloud-user@pdcesx11142 scripts]$ kubectl get sc
NAME                     PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
azureblob-fuse-premium   blob.csi.azure.com   Delete          Immediate           true                   …
azureblob-nfs-premium    blob.csi.azure.com   Delete          Immediate           true                   …
azurefile                file.csi.azure.com   Delete          Immediate           true                   …
…
[cloud-user@pdcesx11142 scripts]$
Create PV and PVC against Azure Blob Storage
With the azureblob-nfs-premium driver and storage class available on the AKS cluster, you can create a Persistent Volume (PV) and Persistent Volume Claim (PVC) against the Azure Blob Storage using the azureblob-nfs-premium storage class.
If required, you can also create a customized storage class using the blob.csi.azure.com provisioner, as sketched below.
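A minimal custom storage class sketch follows; the class name my-azureblob-nfs and the parameter values are illustrative assumptions, not part of the standard deployment.
Code:
cat <<EOF | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-azureblob-nfs           # hypothetical class name
provisioner: blob.csi.azure.com
parameters:
  protocol: nfs                    # mount dynamically provisioned containers over NFS 3.0
  skuName: Premium_LRS             # Storage Account SKU used for dynamic provisioning
reclaimPolicy: Retain
volumeBindingMode: Immediate
allowVolumeExpansion: true
EOF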
The following statement creates a .yaml file for the Blob PV NFS mount. When creating multiple PVs, make sure each one has a unique value for the volumeHandle attribute (unique-volumeid101 in this example).
Code:
export NS=<mynamespace>
export STRG_RG=<mystrgrg>
export STRG_ACC=<mystrgacct>
export STRG_FILE=<myfsdata>
cat > ~/project/deploy/${NS}/site-config/data-access/pv-blob-nfs.yaml <<-EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: blob.csi.azure.com
  name: pv-azblob
spec:
  capacity:
    storage: 5G
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain  # if set to "Delete", the container is removed after PVC deletion
  storageClassName: azureblob-nfs-premium
  csi:
    driver: blob.csi.azure.com
    readOnly: false
    # make sure volumeHandle is unique for every identical storage blob container in the cluster
    # the character '#' is reserved for internal use and cannot be used in volumeHandle
    volumeHandle: unique-volumeid101
    volumeAttributes:
      resourceGroup: ${STRG_RG}
      storageAccount: ${STRG_ACC}
      containerName: ${STRG_FILE}
      protocol: nfs
EOF
The following statement creates a .yaml file for the Blob PVC NFS mount.
Code:
export NS=<mynamespace>
cat > ~/project/deploy/${NS}/site-config/data-access/pvc-blob-nfs.yaml <<-EOF
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pvc-azblob
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 3Gi
  volumeName: pv-azblob
  storageClassName: azureblob-nfs-premium
EOF
The following statements create the Blob PV and PVC for the NFS mount. Apply the .yaml files to the AKS cluster under the specified namespace to get the Blob PV and PVC.
Code:
export NS=<mynamespace>
kubectl config set-context --current --namespace=${NS}
kubectl apply -f ~/project/deploy/${NS}/site-config/data-access/pv-blob-nfs.yaml
kubectl apply -f ~/project/deploy/${NS}/site-config/data-access/pvc-blob-nfs.yaml
kubectl -n ${NS} get pv pv-azblob
kubectl -n ${NS} get pvc pvc-azblob
Log:
[cloud-user@pdcesx03039 data-access]$ kubectl -n ${NS} get pv pv-azblob
NAME        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM               STORAGECLASS            REASON   AGE
pv-azblob   5G         RWX            Retain           Bound    gelenv/pvc-azblob   azureblob-nfs-premium            3h28m
[cloud-user@pdcesx03039 data-access]$
[cloud-user@pdcesx03039 data-access]$ kubectl -n ${NS} get pvc pvc-azblob
NAME         STATUS   VOLUME      CAPACITY   ACCESS MODES   STORAGECLASS            AGE
pvc-azblob   Bound    pv-azblob   5G         RWX            azureblob-nfs-premium   3h28m
[cloud-user@pdcesx03039 data-access]$
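Before patching the SAS pods, you can optionally confirm that the NFS mount works by using a throwaway pod. The following is a minimal sketch, assuming NS is still exported and using a hypothetical scratch pod named test-azblob against the pvc-azblob PVC created above.
Code:
cat <<EOF | kubectl -n ${NS} apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: test-azblob                # hypothetical scratch pod
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: blob01
          mountPath: /mnt/blob01
  volumes:
    - name: blob01
      persistentVolumeClaim:
        claimName: pvc-azblob
EOF
kubectl -n ${NS} wait --for=condition=Ready pod/test-azblob --timeout=120s
kubectl -n ${NS} exec test-azblob -- sh -c "touch /mnt/blob01/test1.txt && ls -l /mnt/blob01"
kubectl -n ${NS} delete pod test-azblob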
Mount the Azure Blob PVC to SAS Compute and CAS Pods
With the Azure Blob PV and PVC in place on the AKS cluster, you can mount them to the Compute and CAS pods to store and access sas7bdat files.
The following statement creates a .yaml file that mounts the Blob PVC to the Compute pods.
Code:
export NS=<mynamespace>
cat > ~/project/deploy/${NS}/site-config/data-access/blobdata-mounts-job.yaml <<-EOF
# General example for adding mounts to SAS Compute server (launcher) pod templates
# PatchTransformer
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: blobdata-mounts-job
patch: |-
  ## Azure Blob container - Kubernetes will mount these for you
  - op: add
    path: /template/spec/containers/0/volumeMounts/-
    value:
      name: blob01
      mountPath: "/mnt/blob01"
  - op: add
    path: /template/spec/volumes/-
    value:
      name: blob01
      persistentVolumeClaim:
        claimName: pvc-azblob
target:
  kind: PodTemplate
  annotationSelector: sas.com/sas-access-config=true
  labelSelector: "sas.com/template-intent=sas-launcher"
EOF
The following statement creates a .yaml file that mounts the Blob PVC to the CAS pods.
Code:
export NS=<mynamespace>
cat > ~/project/deploy/${NS}/site-config/data-access/blobdata-mounts-cas.yaml <<-EOF
# General example for adding mounts to CAS controller and worker pods
# PatchTransformer
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: blobdata-mounts-cas
patch: |-
  ## Azure Blob container - Kubernetes will mount these for you
  - op: add
    path: /spec/controllerTemplate/spec/containers/0/volumeMounts/-
    value:
      name: blob01
      mountPath: "/mnt/blob01"
  - op: add
    path: /spec/controllerTemplate/spec/volumes/-
    value:
      name: blob01
      persistentVolumeClaim:
        claimName: pvc-azblob
target:
  kind: CASDeployment
  annotationSelector: sas.com/sas-access-config=true
EOF
The following statement creates a .yaml file that adds the mounted path to the CAS allowlist.
Code:
export NS=<mynamespace>
cat > ~/project/deploy/${NS}/site-config/data-access/cas-add-allowlist-paths.yaml << EOF
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-allowlist-paths
patch: |-
  - op: add
    path: /spec/appendCASAllowlistPaths/-
    value: /mnt/blob01
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
EOF
Update the kustomization.yaml to include the additional .yaml files under the “transformers” section.
Code:
transformers:
  …
  - site-config/data-access/blobdata-mounts-cas.yaml        ## To mount the Azure Blob PVC to CAS
  - site-config/data-access/blobdata-mounts-job.yaml        ## To mount the Azure Blob PVC to the Compute server
  - site-config/data-access/cas-add-allowlist-paths.yaml    ## To allow the path in CAS
  …
Build and apply the manifest to the AKS cluster, and then recycle the CAS and SAS Compute pods.
Code:
export NS=<mynamespace>
cd ~/project/deploy/${NS}/
# Rebuild the deployment manifest and apply it to the cluster
kustomize build -o site.yaml
kubectl -n ${NS} apply -f site.yaml
# Recycle the CAS and SAS Compute/launcher pods so they pick up the new mounts
kubectl -n ${NS} delete pods -l casoperator.sas.com/server=default
kubectl -n ${NS} delete pod --selector='app=sas-compute'
kubectl -n ${NS} delete pod --selector='app=sas-launcher'
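You can wait for the recycled CAS pods to report Ready before validating; a minimal sketch using the same label selector as above:
Code:
# Wait for the CAS server pods to restart and become Ready
kubectl -n ${NS} wait --for=condition=Ready pod \
  -l casoperator.sas.com/server=default --timeout=600s
kubectl -n ${NS} get pods -l casoperator.sas.com/server=default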
When the CAS pods are running, validate that the Blob Storage is mounted to the CAS pods.
Code:
##validate the disk is correctly mounted
kubectl exec sas-cas-server-default-controller -c sas-cas-server -- touch /mnt/blob01/sample_data/test2.txt
kubectl exec sas-cas-server-default-controller -- ls -l /mnt/blob01/sample_data
Log:
[cloud-user@pdcesx03039 gelenv]$ kubectl exec sas-cas-server-default-controller -c sas-cas-server -- touch /mnt/blob01/sample_data/test2.txt
[cloud-user@pdcesx03039 gelenv]$
[cloud-user@pdcesx03039 gelenv]$ kubectl exec sas-cas-server-default-controller -- ls /mnt/blob01/sample_data
Defaulted container "sas-cas-server" out of: sas-cas-server, sas-backup-agent, sas-consul-agent, sas-certframe (init), sas-config-init (init)
test2.txt
[cloud-user@pdcesx03039 gelenv]$
If the SAS Compute server is in a lockdown state, update the Compute service's autoexec_code configuration instance from the SAS Environment Manager application to include the required path (typically with a LOCKDOWN statement such as lockdown path='/mnt/blob01';).
SAS Compute Server Access to Azure Blob Storage PVC
With the Azure Blob PVC mounted to the SAS Compute server, it can read and write sas7bdat files in Azure Blob Storage (ADLS2). The Blob storage is available to the SAS Compute server through the NFS 3.0 protocol and the Azure Blob CSI driver. The following statements save and then read a sas7bdat file in Azure Blob Storage using a path-based LIBNAME statement.
Code:
/* # LIBNAME Statement for Azure Blob Share Location */
libname azshrlib "/mnt/blob01/sample_data" ;
data azshrlib.fish_sas ;
set sashelp.fish ;
run;
Proc SQL outobs=20;
select * from azshrlib.fish_sas ;
run;quit;
Log:
80 /* # LIBNAME Statement for Azure Blob Share Location */
81
82 libname azshrlib "/mnt/blob01/sample_data" ;
NOTE: Libref AZSHRLIB was successfully assigned as follows:
Engine: V9
Physical Name: /mnt/blob01/sample_data
83
84 data azshrlib.fish_sas ;
85 set sashelp.fish ;
86 run;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set AZSHRLIB.FISH_SAS has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.46 seconds
cpu time 0.00 seconds
87
88 Proc SQL outobs=20;
89 select * from azshrlib.fish_sas ;
WARNING: Statement terminated early due to OUTOBS=20 option.
90 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 11.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.06 seconds
cpu time 0.04 seconds
91
CAS load/save from Azure Blob Storage PVC
With the Azure Blob PVC mounted to CAS, it can load and save sas7bdat files located in Azure Blob Storage (ADLS2). The Blob storage is available to CAS through the NFS 3.0 protocol and the Azure Blob CSI driver. CAS can access the sas7bdat data files saved by the SAS Compute server, and vice versa.
The following statements load and save a sas7bdat file in Azure Blob Storage using a path-based CASLIB statement.
Code:
CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
CASLIB azlib DATASOURCE=(SRCTYPE="PATH") path="/mnt/blob01/sample_data" ;
/* Save a CAS table into a sas7bdat file. */
proc casutil incaslib="azlib" outcaslib="azlib";
load data=sashelp.cars casout="cars" replace;
save casdata="cars" casout="cars.sas7bdat" replace;
list files;
quit;
/* Load CAS from sas7bdat data files*/
proc casutil incaslib="azlib" outcaslib="azlib";
load casdata="cars.sas7bdat" casout="cars_7bdat" replace;
load casdata="fish_sas.sas7bdat" casout="fish_7bdat" replace;
list tables;
quit;
CAS mySession TERMINATE;
Log:
…
82 CASLIB azlib DATASOURCE=(SRCTYPE="PATH") path="/mnt/blob01/sample_data" ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AZLIB' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AZLIB'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: real time 0.039090 seconds
NOTE: cpu time 0.017151 seconds (43.88%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 1.41M (0.00%)
NOTE: Action to ADD caslib AZLIB completed for session MYSESSION.
83
84 /* Save a CAS table into a sas7bdat file. */
85 proc casutil incaslib="azlib" outcaslib="azlib";
NOTE: The UUID 'dde2b86d-583b-e842-8736-8fb4390b584a' is connected using session MYSESSION.
…
88 save casdata="cars" casout="cars.sas7bdat" replace;
NOTE: Executing action 'table.save'.
NOTE: Cloud Analytic Services saved the file cars.sas7bdat in caslib AZLIB.
NOTE: Action 'table.save' used (Total process time):
NOTE: real time 0.216190 seconds
NOTE: cpu time 0.041189 seconds (19.05%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 4.52M (0.00%)
NOTE: The Cloud Analytic Services server processed the request in 0.21619 seconds.
…
92 /* Load CAS from sas7bdat table */
93 proc casutil incaslib="azlib" outcaslib="azlib";
NOTE: The UUID 'dde2b86d-583b-e842-8736-8fb4390b584a' is connected using session MYSESSION.
95 load casdata="cars.sas7bdat" casout="cars_7bdat" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file cars.sas7bdat available as table CARS_7BDAT in caslib azlib.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE: real time 0.071273 seconds
NOTE: cpu time 0.039901 seconds (55.98%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 7.08M (0.01%)
NOTE: bytes moved 68.08K
NOTE: The Cloud Analytic Services server processed the request in 0.071273 seconds.
100 load casdata="fish_sas.sas7bdat" casout="fish_7bdat" replace;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file fish_sas.sas7bdat available as table FISH_7BDAT in caslib azlib.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE: real time 0.052144 seconds
NOTE: cpu time 0.042362 seconds (81.24%)
NOTE: total nodes 3 (12 cores)
NOTE: total memory 94.03G
NOTE: memory 7.06M (0.01%)
NOTE: bytes moved 11.14K
NOTE: The Cloud Analytic Services server processed the request in 0.052144 seconds.
…
Azure Blob Storage with sas7bdat files
Performance
To be practical, mounted Blob storage is not going to perform like a local disk or a local NFS drive, so be prepared for slower access. The performance of data file access from Azure Blob Storage to the SAS Compute server and CAS depends on several factors: the capacity of the VMs used in the AKS node pool, whether a Standard or a Premium Storage Account is used, and the network interface and I/O throughput of the VMs. A high-end Azure VM in the AKS node pool combined with a Premium Storage Account performs better.
The following test results are from a Standard and a Premium Storage Account with "Standard_E4s_v3" VMs for CAS and a "Standard_D8s_v3" VM for the SAS Compute server.
SAS Compute Server time to save sas7bdat data to ADLS2 Blob Storage
SAS Compute Server time to read sas7bdat data (Proc Freq) from ADLS2 Blob Storage
CAS time to load sas7bdat data file from ADLS2 Blob Storage
Conclusion
Considering the features and limitations of Azure Blob Storage and of the VMs in the AKS node pool, you can mount Blob storage to the SAS Compute server and CAS pods to access sas7bdat files. This can be a useful option for migrating existing sas7bdat files from an on-premises environment to cloud-based object storage (Blob). It enables SAS users to read and write sas7bdat files in Blob storage from both the SAS Compute server and CAS, and data files created by one can be read by the other.
Important Link:
Using Azure Blob Storage for sas7bdat files with SAS Viya – Part 1
Find more articles from SAS Global Enablement and Learning here.