Azure Blobfuse to access Blob Storage
Blobfuse is an open-source virtual filesystem driver for Azure Blob storage. It allows you to access blob data in an Azure Storage account through the Linux filesystem.
With all the excitement around SAS on Azure Cloud, Blobfuse can be a useful tool for accessing SAS data sets stored in Azure Blob Storage. Azure Blob Storage is a cost-effective and reliable service for storing data.
As a SAS user, you may use Azure Blob Storage to store all kinds of file types, including .sas7bdat and .sashdat files. But there is no LIBNAME engine or CASLIB connector to directly read and write .sas7bdat and .sashdat files in Azure Blob Storage. With the SAS Viya 3.5 release, SAS SPRE supports the ORC LIBNAME engine for ORC data files in Blob Storage and the ADLS FILENAME statement for other file types. CAS supports ORC and CSV data file access in Azure Blob Storage using an ADLS CASLIB.
Azure Blobfuse can be a viable option for SAS users migrating SAS datasets (.sas7bdat files) to Azure Blob Storage. Using Blobfuse, a SAS user can mount an Azure Blob Storage location as an additional filesystem on the Unix server hosting the SAS Compute server or CAS servers. The Blobfuse mount enables SAS users to use a SAS LIBNAME statement or a PATH-based CASLIB to access the .sas7bdat and .sashdat data files.
The following diagram describes the data access path from Azure Blob Storage to SAS Compute Server and CAS Servers using Blobfuse.
How to mount Blob Storage as a filesystem at Unix server
- Install the Blobfuse software.
sudo rpm -Uvh https://packages.microsoft.com/config/rhel/7/packages-microsoft-prod.rpm
sudo yum install blobfuse
- Prepare the Unix OS for NFS mount.
sudo mkdir -p /mnt/blobfusetmp
sudo chown utkuma:sasusers /mnt/blobfusetmp
- Configure Storage Account Credentials at Unix Server.
tee ~/fuse_connection.cfg > /dev/null << "EOF"
accountName utkuma3adls2strg
accountKey 3R4oxwqyqrTqb4e4v7jsI2viFPkouln9qwNAXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
containerName fsutkuma3adls2strg
EOF
sudo chmod 600 ~/fuse_connection.cfg
- Mount an empty directory to Azure Blob Storage.
sudo mkdir /opt/fscontainer
sudo chown utkuma:sasusers /opt/fscontainer
sudo blobfuse /opt/fscontainer --tmp-path=/mnt/blobfusetmp --config-file=/home/utkuma/fuse_connection.cfg -o allow_other -o attr_timeout=240 -o entry_timeout=240 -o negative_timeout=120
[utkuma@intviya01 root]$ ls -l /opt/fscontainer
total 0
-rwxrwxrwx. 1 root root 11671314432 Jul 31 14:01 dm_fact_mega_corp_10g.sas7bdat
-rwxrwxrwx. 1 root root  1167204352 Aug  4 16:34 dm_fact_mega_corp_1g_1.sas7bdat
-rwxrwxrwx. 1 root root  1167204352 Jul 31 11:34 dm_fact_mega_corp_1g.sas7bdat
-rwxrwxrwx. 1 root root  1221188352 Aug  4 16:56 dm_fact_mega_corp_1G.sashdat
-rwxrwxrwx. 1 root root  2334334976 Aug  4 16:35 dm_fact_mega_corp_2g_1.sas7bdat
-rwxrwxrwx. 1 root root  2334334976 Jul 31 11:35 dm_fact_mega_corp_2g.sas7bdat
-rwxrwxrwx. 1 root root  2442365904 Aug  4 16:57 dm_fact_mega_corp_2G.sashdat
-rwxrwxrwx. 1 root root  5835661312 Aug  4 16:38 dm_fact_mega_corp_5g_1.sas7bdat
-rwxrwxrwx. 1 root root  5835661312 Jul 31 11:36 dm_fact_mega_corp_5g.sas7bdat
-rwxrwxrwx. 1 root root  6105880216 Aug  4 17:02 dm_fact_mega_corp_5G.sashdat
-rwxrwxrwx. 1 root root   949092352 Jul 30 16:12 dm_fact_mega_corp.sas7bdat
-rwxrwxrwx. 1 root root      131072 Jul 30 16:06 fish_sas.sas7bdat
drwxrwxrwx. 2 root root        4096 Dec 31  1969 sample_data
[utkuma@intviya01 root]$
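Note that a blobfuse mount made this way does not survive a reboot. The azure-storage-fuse project documents persistent mounting via /etc/fstab; the entry below is a sketch using the paths from the steps above, so verify the exact syntax against the blobfuse documentation for your version.

```
# /etc/fstab entry (single line) -- sketch based on the blobfuse documentation
blobfuse /opt/fscontainer fuse delay_connect,defaults,_netdev,--tmp-path=/mnt/blobfusetmp,--config-file=/home/utkuma/fuse_connection.cfg,allow_other 0 0
```

The delay_connect and _netdev options defer the storage connection until the network is up at boot time.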
Azure Blob data file access from SAS and CAS
- SAS LIBNAME to access Blob storage data.
libname azshrlib "/opt/fscontainer";
proc sql outobs=20;
   select * from azshrlib.fish_sas;
run; quit;
[utkuma@intviya01 ~]$ ls -l /mnt/blobfusetmp/root/
total 128
-rwxrwxrwx. 1 root root 131072 Jul 30 16:06 fish_sas.sas7bdat
[utkuma@intviya01 ~]$
- PATH CASLIB to access Blob Storage data. When a Blob Storage location is mounted to the CAS controller server (Unix), users can use a PATH-based CASLIB to access .sas7bdat and .sashdat files.
CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib azshcaslib datasource=(srctype="path") path="/opt/fscontainer";
proc casutil outcaslib="azshcaslib" incaslib="azshcaslib";
   load casdata="dm_fact_mega_corp.sas7bdat" casout="dm_fact_mega_corp" replace;
   load casdata="dm_fact_mega_corp_1G.sashdat" casout="dm_fact_mega_corp_H" replace;
   list tables;
quit;
CAS mySession TERMINATE;
[root@intcas01 ~]# ls -l /mnt/blobfusetmp/root
total 2250436
-rwxrwxrwx. 1 root root 1221188352 Aug  4 16:56 dm_fact_mega_corp_1G.sashdat
-rwxrwxrwx. 1 root root  949092352 Jul 30 16:12 dm_fact_mega_corp.sas7bdat
[root@intcas01 ~]#
- Mounting the same blob container on multiple CAS nodes is recommended only for read-only scenarios.
- While a Blob container is mounted, the data in the container should not be modified by any process other than Blobfuse. This includes other instances of Blobfuse, running on this or other machines. Doing so could cause data loss or data corruption. Mounting other containers is fine.
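For the read-only multi-node scenario above, the standard FUSE ro option can be passed at mount time. This is a sketch assuming the same paths and options as in the earlier steps; blobfuse passes -o options through to FUSE.

```
# Mount the container read-only on each CAS worker node
# (-o ro is the standard FUSE read-only mount option)
sudo blobfuse /opt/fscontainer --tmp-path=/mnt/blobfusetmp \
     --config-file=/home/utkuma/fuse_connection.cfg \
     -o ro -o allow_other -o attr_timeout=240 -o entry_timeout=240
```

A read-only mount also avoids the data-corruption risk described above, since no node can modify the container through Blobfuse.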
CAS load/save performance using Blobfuse
The performance of a CAS load from Blobfuse depends on the proximity of the CAS Unix server to the Azure Blob Storage account. For better data transfer between the CAS server and Blob Storage, keep them in the same Azure region.
- For better performance, use the latest series of Azure VMs (e.g., E32s_v3 or E32ds_v4).
- Since I/O goes through the network, it is recommended to enable Accelerated Networking on the Azure VM (for supported instance types).
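A rough way to gauge sequential throughput is a dd write/read pair. The sketch below defaults to /tmp so it runs anywhere; point TESTDIR at the Blobfuse mount (e.g. /opt/fscontainer) to measure it instead. Keep in mind that Blobfuse stages files in its --tmp-path cache, so a read immediately after a write may reflect local-disk speed rather than Blob Storage speed.

```shell
# Sequential-throughput smoke test (hypothetical file name; dd prints MB/s on stderr)
TESTDIR="${TESTDIR:-/tmp}"
# Write a 64 MiB test file, forcing data to be flushed before dd reports
dd if=/dev/zero of="$TESTDIR/blobfuse_tp_test.bin" bs=1M count=64 conv=fsync
# Read it back
dd if="$TESTDIR/blobfuse_tp_test.bin" of=/dev/null bs=1M
# Clean up afterwards with: rm "$TESTDIR/blobfuse_tp_test.bin"
```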
Resource
How to Mount Azure Blob Storage Container to a Unix Server
@UttamKumar thank you for the share, an easy solution.
I am wondering how this blob storage will work in terms of performance, in more detail, and whether there are any caveats when it is used as shared storage, e.g. for SAS Grid Manager or SAS Viya's CAS.
I have some questions, as I am wondering from your experiences:
Does it provide the performance needed of min 100 MB/sec/core? Or, from another perspective, what can be the maximum I/O Throughput we can expect? No issues of file locking?
Can it be used only for normal data files, or also for SASWORK/UTILLOC?
Thank you in advance,
Best regards,
Juan
@JuanS_OCS , to answer your questions, I suggest you give a good read to the documentation linked at the bottom of the post. You will find there some limitations such as "Blobfuse doesn't guarantee 100% POSIX compliance as it simply translates requests into Blob REST APIs."
Additional considerations can be found in GitHub: https://github.com/azure/azure-storage-fuse#considerations
After reading there, I would not recommend that as a shared filesystem for SAS Grid Manager.
Hi @EdoardoRiva, thank you very much. I read the document and arrived at the same conclusion, although it was just my personal assumption. What you mention confirms it.