With the SAS Viya 2024.07 release, the SAS Viya Compute Server can access Parquet data files in Azure ADLS2 Blob storage. Azure storage is supported by the Parquet LIBNAME engine, which authenticates with Azure by using either the Azure storage account key or an Azure client ID and secret.
This post describes how to access Parquet files in ADLS2 Blob storage from the SAS Compute Server.
The following diagram shows how the SAS Compute Server accesses Parquet files in ADLS2 Blob storage.
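To access Parquet data files in Azure ADLS2 Blob storage from the SAS Compute Server, a Parquet LIBNAME statement needs the following information: the storage account name, the file system (container) name, and either the storage account access key or the Azure tenant ID, application (client) ID, and client secret.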
The following code saves Parquet data files to ADLS2 Blob storage and reads them back into the SAS Compute Server. A Parquet LIBNAME statement can authenticate with Azure by using either the Azure storage account access key or an Azure client (application) ID and secret. The following code uses the client ID and secret.
The STORAGE_SHARED_KEY= LIBNAME option is available starting with SAS Viya release 2024.07.
The STORAGE_CLIENT_SECRET= LIBNAME option is available starting with SAS Viya release 2024.08.
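The full example uses the client ID and secret. For comparison, here is a minimal sketch of the account-key alternative, which passes the storage account access key through STORAGE_SHARED_KEY= instead of STORAGE_APPLICATION_ID= and STORAGE_CLIENT_SECRET=. The libref PRQTKEY, the table CLASS_PRQT, and all macro variable values are hypothetical placeholders. This post does not say whether the AZURETENANTID= system option is also needed with an account key; it is omitted here as an assumption, so check the LIBNAME documentation for your release.

```sas
/* Hypothetical placeholder values: substitute your own storage account, */
/* file system (container), and storage account access key.             */
%let MYSTRGACC="mystorageaccount";
%let MYSTRGFS="fsdata";
%let MYSTRGKEY="XXXXXXXXXXXXXXXXXXXX";

/* Assign the libref with the account access key (STORAGE_SHARED_KEY=,  */
/* SAS Viya 2024.07 or later). Assumption: no AZURETENANTID= is needed.  */
libname prqtkey parquet "/user_data"
   storage_platform="ADLS"
   storage_account_name=&MYSTRGACC
   storage_file_system=&MYSTRGFS
   storage_shared_key=&MYSTRGKEY;

/* Write a Parquet file through the libref, then display its metadata */
data prqtkey.class_prqt;
   set sashelp.class;
run;

proc contents data=prqtkey.class_prqt;
run;

/* Release the libref when finished */
libname prqtkey clear;
```

Apart from the authentication options, the LIBNAME statement is the same as in the client ID and secret example.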
Code:
/* Azure storage and application details (secrets partially masked) */
%let MYSTRGACC="eagler0256viya4adls2";
%let MYSTRGFS="fsdata";
%let MYTNTID="a708fb09-XXXXXXXXXXXXXXXX";
%let MYAPPID="49200cb0-XXXXXXXXXXXXXX";
%let MYAPPSECRET="JVX8Q~XXXXXXXXXXXXX-XXXXXXXXXXX";
%let MYPLTFRM="ADLS";
%let MYFOLDER="/user_data";

/* Azure tenant ID used for client ID and secret authentication */
options azuretenantid=&MYTNTID;

/* Assign a Parquet libref to the folder in the ADLS2 file system */
libname prqtlib parquet &MYFOLDER
   storage_platform=&MYPLTFRM
   storage_account_name=&MYSTRGACC
   storage_file_system=&MYSTRGFS
   storage_application_id=&MYAPPID
   storage_client_secret=&MYAPPSECRET
;

/* Save SASHELP.FISH to ADLS2 as Parquet: default, BROTLI, and LZ4 compression */
data prqtlib.fish_prqt;
   set sashelp.fish;
run;

data prqtlib.fish_brotli (compress=brotli);
   set sashelp.fish;
run;

data prqtlib.fish_lz4 (compress=LZ4);
   set sashelp.fish;
run;

/* Read the Parquet files back from ADLS2 (PROC SQL ends with QUIT, not RUN) */
proc sql outobs=20;
   select * from prqtlib.fish_prqt;
quit;

proc sql outobs=20;
   select * from prqtlib.fish_brotli;
quit;

proc sql outobs=20;
   select * from prqtlib.fish_lz4;
quit;

/* Display the metadata of each Parquet table */
proc contents data=prqtlib.fish_prqt;
run;

proc contents data=prqtlib.fish_brotli;
run;

proc contents data=prqtlib.fish_lz4;
run;
Log extract:
…………
……..
92
93 options azuretenantid=&MYTNTID;
94
95 libname prqtlib parquet &MYFOLDER
96 storage_platform=&MYPLTFRM
97 storage_account_name=&MYSTRGACC
98 storage_file_system=&MYSTRGFS
99 storage_application_id=&MYAPPID
100 storage_client_secret=&MYAPPSECRET
101 ;
NOTE: Libref PRQTLIB was successfully assigned as follows:
Engine: PARQUET
Physical Name: /user_data
102
103 data prqtlib.fish_prqt;
104 set sashelp.fish;
105 run;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PRQTLIB.fish_prqt has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.69 seconds
cpu time 0.26 seconds
106
107 data prqtlib.fish_brotli (compress=brotli) ;
108 set sashelp.fish;
109 run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PRQTLIB.fish_brotli has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.36 seconds
cpu time 0.04 seconds
110
111 data prqtlib.fish_lz4 (compress=LZ4) ;
112 set sashelp.fish;
113 run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PRQTLIB.fish_lz4 has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.37 seconds
cpu time 0.02 seconds
114
115 PROC SQL outobs=20 ;
116 select * from prqtlib.fish_prqt;
WARNING: Statement terminated early due to OUTOBS=20 option.
117 run;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
118
……….
……………
The following screenshot shows the Parquet data files saved to Azure ADLS2 by the code above.
The SAS Parquet engine supports BROTLI, GZIP, LZ4, LZ4_HADP, SNAPPY, and ZSTD compression when saving data to ADLS2. By default, it uses SNAPPY compression.
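As a short illustration of the other codecs in that list, the following sketch reuses the PRQTLIB libref assigned earlier and writes SASHELP.FISH with GZIP and ZSTD, plus one table without COMPRESS=, which per the text above falls back to SNAPPY. The table names FISH_GZIP, FISH_ZSTD, and FISH_DEFAULT are hypothetical.

```sas
/* Assumes the PRQTLIB libref from the earlier LIBNAME statement is assigned */

/* GZIP compression */
data prqtlib.fish_gzip (compress=gzip);
   set sashelp.fish;
run;

/* ZSTD compression */
data prqtlib.fish_zstd (compress=zstd);
   set sashelp.fish;
run;

/* No COMPRESS= option: the engine uses its default (SNAPPY) */
data prqtlib.fish_default;
   set sashelp.fish;
run;

/* Display the metadata of one of the new tables */
proc contents data=prqtlib.fish_zstd;
run;
```

Each DATA step writes a separate Parquet file to the same ADLS2 folder, so the compressed sizes can be compared side by side in the Azure portal.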
Important Links:
SAS Parquet LIBNAME Engine Requirement
STORAGE_CLIENT_SECRET= LIBNAME Statement Option
Find more articles from SAS Global Enablement and Learning here.
@UttamKumar, does this require that the Blob storage use a hierarchical namespace with the NFS 3.0 protocol and a CSI driver? Also, is there direct support for both reading and writing Parquet files, I mean Parquet saved as a regular file rather than as a blob?
@RajeevV, yes! It's supported for ADLS2 Blob storage with hierarchical namespace enabled. It does not require NFS 3.0 or a CSI driver.
Yes! The SAS Parquet LIBNAME engine can read and write Parquet data files on local and NFS-mounted file systems. A file written to a local or NFS-mounted file system is a standard Parquet file, not a blob.
-Uttam
@UttamKumar Thanks a lot.
Just another question on this: are you aware of any performance degradation when the Parquet file is saved as a blob on ADLS Gen2 versus as a standard Parquet file on local or NFS-mounted storage? Also, a question on ADLS Gen2: my understanding is that ADLS Gen1 directly supported standard Parquet. Is that not the case with ADLS Gen2? Does it support Parquet only as blob storage?
@UttamKumar, thanks for sharing. Do you have an example with the STORAGE_AUTH_DOMAIN= argument?
Thanks a lot,
Claudio