
Access ADLS2 Parquet data files from SAS Compute Server


With the SAS Viya 2024.07 release, the SAS Viya Compute Server can access Parquet data files in Azure ADLS2 Blob storage. Azure storage is supported by the Parquet LIBNAME engine, which authenticates with either an Azure Storage Account key or an Azure Client ID (application) and Secret.

 

This post describes how to access Parquet files in ADLS2 Blob storage from the SAS Compute Server.

 

Prerequisites

 

  • SAS Viya release 2024.07 or later.
  • User access to an Azure Storage Account with the Storage Blob Data Contributor role.
  • Read and write permission for the user on the Azure Storage Account Blob filesystem.
  • An Azure Storage Account access key, or an Azure Client ID (application) and Secret with API access permission to Azure Data Lake and Azure Storage.

 

Data Path diagram

 

The following diagram illustrates access to ADLS2 Blob storage Parquet files from the SAS Compute Server.

 

01_UK_SAS_Access_to_ADLS2_Parquet_1.png


 

 

Azure information required to access ADLS2 data

 

To access Azure ADLS2 Blob storage Parquet data files from the SAS Compute Server, you need the following information to execute a Parquet LIBNAME statement.

 

    storage_account_name="eagler0256viya4adls2"
    storage_file_system="fsdata"
    storage_tenant_id="XXXXXXXXXXXXXXXXXXXXXX"
    storage_application_id="XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    storage_client_secret="XXXXXXXXXXXXXXXXX"

 

ADLS2 Parquet Data file access from SAS Compute Server

 

The following code saves Parquet data files to ADLS2 Blob storage from the SAS Compute Server and reads them back. A Parquet LIBNAME statement can use either the Azure Storage Account access key or the Azure Client ID (application) and Secret to authenticate with Azure. The following code uses the Client ID and Secret.

The STORAGE_SHARED_KEY= LIBNAME option is available from SAS Viya release 2024.07.

The STORAGE_CLIENT_SECRET= LIBNAME option is available from SAS Viya release 2024.08.
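For comparison, here is a minimal sketch of the access-key variant. It reuses the option names from the full example below plus the STORAGE_SHARED_KEY= option noted above; the PRQTKEY libref, the MYSTRGKEY macro variable, and the key value are illustrative placeholders, and the sketch assumes the tenant and application IDs are not needed when a shared key is used.

/* Sketch: authenticate with the storage account access key instead of */
/* the Client ID and Secret. The key value below is a placeholder.     */
%let MYSTRGACC="eagler0256viya4adls2";
%let MYSTRGFS="fsdata";
%let MYSTRGKEY="XXXXXXXXXXXXXXXXXXXXXXXX";   /* storage account access key */

libname prqtkey parquet "/user_data"
   storage_platform="ADLS"
   storage_account_name=&MYSTRGACC
   storage_file_system=&MYSTRGFS
   storage_shared_key=&MYSTRGKEY
;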

 

Code:

/* Azure connection details (values masked) */
%let MYSTRGACC="eagler0256viya4adls2";                /* storage account name            */
%let MYSTRGFS="fsdata";                               /* ADLS2 filesystem (container)    */
%let MYTNTID="a708fb09-XXXXXXXXXXXXXXXX";             /* Azure tenant ID                 */
%let MYAPPID="49200cb0-XXXXXXXXXXXXXX";               /* Azure Client (application) ID   */
%let MYAPPSECRET="JVX8Q~XXXXXXXXXXXXX-XXXXXXXXXXX";   /* Azure client secret             */
%let MYPLTFRM="ADLS";                                 /* storage platform                */
%let MYFOLDER="/user_data";                           /* folder used as the library path */

/* Set the Azure tenant for the SAS session */
options azuretenantid=&MYTNTID;

/* Assign a Parquet library pointing at the ADLS2 filesystem path */
libname prqtlib parquet &MYFOLDER
   storage_platform=&MYPLTFRM
   storage_account_name=&MYSTRGACC
   storage_file_system=&MYSTRGFS
   storage_application_id=&MYAPPID
   storage_client_secret=&MYAPPSECRET
;


/* Write Parquet files to ADLS2: default (SNAPPY), Brotli, and LZ4 compression */
data prqtlib.fish_prqt;
   set sashelp.fish;
run;

data prqtlib.fish_brotli (compress=brotli);
   set sashelp.fish;
run;

data prqtlib.fish_lz4 (compress=LZ4);
   set sashelp.fish;
run;

/* Read the Parquet files back from ADLS2 */
proc sql outobs=20;
select * from prqtlib.fish_prqt;
run;

proc sql outobs=20;
select * from prqtlib.fish_brotli;
quit;

proc sql outobs=20;
select * from prqtlib.fish_lz4;
quit;

/* Describe the Parquet tables */
proc contents data=prqtlib.fish_prqt;
run;
proc contents data=prqtlib.fish_brotli;
run;
proc contents data=prqtlib.fish_lz4;
run;
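To confirm which tables the library now exposes, a directory-style listing can help. This is a sketch and assumes the Parquet engine surfaces the files in the ADLS2 path as library members, as the queries above suggest.

/* List the tables available through the PRQTLIB libref (NODS = names only) */
proc contents data=prqtlib._all_ nods;
run;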

 

Log extract:

…………
……..
92
93   options azuretenantid=&MYTNTID;
94
95   libname prqtlib parquet &MYFOLDER
96      storage_platform=&MYPLTFRM
97      storage_account_name=&MYSTRGACC
98      storage_file_system=&MYSTRGFS
99      storage_application_id=&MYAPPID
100     storage_client_secret=&MYAPPSECRET
101  ;
NOTE: Libref PRQTLIB was successfully assigned as follows:
      Engine:        PARQUET
      Physical Name: /user_data
102
103  data prqtlib.fish_prqt;
104     set sashelp.fish;
105  run;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PRQTLIB.fish_prqt has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.69 seconds
      cpu time            0.26 seconds

106
107  data prqtlib.fish_brotli (compress=brotli) ;
108     set sashelp.fish;
109  run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PRQTLIB.fish_brotli has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.36 seconds
      cpu time            0.04 seconds

110
111  data prqtlib.fish_lz4 (compress=LZ4) ;
112     set sashelp.fish;
113  run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PRQTLIB.fish_lz4 has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.37 seconds
      cpu time            0.02 seconds

114
115  PROC SQL outobs=20 ;
116  select * from prqtlib.fish_prqt;
WARNING: Statement terminated early due to OUTOBS=20 option.
117  run;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
118
……….
……………

 

The following screenshot shows the Parquet data files saved to Azure ADLS2 by running the code above.

 

02_UK_SAS_Access_to_ADLS2_Parquet_2.png

 

 

Data Compression option

 

The SAS Parquet engine supports BROTLI, GZIP, LZ4, LZ4_HADP, SNAPPY, and ZSTD data compression when saving data to ADLS2. SNAPPY is the default compression.
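Following the pattern of the Brotli and LZ4 steps above, the other supported codecs are selected the same way with the COMPRESS= data set option. A short sketch, reusing the PRQTLIB libref assigned earlier (the fish_gzip and fish_zstd table names are illustrative):

/* Sketch: write the same table with GZIP and ZSTD compression */
data prqtlib.fish_gzip (compress=gzip);
   set sashelp.fish;
run;

data prqtlib.fish_zstd (compress=zstd);
   set sashelp.fish;
run;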

 

Important Links:

SAS Parquet LIBNAME Engine Requirement

STORAGE_CLIENT_SECRET= LIBNAME Statement Option

 

 

Find more articles from SAS Global Enablement and Learning here.

Comments

@UttamKumar , does this require that the Blob storage be a hierarchical namespace with the NFS 3.0 protocol and CSI driver? Also, is there direct support to both read and write Parquet files, meaning Parquet saved not as a blob?

@RajeevV , Yes! It's supported for ADLS2 Blob storage with hierarchical namespace enabled. It does not require NFS 3.0 or the CSI driver.
Yes! The SAS Parquet LIBNAME engine can also read and write Parquet data files on local and NFS-mounted file systems. A file written to a local or NFS-mounted file system is a standard Parquet file, not a blob-type file.

 

-Uttam 

@UttamKumar Thanks a lot. 

 

Just another question on this. Are you aware of any performance degradation when the Parquet file is saved as a blob on ADLS Gen2 vs. a local/NFS-mounted standard Parquet file? Also, a question on ADLS Gen2: my understanding is that ADLS Gen1 used to directly support standard Parquet, so is that not the case with ADLS Gen2? Does it support Parquet only as blob storage?

@UttamKumar thanks for sharing, do you have an example using the storage_auth_domain argument?

 

Thanks a lot,

Claudio
