With the new SAS Viya release (2021.1 and later), CAS can read and write Parquet data files to Azure ADLS2 Blob storage, in addition to CSV and, ORC data files. The data read and write for the parquet file is supported by ADLS CASLIB with the parallel data load mechanism. The CAS worker PODs are loading Parquet data file in parallel from ADLS2 Blob storage.
The following picture describes the CAS load/save from parquet data files stored at Azure Blob Storage.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
To load CAS in parallel from ADLS2 Blob Storage, it requires access to an Azure Access Key file from each CAS worker's PODs. You can mount a PVC to CAS PODs and configure the CAS system parameter cas.AZUREAUTHCACHELOC= to share the Azure Access Key file amongst the CAS PODs.
The detailed steps are discussed in the article manage-azure-access-key-with-azureauthcacheloc-part-2 , for creating and mounting a PVC to CAS PODs against Azure File Share and updating system parameter cas.AZUREAUTHCACHELOC= .
The detailed steps are discussed in the article cas-accessing-azure-data-lake-files, for configuring Azure user application and Storage Account to access ALDS2 Blob storage.
The following screenshot describes the location of the Azure Access Key file (.json file) shared amongst the CAS PODs.
With the CAS PODs mounted to Azure File share and AZUREAUTHCACHELOC= parameter set to Azure File Share, you can use the following code to save and load Parquet data files to ADLS2 blob storage. The ADLS CASLIB will share the Azure Access key amongst the CAS PODs from the central location.
Code:
%let MYSTRGACC="utkumaviya4adls2"; %let MYSTRGFS="fsdata"; %let MYTNTID="b1c14d5c-XXXXXXXX-xxxxxxxxxxxx" ; %let MYAPPID="a2e7cfdc-XXXXXXXXXXXX-xxxxxxxxxxxx"; CAS mySession SESSOPTS=(CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true); caslib ADLS2 datasource=( srctype="adls" accountname=&MYSTRGACC filesystem=&MYSTRGFS dnsSuffix=dfs.core.windows.net timeout=50000 tenantid=&MYTNTID applicationId=&MYAPPID ) path="/sample_data" subdirs; proc casutil incaslib="ADLS2"; list files ; run;quit; /* Save CAS Data to ADLS2 storage data file (parquet) */ proc casutil incaslib="ADLS2" outcaslib="ADLS2" ; load data=sashelp.cars casout="cars" replace; save casdata="cars" casout="cars.parquet" replace; run;quit; /* CAS load from Parquet data files at ADLS2 storage */ proc casutil incaslib="ADLS2" outcaslib="ADLS2"; load casdata="cars.parquet" casout="cars_parquet" replace ; list tables ; run;quit; cas mysession terminate;
Log extract :
..... .............. 78 /* Save CAS Data to ADLS2 storage data file (parquet) */ 79 proc casutil incaslib="ADLS2" outcaslib="ADLS2" ; NOTE: The UUID '2d06260a-99c8-5543-b9b4-478a03fd394f' is connected using session MYSESSION. 80 load data=sashelp.cars casout="cars" replace; NOTE: The INCASLIB= option is ignored when using the DATA= option in the LOAD statement. NOTE: Executing action 'table.addTable'. NOTE: Action 'table.addTable' used (Total process time): NOTE: SASHELP.CARS was successfully added to the "ADLS2" caslib as "CARS". 81 save casdata="cars" casout="cars.parquet" replace; NOTE: Executing action 'table.save'. NOTE: Cloud Analytic Services saved the file cars.parquet in caslib ADLS2. NOTE: Action 'table.save' used (Total process time): ...... ............. .................. 78 /* CAS load from Parquet data files at ADLS2 storage */ 79 proc casutil incaslib="ADLS2" outcaslib="ADLS2"; NOTE: The UUID '2d06260a-99c8-5543-b9b4-478a03fd394f' is connected using session MYSESSION. 81 load casdata="cars.parquet" casout="cars_parquet" replace ; NOTE: Executing action 'table.loadTable'. NOTE: Cloud Analytic Services made the external data from cars.parquet available as table CARS_PARQUET in caslib ADLS2. NOTE: Action 'table.loadTable' used (Total process time): NOTE: The Cloud Analytic Services server processed the request in 0.739576 seconds. .......... .................
The following screenshot describes Parquet data file saved to Azure ADLS2 Blob storage.
Important Links:
CAS accessing Azure Data Lake files
Manage Azure Access Key with AZUREAUTHCACHELOC= (Part -2)
AZUREAUTHCACHELOC System Option
Azure Data Lake Storage Data Source
Find more articles from SAS Global Enablement and Learning here.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.