With the new SAS Viya release (2021.1 and later), CAS can read and write Parquet data files on Azure ADLS2 Blob storage, in addition to CSV and ORC data files. Parquet reads and writes are supported by the ADLS CASLIB with the parallel data load mechanism: the CAS worker PODs load the Parquet data file in parallel from ADLS2 Blob storage.
The following picture shows CAS loading from and saving to Parquet data files stored in Azure ADLS2 Blob storage.
To load CAS in parallel from ADLS2 Blob storage, every CAS worker POD needs access to the Azure Access Key file. You can mount a PVC to the CAS PODs and configure the CAS system parameter cas.AZUREAUTHCACHELOC= to share the Azure Access Key file among the CAS PODs.
The detailed steps for creating and mounting a PVC to the CAS PODs against an Azure File Share, and for updating the system parameter cas.AZUREAUTHCACHELOC=, are discussed in the article manage-azure-access-key-with-azureauthcacheloc-part-2.
The detailed steps for configuring the Azure user application and Storage Account to access ADLS2 Blob storage are discussed in the article cas-accessing-azure-data-lake-files.
The following screenshot shows the location of the Azure Access Key file (.json file) shared among the CAS PODs.
With the CAS PODs mounted to the Azure File Share and the AZUREAUTHCACHELOC= parameter pointing to that share, you can save and load Parquet data files to ADLS2 Blob storage through an ADLS CASLIB. The ADLS CASLIB shares the Azure Access Key among the CAS PODs from the central location.
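A minimal sketch of such a save and load is shown below. It assumes the PVC and AZUREAUTHCACHELOC= setup described above is already in place. The storage account, file system, tenant ID, and application ID are hypothetical placeholders, and the CASLIB name ADLS2, the sample table sashelp.cars, and the file name cars.parquet are illustrative choices, not values from this article.

```sas
/* Hypothetical placeholder values: replace with your own Azure details. */
%let MYSTRGACC="<storage-account-name>";
%let MYSTRGFS="<file-system-name>";
%let MYTNTID="<azure-tenant-id>";
%let MYAPPID="<azure-application-id>";

/* Start a CAS session; the Azure tenant ID is passed as a session option. */
cas mySession sessopts=(azureTenantId=&MYTNTID);

/* Define an ADLS CASLIB against the ADLS2 file system. */
caslib ADLS2 datasource=(
      srctype="adls"
      accountname=&MYSTRGACC
      filesystem=&MYSTRGFS
      dnsSuffix="dfs.core.windows.net"
      timeout=50000
      applicationId=&MYAPPID
   )
   path="/"
   subdirs;

/* Load a sample table into CAS so there is something to save. */
proc casutil outcaslib="ADLS2";
   load data=sashelp.cars casout="cars" replace;
quit;

proc casutil incaslib="ADLS2" outcaslib="ADLS2";
   /* Save the CAS table to ADLS2 as a Parquet data file. */
   save casdata="cars" casout="cars.parquet" replace;

   /* Load the Parquet data file back into CAS under a new table name. */
   load casdata="cars.parquet" casout="cars_parquet" replace;

   /* List the files in the CASLIB path and the in-memory tables. */
   list files;
   list tables;
quit;
```

The .parquet extension on the casout= name in the SAVE statement is what selects the Parquet file format here. The LIST FILES output is a quick way to confirm that the file landed in the ADLS2 file system.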
Log extract:
The following screenshot shows the Parquet data file saved to Azure ADLS2 Blob storage.
Find more articles from SAS Global Enablement and Learning here.