BookmarkSubscribeRSS Feed

SAS Viya Compute Server accessing S3 Parquet file

Started ‎11-01-2022 by
Modified ‎11-01-2022 by
Views 1,988

With SAS Viya 2022.9 release, the SAS Compute server supports access to parquet files in an S3 bucket. We all know the SAS Viya environment comes with two compute engine SAS Compute Server and CAS. The CAS already supports access to the parquet file at S3. Similarly, SAS Viya Compute Server supports access to parquet files at the S3 location. This feature is an addition to the Compute Server support of access to parquet files at GCS and Path locations. It enables users to read and write Parquet files into an S3 bucket from the SAS Compute server and CAS. The Compute Server can read and write the parquet file with various compression methods.

 

This post is about SAS Viya Compute Server accessing the Parquet file from an S3 bucket.

 

Compression Methods Supported

  • UNCOMPRESSED
  • SNAPPY
  • GZIP
  • BROTLI
  • ZSTD
  • LZ4
  • LZ4_HADOOP

 

SAS Viya Compute Server access to S3 Parquet file

 

With access to an S3 bucket and AWS access keys available to Compute Pod, you can use the following statement to read and write parquet files to an S3 bucket with various compression methods.

 

Code:

 

%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;

%let aws_key="ASXXXXXXXXXXXXXXXXRXM"  ;
%let aws_secret_key="OHXoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXJ"  ;
%let aws_session_token="lO054gERfHiHzsXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==";

libname parqtlib parquet "/data/PARQUET/" 
   storage_platform = "aws"
   storage_bucket_name = &s3bucket
   storage_aws_key = &aws_key
   storage_aws_secret_key = &aws_secret_key
   storage_aws_session_token = &aws_session_token
   ;

data parqtlib.fish_brotli (compress=brotli) ;
   set sashelp.fish;
run ;

data parqtlib.fish_lz4 (compress=LZ4) ;
   set sashelp.fish;
run ;

Proc SQL outobs=20;
select * from parqtlib.fish_brotli ;
run;quit;

Proc SQL outobs=20;
select * from parqtlib.fish_lz4 ;
run;quit;

 

Log extract:

 

....
..............
79   
80   %let userid=utkuma;
81   %let s3bucket=&userid.dmviya4 ;
82   
83   %let aws_key="ASXXXXXXXXXXXXXXXXXXUU"  ;
84   %let aws_secret_key="3OqwAYXXXXXXXXXXXXXXXXXXXXXF"  ;
85   %let
85 ! aws_session_token="FwoGZXIvYXdzELTXXXXXXXXXXXXXXXXXXXXXXXXXXX==" ;
NOTE: The quoted string currently being processed has become more than 262 bytes long.  You might have unbalanced quotation marks.
86   
87   libname parqtlib parquet "/data/PARQUET/"
88      storage_platform = "aws"
89      storage_bucket_name = &s3bucket
90      storage_aws_key = &aws_key
91      storage_aws_secret_key = &aws_secret_key
92      storage_aws_session_token = &aws_session_token
93      ;
NOTE: The quoted string currently being processed has become more than 262 bytes long.  You might have unbalanced quotation marks.
NOTE: Libref PARQTLIB was successfully assigned as follows: 
      Engine:        PARQUET 
      Physical Name: /data/PARQUET/
94   
95   data parqtlib.fish_brotli (compress=brotli) ;
96      set sashelp.fish;
97   run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PARQTLIB.fish_brotli has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.23 seconds
      cpu time            0.02 seconds
      
98   
99   data parqtlib.fish_lz4 (compress=LZ4) ;
100     set sashelp.fish;
101  run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PARQTLIB.fish_lz4 has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.27 seconds
      cpu time            0.02 seconds
      
102  
103  Proc SQL outobs=20;
104  select * from parqtlib.fish_brotli ;
WARNING: Statement terminated early due to OUTOBS=20 option.
105  run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 3.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.36 seconds
      cpu time            0.06 seconds
      
106  
107  Proc SQL outobs=20;
108  select * from parqtlib.fish_lz4 ;
WARNING: Statement terminated early due to OUTOBS=20 option.
109  run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 4.
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.28 seconds
      cpu time            0.04 seconds
      
110  
....
..............

 

Result Output:

 

uk_1_SASViya_ComputeServer_Accessing_S3ParquetFile_1.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

 File saved at S3 location:

 

uk_2_SASViya_ComputeServer_Accessing_S3ParquetFile_2.png

 

You can also read an S3 folder containing n-number of parquet files into a SAS table. The “directories_as_data=YES” parameter in the LIBNAME statement enables you to read an S3 folder into a SAS table. This LIBNAME option is only supported for data reading and not for data writing to S3 location. The S3 folder containing n-number of parquet files must have the same data schema. The following code is reading data from a variety of data folder names and file extension.

 

Code:

 

%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;

%let aws_key="ASXXXXXXXXXXXXXXXXRXM"  ;
%let aws_secret_key="OHXoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXJ"  ;
%let aws_session_token="lO054gERfHiHzsXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==";

libname parqtR parquet "/data/PARQUET/" 
   storage_platform = "aws"
   storage_bucket_name = &s3bucket
   storage_aws_key = &aws_key
   storage_aws_secret_key = &aws_secret_key
   storage_aws_session_token = &aws_session_token
   directories_as_data=YES
   ;

Proc SQL outobs=20;
select * from parqtR.BaseballParqt ;
run;quit;

Proc SQL outobs=20;
select * from parqtR.baseball_prqt ;
run;quit;

Proc SQL outobs=20;
select * from parqtR."baseball.parquet"n ;
run;quit;

 

Log extract:

 

....
.............. 
80   %let userid=utkuma;
81   %let s3bucket=&userid.dmviya4 ;
82   
83   %let aws_key="ASXXXXXXXXXXXXXXXXXXXXX"  ;
84   %let aws_secret_key="d/XXXXXXXXXXXXXXXXXXXXXXXXXXX"  ;
85   %let
85 ! aws_session_token="FwoGZXIvYXdzEEEaUqyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX;
NOTE: The quoted string currently being processed has become more than 262 bytes long.  You might have unbalanced quotation marks.
86   
87   libname parqtR parquet "/data/PARQUET/"
88      storage_platform = "aws"
89      storage_bucket_name = &s3bucket
90      storage_aws_key = &aws_key
91      storage_aws_secret_key = &aws_secret_key
92      storage_aws_session_token = &aws_session_token
93      directories_as_data=YES
NOTE: The quoted string currently being processed has become more than 262 bytes long.  You might have unbalanced quotation marks.
94      ;
NOTE: Libref PARQTR was successfully assigned as follows: 
      Engine:        PARQUET 
      Physical Name: /data/PARQUET/
95   
96   Proc SQL outobs=20;
97   select * from parqtR.BaseballParqt ;
WARNING: Statement terminated early due to OUTOBS=20 option.
98   run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed pages 12-13.
NOTE: PROCEDURE SQL used (Total process time):
      real time           2.28 seconds
      cpu time            0.17 seconds
      
99   
100  Proc SQL outobs=20;
101  select * from parqtR.baseball_prqt ;
WARNING: Statement terminated early due to OUTOBS=20 option.
102  run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed pages 14-15.
NOTE: PROCEDURE SQL used (Total process time):
      real time           2.88 seconds
      cpu time            0.14 seconds
      
103  
104  Proc SQL outobs=20;
105  select * from parqtR."baseball.parquet"n ;
WARNING: Statement terminated early due to OUTOBS=20 option.
106  run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed pages 16-17.
NOTE: PROCEDURE SQL used (Total process time):
      real time           3.26 seconds
      cpu time            0.15 seconds
      
107  
....
..............

 

Result Output:

 

uk_3_SASViya_ComputeServer_Accessing_S3ParquetFile_3.png

 

Folder read as SAS table from S3 location:

 

uk_4_SASViya_ComputeServer_Accessing_S3ParquetFile_4.png

 

uk_5_SASViya_ComputeServer_Accessing_S3ParquetFile_5.png

 

uk_6_SASViya_ComputeServer_Accessing_S3ParquetFile_6.png

 

Implementation Considerations:

  • The LIBNAME statement supports hardcode AWS Access Key in a string variable instead of utilizing config and credentials files for the access key.
  • The SAS Viya Compute server does not support the data read from a partitioned parquet file. If you have a partitioned parquet file stored by another application and looking to read specific partitioned data, it’s not supported.

 

 

Important Links:

 

About Parquet and ORC Engines

Parquet and ORC LIBNAME statement and Option

Restriction for Parquet File Features 

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎11-01-2022 11:01 AM
Updated by:
Contributors

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags