With SAS Viya 2022.9 release, the SAS Compute server supports access to parquet files in an S3 bucket. We all know the SAS Viya environment comes with two compute engine SAS Compute Server and CAS. The CAS already supports access to the parquet file at S3. Similarly, SAS Viya Compute Server supports access to parquet files at the S3 location. This feature is an addition to the Compute Server support of access to parquet files at GCS and Path locations. It enables users to read and write Parquet files into an S3 bucket from the SAS Compute server and CAS. The Compute Server can read and write the parquet file with various compression methods.
This post is about SAS Viya Compute Server accessing the Parquet file from an S3 bucket.
With access to an S3 bucket and AWS access keys available to Compute Pod, you can use the following statement to read and write parquet files to an S3 bucket with various compression methods.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_key="ASXXXXXXXXXXXXXXXXRXM" ;
%let aws_secret_key="OHXoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXJ" ;
%let aws_session_token="lO054gERfHiHzsXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==";
libname parqtlib parquet "/data/PARQUET/"
storage_platform = "aws"
storage_bucket_name = &s3bucket
storage_aws_key = &aws_key
storage_aws_secret_key = &aws_secret_key
storage_aws_session_token = &aws_session_token
;
data parqtlib.fish_brotli (compress=brotli) ;
set sashelp.fish;
run ;
data parqtlib.fish_lz4 (compress=LZ4) ;
set sashelp.fish;
run ;
Proc SQL outobs=20;
select * from parqtlib.fish_brotli ;
run;quit;
Proc SQL outobs=20;
select * from parqtlib.fish_lz4 ;
run;quit;
Log extract:
....
..............
79
80 %let userid=utkuma;
81 %let s3bucket=&userid.dmviya4 ;
82
83 %let aws_key="ASXXXXXXXXXXXXXXXXXXUU" ;
84 %let aws_secret_key="3OqwAYXXXXXXXXXXXXXXXXXXXXXF" ;
85 %let
85 ! aws_session_token="FwoGZXIvYXdzELTXXXXXXXXXXXXXXXXXXXXXXXXXXX==" ;
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
86
87 libname parqtlib parquet "/data/PARQUET/"
88 storage_platform = "aws"
89 storage_bucket_name = &s3bucket
90 storage_aws_key = &aws_key
91 storage_aws_secret_key = &aws_secret_key
92 storage_aws_session_token = &aws_session_token
93 ;
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
NOTE: Libref PARQTLIB was successfully assigned as follows:
Engine: PARQUET
Physical Name: /data/PARQUET/
94
95 data parqtlib.fish_brotli (compress=brotli) ;
96 set sashelp.fish;
97 run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PARQTLIB.fish_brotli has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.23 seconds
cpu time 0.02 seconds
98
99 data parqtlib.fish_lz4 (compress=LZ4) ;
100 set sashelp.fish;
101 run ;
NOTE: There were 159 observations read from the data set SASHELP.FISH.
NOTE: The data set PARQTLIB.fish_lz4 has 159 observations and 7 variables.
NOTE: DATA statement used (Total process time):
real time 0.27 seconds
cpu time 0.02 seconds
102
103 Proc SQL outobs=20;
104 select * from parqtlib.fish_brotli ;
WARNING: Statement terminated early due to OUTOBS=20 option.
105 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 3.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.36 seconds
cpu time 0.06 seconds
106
107 Proc SQL outobs=20;
108 select * from parqtlib.fish_lz4 ;
WARNING: Statement terminated early due to OUTOBS=20 option.
109 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed page 4.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.28 seconds
cpu time 0.04 seconds
110
....
..............
Result Output:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
File saved at S3 location:
You can also read an S3 folder containing n-number of parquet files into a SAS table. The “directories_as_data=YES” parameter in the LIBNAME statement enables you to read an S3 folder into a SAS table. This LIBNAME option is only supported for data reading and not for data writing to S3 location. The S3 folder containing n-number of parquet files must have the same data schema. The following code is reading data from a variety of data folder names and file extension.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_key="ASXXXXXXXXXXXXXXXXRXM" ;
%let aws_secret_key="OHXoXXXXXXXXXXXXXXXXXXXXXXXXXXXXXJ" ;
%let aws_session_token="lO054gERfHiHzsXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX==";
libname parqtR parquet "/data/PARQUET/"
storage_platform = "aws"
storage_bucket_name = &s3bucket
storage_aws_key = &aws_key
storage_aws_secret_key = &aws_secret_key
storage_aws_session_token = &aws_session_token
directories_as_data=YES
;
Proc SQL outobs=20;
select * from parqtR.BaseballParqt ;
run;quit;
Proc SQL outobs=20;
select * from parqtR.baseball_prqt ;
run;quit;
Proc SQL outobs=20;
select * from parqtR."baseball.parquet"n ;
run;quit;
Log extract:
....
..............
80 %let userid=utkuma;
81 %let s3bucket=&userid.dmviya4 ;
82
83 %let aws_key="ASXXXXXXXXXXXXXXXXXXXXX" ;
84 %let aws_secret_key="d/XXXXXXXXXXXXXXXXXXXXXXXXXXX" ;
85 %let
85 ! aws_session_token="FwoGZXIvYXdzEEEaUqyXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX;
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
86
87 libname parqtR parquet "/data/PARQUET/"
88 storage_platform = "aws"
89 storage_bucket_name = &s3bucket
90 storage_aws_key = &aws_key
91 storage_aws_secret_key = &aws_secret_key
92 storage_aws_session_token = &aws_session_token
93 directories_as_data=YES
NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
94 ;
NOTE: Libref PARQTR was successfully assigned as follows:
Engine: PARQUET
Physical Name: /data/PARQUET/
95
96 Proc SQL outobs=20;
97 select * from parqtR.BaseballParqt ;
WARNING: Statement terminated early due to OUTOBS=20 option.
98 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed pages 12-13.
NOTE: PROCEDURE SQL used (Total process time):
real time 2.28 seconds
cpu time 0.17 seconds
99
100 Proc SQL outobs=20;
101 select * from parqtR.baseball_prqt ;
WARNING: Statement terminated early due to OUTOBS=20 option.
102 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed pages 14-15.
NOTE: PROCEDURE SQL used (Total process time):
real time 2.88 seconds
cpu time 0.14 seconds
103
104 Proc SQL outobs=20;
105 select * from parqtR."baseball.parquet"n ;
WARNING: Statement terminated early due to OUTOBS=20 option.
106 run;quit;
NOTE: PROC SQL statements are executed immediately; The RUN statement has no effect.
NOTE: The PROCEDURE SQL printed pages 16-17.
NOTE: PROCEDURE SQL used (Total process time):
real time 3.26 seconds
cpu time 0.15 seconds
107
....
..............
Result Output:
Folder read as SAS table from S3 location:
Important Links:
Parquet and ORC LIBNAME statement and Option
Restriction for Parquet File Features
Find more articles from SAS Global Enablement and Learning here.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.