With SAS Viya 2021.2 LTS release, the CAS has improved features to access data files from S3 locations. The CAS can load from the parquet data file with or without .parquet file extension. It can load from compressed CSV file. It can load from Xls and Xlsx data files.
The CAS directly reads a list of data files generated by third-party applications. The CAS load from S3 data files is in parallel using S3 CASLIB. Each CAS nodes read and write data to the S3 location in parallel.
This post is about CAS accessing S3 data files using S3 CASLIB.
The following diagram describes the CAS access to S3 data files using S3 CASLIB.
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
The following steps describe the CAS load/save from/to the S3 data file.
With valid AWS Access keys available to CAS Controller and Node Pods, the following code can be used to load/save the CAS from/to the S3-CSV data file. The code also describes the CAS load from a subfolder containing the list of CSV data files. The data read/ write from CAS to the S3 bucket are in parallel. The AWS access key is available to CAS Pods using a PVC “/mnt/viya-share/config/access-key/credentials" mounted to the NFS server.
When new and incremental CSV files are placed in the same S3 sub-folder, the CAS table must be reloaded. There is no auto incremental CAS load from delta data files.
The following screen describes the CSV data file located at the S3 bucket which is being loaded to CAS using S3 type CASLIB.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_config_file="/mnt/viya-share/config/access-key/config" ;
%let aws_credentials_file="/mnt/viya-share/config/access-key/credentials" ;
%let aws_profile="182696677754-testers" ;
%let aws_region="US_East";
%let objpath="/data/";
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
/* CASLIB to S3 bucket data folder */
caslib AWSCAS1 datasource=(srctype="s3",
awsConfigPath=&aws_config_file,
awsCredentialsPath=&aws_credentials_file ,
awsCredentialsProfile=&aws_profile,
region=&aws_region,
bucket=&s3bucket,
objectpath=&objpath
) subdirs ;
/* Load CAS from single CSV data file */
proc casutil incaslib="AWSCAS1" outcaslib="public";
droptable casdata="cars1" incaslib="public" quiet;
load casdata="CSV/cars/cars1.csv" casout="cars1" promote ;
run;
quit;
/* Load CAS from CSV data-folder containing multiple files */
proc casutil incaslib="AWSCAS1" outcaslib="public";
droptable casdata="cars" incaslib="public" quiet;
load casdata="CSV/cars" casout="cars" promote
IMPORTOPTIONS=(FILETYPE="csv", multiFile=true, recurse=True, showFullpath=true) ;
run;
quit;
/* Save CAS table to S3 as CSV file. */
proc casutil incaslib="public" outcaslib="AWSCAS1";
save casdata="cars" casout="CSV/cars_GEL.csv" replace ;
run;
quit;
proc casutil incaslib="public" outcaslib="public";
list tables ;
run;quit;
CAS mySession TERMINATE;
Log extract :
....
..............
1 %studio_hide_wrapper;
77
78 %let userid=utkuma;
79
80 %let s3bucket=&userid.dmviya4 ;
81 %let aws_config_file="/mnt/viya-share/config/access-key/config" ;
82 %let aws_credentials_file="/mnt/viya-share/config/access-key/credentials" ;
83 %let aws_profile="182696677754-testers" ;
84 %let aws_region="US_East";
85 %let objpath="/data/";
86
87 CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The
UUID is b9e5d45b-feb9-3244-86d7-fabae3d40505. The user is viya_admin and the active caslib is CASUSER(viya_admin).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 3 workers.
NOTE: 'CASUSER(viya_admin)' is now the active caslib.
NOTE: Action 'sessionProp.setSessOpt' used (Total process time):
NOTE: The CAS statement request to update one or more session options for session MYSESSION completed.
88
89 /* CASLIB to S3 bucket data folder */
90 caslib AWSCAS1 datasource=(srctype="s3",
91 awsConfigPath=&aws_config_file,
92 awsCredentialsPath=&aws_credentials_file ,
93 awsCredentialsProfile=&aws_profile,
94 region=&aws_region,
95 bucket=&s3bucket,
96 objectpath=&objpath
97 ) subdirs ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AWSCAS1' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AWSCAS1'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: Action to ADD caslib AWSCAS1 completed for session MYSESSION.
98
99 /* Load CAS from single CSV data file */
100 proc casutil incaslib="AWSCAS1" outcaslib="public";
NOTE: The UUID 'b9e5d45b-feb9-3244-86d7-fabae3d40505' is connected using session MYSESSION.
101 droptable casdata="cars1" incaslib="public" quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
102
102! load casdata="CSV/cars/cars1.csv" casout="cars1" promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file CSV/cars/cars1.csv in AWS S3 bucket utkumadmviya4 available as table CARS1 in caslib
public.
NOTE: Action 'table.loadTable' used (Total process time):
….
…..
105
106
107 /* Load CAS from CSV data-folder containing multiple files */
108 proc casutil incaslib="AWSCAS1" outcaslib="public";
NOTE: The UUID 'b9e5d45b-feb9-3244-86d7-fabae3d40505' is connected using session MYSESSION.
109 droptable casdata="cars" incaslib="public" quiet;
NOTE: Executing action 'table.dropTable'.
110
110! load casdata="CSV/cars" casout="cars" promote
111 IMPORTOPTIONS=(FILETYPE="csv", multiFile=true, recurse=True, showFullpath=true) ;
NOTE: Executing action 'table.loadTable'.
NOTE: The file, '/data/CSV/cars/cars.csv' was used to create the CAS Table column names.
NOTE: The CSV file table load for table, 'cars' produced 1281 rows from 3 files.
NOTE: Cloud Analytic Services made the file CSV/cars in AWS S3 bucket utkumadmviya4 available as table CARS in caslib public.
…..
…………….
114
115 /* Save CAS table to S3 as CSV file. */
116 proc casutil incaslib="public" outcaslib="AWSCAS1";
NOTE: The UUID 'b9e5d45b-feb9-3244-86d7-fabae3d40505' is connected using session MYSESSION.
117 save casdata="cars" casout="CSV/cars_GEL.csv" replace ;
NOTE: Executing action 'table.save'.
NOTE: Cloud Analytic Services saved the file CSV/cars_GEL.csv in caslib AWSCAS1.
....
..............
.................
Result Output: CAS table loaded from S3 CSV data files.
CAS table saved to S3 CSV data files.
With valid AWS Access keys available to CAS Controller and Node Pods, the following code can be used to load/save the CAS from/to the S3-Parquet data file. The code describes the CAS load from a subfolder containing the list of parquet data files. The CAS can be loaded from parquet file with or without the .parquet extension. The CAS can be loaded from a single file as well.
The data read/write from CAS to the S3 bucket are in parallel. The AWS access key is available to CAS Pods using a PVC “/mnt/viya-share/config/access-key/credentials" mounted to the NFS server.
When new and incremental parquet files are placed in the same S3 sub-folder, the CAS table must be reloaded. There is no auto incremental CAS load from delta data files.
The following screen describes the parquet data files with/without .parquet extension located at the S3 bucket which is being loaded to CAS using S3 type CASLIB.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_config_file="/mnt/viya-share/config/access-key/config" ;
%let aws_credentials_file="/mnt/viya-share/config/access-key/credentials" ;
%let aws_profile="182696677754-testers" ;
%let aws_region="US_East";
%let objpath="/data/";
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
/* CASLIB to S3 bucket data folder */
caslib AWSCAS1 datasource=(srctype="s3",
awsConfigPath=&aws_config_file,
awsCredentialsPath=&aws_credentials_file ,
awsCredentialsProfile=&aws_profile,
region=&aws_region,
bucket=&s3bucket,
objectpath=&objpath
) subdirs ;
/* CAS load from S3-Parquet data files */
proc casutil incaslib="AWSCAS1" outcaslib="public";
droptable casdata="baseball_GEL" incaslib="public" quiet;
load casdata="PARQUET/baseball.parquet" casout="baseball_GEL" promote ;
run;
quit;
/* Save CAS table to S3 as parquet file. */
proc casutil incaslib="public" outcaslib="AWSCAS1";
save casdata="baseball_GEL" casout="PARQUET/baseball_GEL.parquet" replace ;
run;
quit;
/* Load CAS from CAS saved Parquet data files */
proc casutil incaslib="AWSCAS1" outcaslib="public";
droptable casdata="baseball_GEL_NEW" incaslib="public" quiet;
load casdata="PARQUET/baseball_GEL.parquet" casout="baseball_GEL_NEW" promote ;
run;
quit;
/* Load CAS from Parquet data-folder ( without extension .parquet) */
proc casutil incaslib="AWSCAS1" outcaslib="public";
droptable casdata="baseball_prqt" incaslib="public" quiet;
load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
run;
quit;
/* Load CAS from single Parquet data file ( without extension .parquet) */
proc casutil incaslib="AWSCAS1" outcaslib="public";
droptable casdata="baseball_prqt_1" incaslib="public" quiet;
load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
run;
quit;
proc casutil incaslib="public" outcaslib="public";
list tables ;
run;quit;
CAS mySession TERMINATE;
Log extract:
....
..............
99 /* CAS load from S3-Parquet data files */
100 proc casutil incaslib="AWSCAS1" outcaslib="public";
102! load casdata="PARQUET/baseball.parquet" casout="baseball_GEL" promote ;
NOTE: Cloud Analytic Services made the file PARQUET/baseball.parquet available as table BASEBALL_GEL in caslib public.
105
106 /* Save CAS table to S3 as parquet file. */
107 proc casutil incaslib="public" outcaslib="AWSCAS1";
108 save casdata="baseball_GEL" casout="PARQUET/baseball_GEL.parquet" replace ;
NOTE: Cloud Analytic Services saved the file PARQUET/baseball_GEL.parquet in caslib AWSCAS1.
112
113 /* Load CAS from CAS saved Parquet data files */
114 proc casutil incaslib="AWSCAS1" outcaslib="public";
116! load casdata="PARQUET/baseball_GEL.parquet" casout="baseball_GEL_NEW" promote ;
NOTE: Cloud Analytic Services made the file PARQUET/baseball_GEL.parquet available as table BASEBALL_GEL_NEW in caslib public.
117 run;
118 quit;
121 /* Load CAS from Parquet data-folder ( without extension .parquet) */
122 proc casutil incaslib="AWSCAS1" outcaslib="public";
124! load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt available as table BASEBALL_PRQT in caslib public.
129 /* Load CAS from single Parquet data file ( without extension .parquet) */
130 proc casutil incaslib="AWSCAS1" outcaslib="public";
using session MYSESSION.
131 droptable casdata="baseball_prqt_1" incaslib="public" quiet;
NOTE: Executing action 'table.dropTable'.
132! load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt/baseball_prqt_1 available as table BASEBALL_PRQT_1 in caslib
public.
....
..............
.................
Result Output:
CAS table loaded from S3 parquet data files.
CAS table saved to S3 parquet data files.
With valid AWS Access keys available to CAS Controller and Node Pods, the following code can be used to load the CAS from the S3 image and video data file. The code describes the CAS load from a subfolder containing the list of data files. The CAS table contains one row for each image and video file.
The data read from CAS to the S3 bucket are in parallel.The AWS access key is available to CAS Pods using a PVC “/mnt/viya-share/config/access-key/credentials" mounted to the NFS server.
When new data files are placed in the same S3 sub-folder, the CAS table must be reloaded. There is no auto incremental CAS load from delta data files.
The following screen describes the image and audio data files located at the S3 bucket which is being loaded to CAS using S3 type CASLIB.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_config_file="/mnt/viya-share/config/access-key/config" ;
%let aws_credentials_file="/mnt/viya-share/config/access-key/credentials" ;
%let aws_profile="182696677754-testers" ;
%let aws_region="US_East";
%let objpath="/data/";
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
/* CASLIB to S3 bucket data folder */
caslib AWSCAS1 datasource=(srctype="s3",
awsConfigPath=&aws_config_file,
awsCredentialsPath=&aws_credentials_file ,
awsCredentialsProfile=&aws_profile,
region=&aws_region,
bucket=&s3bucket,
objectpath=&objpath
) subdirs ;
/* CAS load from Img and Vdo files located at S3 location */
proc casutil incaslib="AWSCAS1" outcaslib="public" ;
droptable casdata="IMGS_GEL" incaslib="public" quiet;
load casdata="" importOptions=(filetype="IMAGE" ) casout="IMGS_GEL" promote;
droptable casdata="VDO_GEL" incaslib="public" quiet;
load casdata="" importOptions=(filetype="VIDEO" ) casout="VDO_GEL" promote;
run; quit ;
proc casutil incaslib="public" outcaslib="public";
list tables ;
run;quit;
CAS mySession TERMINATE;
Log extract :
....
..............
99 /* CAS load from Img and Vdo files located at S3 location */
100 proc casutil incaslib="AWSCAS1" outcaslib="public" ;
NOTE: The UUID '713850ca-2e62-3044-9f6f-f37452684fcf' is connected using session MYSESSION.
101 droptable casdata="IMGS_GEL" incaslib="public" quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 0.02023 seconds.
102 load casdata="" importOptions=(filetype="IMAGE" ) casout="IMGS_GEL" promote;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made documents available as table IMGS_GEL in caslib public.
104 load casdata="" importOptions=(filetype="VIDEO" ) casout="VDO_GEL" promote;
NOTE: Cloud Analytic Services made documents available as table VDO_GEL in caslib public.
..............
.................
Result Output:
CAS table loaded from S3 image and video data files.
CAS table with S3 image data, for each image file there is a row .
Important Link: S3 Data Source Caslibs
Find more articles from SAS Global Enablement and Learning here.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.