BookmarkSubscribeRSS Feed

SAS Viya CAS accessing S3 data files

Started ‎04-07-2022 by
Modified ‎04-07-2022 by
Views 3,541

With SAS Viya 2021.2 LTS release, the CAS has improved features to access data files from S3 locations. The CAS can load from the parquet data file with or without .parquet file extension. It can load from compressed CSV file. It can load from Xls and Xlsx data files.

 

The CAS directly reads a list of data files generated by third-party applications. The CAS load from S3 data files is in parallel using S3 CASLIB. Each CAS nodes read and write data to the S3 location in parallel.

 

This post is about CAS accessing S3 data files using S3 CASLIB.

 

S3 CASLIB supported file types

 

  • CSV (single and multiple files)
  • Compressed CSV (.gz8)
  • SASHDAT
  • Parquet
  • Image
  • Document
  • Audio
  • Video
  • Xls, Xlsx

 

Pre-requisites

 

  • AWS Access keys available to CAS controller and node Pods via NFS mount.
  • Read/write permission to the S3 bucket.

 

Data access path

 

The following diagram describes the CAS access to S3 data files using S3 CASLIB.

 

uk_1_CAS_Accesing_S3_Datafiles_1.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

The following steps describe the CAS load/save from/to the S3 data file.

 

CAS access to S3 CSV files

 

With valid AWS Access keys available to CAS Controller and Node Pods, the following code can be used to load/save the CAS from/to the S3-CSV data file. The code also describes the CAS load from a subfolder containing the list of CSV data files. The data read/ write from CAS to the S3 bucket are in parallel. The AWS access key is available to CAS Pods using a PVC “/mnt/viya-share/config/access-key/credentials" mounted to the NFS server.

 

When new and incremental CSV files are placed in the same S3 sub-folder, the CAS table must be reloaded. There is no auto incremental CAS load from delta data files.

 

The following screen describes the CSV data file located at the S3 bucket which is being loaded to CAS using S3 type CASLIB.

 

uk_2_CAS_Accesing_S3_Datafiles_2.png

 

Code:

 

%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_config_file="/mnt/viya-share/config/access-key/config"  ;
%let aws_credentials_file="/mnt/viya-share/config/access-key/credentials"  ;
%let aws_profile="182696677754-testers"  ;
%let aws_region="US_East";
%let objpath="/data/";

CAS mySession  SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

/* CASLIB to S3 bucket data folder */
caslib AWSCAS1 datasource=(srctype="s3",
                   awsConfigPath=&aws_config_file,
                   awsCredentialsPath=&aws_credentials_file ,
                   awsCredentialsProfile=&aws_profile,
                   region=&aws_region,
                   bucket=&s3bucket,
                   objectpath=&objpath
                   ) subdirs ;

/* Load CAS from single CSV data file  */
proc casutil incaslib="AWSCAS1"  outcaslib="public";
    droptable casdata="cars1" incaslib="public"  quiet;
   	load casdata="CSV/cars/cars1.csv" casout="cars1" promote ;
run;
quit;


/* Load CAS from CSV data-folder containing multiple files */
proc casutil incaslib="AWSCAS1"  outcaslib="public";
    droptable casdata="cars" incaslib="public"  quiet;
   	load casdata="CSV/cars" casout="cars" promote
IMPORTOPTIONS=(FILETYPE="csv", multiFile=true, recurse=True, showFullpath=true) ;
run;
quit;


/* Save CAS table to S3 as CSV file. */
proc casutil  incaslib="public"  outcaslib="AWSCAS1";
   save casdata="cars" casout="CSV/cars_GEL.csv" replace ;
run;
quit;


proc casutil incaslib="public"  outcaslib="public";
list tables ;
run;quit;

CAS mySession  TERMINATE;

 

Log extract :

 

....
..............
1    %studio_hide_wrapper;
77   
78   %let userid=utkuma;
79   
80   %let s3bucket=&userid.dmviya4 ;
81   %let aws_config_file="/mnt/viya-share/config/access-key/config"  ;
82   %let aws_credentials_file="/mnt/viya-share/config/access-key/credentials"  ;
83   %let aws_profile="182696677754-testers"  ;
84   %let aws_region="US_East";
85   %let objpath="/data/";
86   
87   CAS mySession  SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The 
      UUID is b9e5d45b-feb9-3244-86d7-fabae3d40505. The user is viya_admin and the active caslib is CASUSER(viya_admin).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 3 workers.
NOTE: 'CASUSER(viya_admin)' is now the active caslib.
NOTE: Action 'sessionProp.setSessOpt' used (Total process time):
NOTE: The CAS statement request to update one or more session options for session MYSESSION completed.
88   
89   /* CASLIB to S3 bucket data folder */
90   caslib AWSCAS1 datasource=(srctype="s3",
91                      awsConfigPath=&aws_config_file,
92                      awsCredentialsPath=&aws_credentials_file ,
93                      awsCredentialsProfile=&aws_profile,
94                      region=&aws_region,
95                      bucket=&s3bucket,
96                      objectpath=&objpath
97                      ) subdirs ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AWSCAS1' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AWSCAS1'.
NOTE: Action 'table.addCaslib' used (Total process time):
NOTE: Action to ADD caslib AWSCAS1 completed for session MYSESSION.
98   
99   /* Load CAS from single CSV data file  */
100  proc casutil incaslib="AWSCAS1"  outcaslib="public";
NOTE: The UUID 'b9e5d45b-feb9-3244-86d7-fabae3d40505' is connected using session MYSESSION.
101      droptable casdata="cars1" incaslib="public"  quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):

102     
102!     load casdata="CSV/cars/cars1.csv" casout="cars1" promote ;

NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file CSV/cars/cars1.csv in AWS S3 bucket utkumadmviya4 available as table CARS1 in caslib 
      public.
NOTE: Action 'table.loadTable' used (Total process time):
….
…..
      
105  
106  
107  /* Load CAS from CSV data-folder containing multiple files */
108  proc casutil incaslib="AWSCAS1"  outcaslib="public";
NOTE: The UUID 'b9e5d45b-feb9-3244-86d7-fabae3d40505' is connected using session MYSESSION.
109      droptable casdata="cars" incaslib="public"  quiet;
NOTE: Executing action 'table.dropTable'.
110     
110!     load casdata="CSV/cars" casout="cars" promote
111  IMPORTOPTIONS=(FILETYPE="csv", multiFile=true, recurse=True, showFullpath=true) ;
NOTE: Executing action 'table.loadTable'.
NOTE: The file, '/data/CSV/cars/cars.csv' was used to create the CAS Table column names.
NOTE: The CSV file table load for table, 'cars' produced 1281 rows from 3 files.
NOTE: Cloud Analytic Services made the file CSV/cars in AWS S3 bucket utkumadmviya4 available as table CARS in caslib public.
…..
…………….
      
114  
115  /* Save CAS table to S3 as CSV file. */
116  proc casutil  incaslib="public"  outcaslib="AWSCAS1";
NOTE: The UUID 'b9e5d45b-feb9-3244-86d7-fabae3d40505' is connected using session MYSESSION.
117     save casdata="cars" casout="CSV/cars_GEL.csv" replace ;
NOTE: Executing action 'table.save'.
NOTE: Cloud Analytic Services saved the file CSV/cars_GEL.csv in caslib AWSCAS1.
....
..............
.................

 

Result Output: CAS table loaded from S3 CSV data files.

 

uk_3_CAS_Accesing_S3_Datafiles_3.png

 

CAS table saved to S3 CSV data files.

 

uk_4_CAS_Accesing_S3_Datafiles_4.png

 

CAS access to S3 Parquet data files

 

With valid AWS Access keys available to CAS Controller and Node Pods, the following code can be used to load/save the CAS from/to the S3-Parquet data file. The code describes the CAS load from a subfolder containing the list of parquet data files. The CAS can be loaded from parquet file with or without the .parquet extension. The CAS can be loaded from a single file as well.

 

The data read/write from CAS to the S3 bucket are in parallel. The AWS access key is available to CAS Pods using a PVC “/mnt/viya-share/config/access-key/credentials" mounted to the NFS server.

 

When new and incremental parquet files are placed in the same S3 sub-folder, the CAS table must be reloaded. There is no auto incremental CAS load from delta data files.

 

The following screen describes the parquet data files with/without .parquet extension located at the S3 bucket which is being loaded to CAS using S3 type CASLIB.

 

uk-5_CAS_Accesing_S3_Datafiles_5.png

 

 

uk_6_CAS_Accesing_S3_Datafiles_6.png

 

Code:

 

%let userid=utkuma;

%let s3bucket=&userid.dmviya4 ;
%let aws_config_file="/mnt/viya-share/config/access-key/config"  ;
%let aws_credentials_file="/mnt/viya-share/config/access-key/credentials"  ;
%let aws_profile="182696677754-testers"  ;
%let aws_region="US_East";
%let objpath="/data/";

CAS mySession  SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

/* CASLIB to S3 bucket data folder */
caslib AWSCAS1 datasource=(srctype="s3",
                   awsConfigPath=&aws_config_file,
                   awsCredentialsPath=&aws_credentials_file ,
                   awsCredentialsProfile=&aws_profile,
                   region=&aws_region,
                   bucket=&s3bucket,
                   objectpath=&objpath
                   ) subdirs ;

/* CAS load from S3-Parquet data files */
proc casutil incaslib="AWSCAS1"  outcaslib="public";
    droptable casdata="baseball_GEL" incaslib="public"  quiet;
   	load casdata="PARQUET/baseball.parquet" casout="baseball_GEL"  promote ;
run;
quit;

/* Save CAS table to S3 as parquet file. */
proc casutil  incaslib="public"  outcaslib="AWSCAS1";
   save casdata="baseball_GEL" casout="PARQUET/baseball_GEL.parquet" replace ;
run;
quit;


/* Load CAS from CAS saved Parquet data files */
proc casutil incaslib="AWSCAS1"  outcaslib="public";
    droptable casdata="baseball_GEL_NEW" incaslib="public"  quiet;
   	load casdata="PARQUET/baseball_GEL.parquet" casout="baseball_GEL_NEW"  promote ;
run;
quit;


/* Load CAS from Parquet data-folder ( without  extension .parquet) */
proc casutil incaslib="AWSCAS1"  outcaslib="public";
    droptable casdata="baseball_prqt" incaslib="public"  quiet;
   	load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
run;
quit;


/* Load CAS from single Parquet data file ( without extension .parquet) */
proc casutil incaslib="AWSCAS1"  outcaslib="public";
    droptable casdata="baseball_prqt_1" incaslib="public"  quiet;
   	load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
run;
quit;


proc casutil incaslib="public"  outcaslib="public";
list tables ;
run;quit;

CAS mySession  TERMINATE;

 

Log extract:

 

....
..............
99   /* CAS load from S3-Parquet data files */
100  proc casutil incaslib="AWSCAS1"  outcaslib="public";
102!     load casdata="PARQUET/baseball.parquet" casout="baseball_GEL"  promote ;
NOTE: Cloud Analytic Services made the file PARQUET/baseball.parquet available as table BASEBALL_GEL in caslib public.

105  
106  /* Save CAS table to S3 as parquet file. */
107  proc casutil  incaslib="public"  outcaslib="AWSCAS1";
108     save casdata="baseball_GEL" casout="PARQUET/baseball_GEL.parquet" replace ;
NOTE: Cloud Analytic Services saved the file PARQUET/baseball_GEL.parquet in caslib AWSCAS1.

112  
113  /* Load CAS from CAS saved Parquet data files */
114  proc casutil incaslib="AWSCAS1"  outcaslib="public";

116!     load casdata="PARQUET/baseball_GEL.parquet" casout="baseball_GEL_NEW"  promote ;
NOTE: Cloud Analytic Services made the file PARQUET/baseball_GEL.parquet available as table BASEBALL_GEL_NEW in caslib public.
117  run;
118  quit;


121  /* Load CAS from Parquet data-folder ( without  extension .parquet) */
122  proc casutil incaslib="AWSCAS1"  outcaslib="public";
124!     load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt available as table BASEBALL_PRQT in caslib public.

129  /* Load CAS from single Parquet data file ( without extension .parquet) */
130  proc casutil incaslib="AWSCAS1"  outcaslib="public";
using session MYSESSION.
131      droptable casdata="baseball_prqt_1" incaslib="public"  quiet;
NOTE: Executing action 'table.dropTable'.

132!     load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt/baseball_prqt_1 available as table BASEBALL_PRQT_1 in caslib 
      public.
....
..............
.................

 

Result Output:

 

CAS table loaded from S3 parquet data files.

 

uk_7_CAS_Accesing_S3_Datafiles_7.png

 

CAS table saved to S3 parquet data files.

 

uk_8_CAS_Accesing_S3_Datafiles_8.png

 

CAS access to image and video data files

 

With valid AWS Access keys available to CAS Controller and Node Pods, the following code can be used to load the CAS from the S3 image and video data file. The code describes the CAS load from a subfolder containing the list of data files. The CAS table contains one row for each image and video file.

 

The data read from CAS to the S3 bucket are in parallel.The AWS access key is available to CAS Pods using a PVC “/mnt/viya-share/config/access-key/credentials" mounted to the NFS server.

 

When new data files are placed in the same S3 sub-folder, the CAS table must be reloaded. There is no auto incremental CAS load from delta data files.

 

The following screen describes the image and audio data files located at the S3 bucket which is being loaded to CAS using S3 type CASLIB.

 

uk_9_CAS_Accesing_S3_Datafiles_9.png

 

uk_10_CAS_Accesing_S3_Datafiles_10.png

 

Code:

 

%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_config_file="/mnt/viya-share/config/access-key/config"  ;
%let aws_credentials_file="/mnt/viya-share/config/access-key/credentials"  ;
%let aws_profile="182696677754-testers"  ;
%let aws_region="US_East";
%let objpath="/data/";

CAS mySession  SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

/* CASLIB to S3 bucket data folder */
caslib AWSCAS1 datasource=(srctype="s3",
                   awsConfigPath=&aws_config_file,
                   awsCredentialsPath=&aws_credentials_file ,
                   awsCredentialsProfile=&aws_profile,
                   region=&aws_region,
                   bucket=&s3bucket,
                   objectpath=&objpath
                   ) subdirs ;

/* CAS load from Img and Vdo files located at S3 location */
proc casutil incaslib="AWSCAS1" outcaslib="public" ;
   droptable casdata="IMGS_GEL" incaslib="public"  quiet;
   load casdata="" importOptions=(filetype="IMAGE" ) casout="IMGS_GEL" promote;
   droptable casdata="VDO_GEL" incaslib="public"  quiet;
   load casdata="" importOptions=(filetype="VIDEO" ) casout="VDO_GEL" promote;
run; quit ;


proc casutil incaslib="public"  outcaslib="public";
list tables ;
run;quit;

CAS mySession  TERMINATE;

 

Log extract :

 

....
..............

99   /* CAS load from Img and Vdo files located at S3 location */
100  proc casutil incaslib="AWSCAS1" outcaslib="public" ;
NOTE: The UUID '713850ca-2e62-3044-9f6f-f37452684fcf' is connected using session MYSESSION.
101     droptable casdata="IMGS_GEL" incaslib="public"  quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 0.02023 seconds.
102     load casdata="" importOptions=(filetype="IMAGE" ) casout="IMGS_GEL" promote;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made documents available as table IMGS_GEL in caslib public.


104     load casdata="" importOptions=(filetype="VIDEO" ) casout="VDO_GEL" promote;
NOTE: Cloud Analytic Services made documents available as table VDO_GEL in caslib public.

..............
.................

Result Output:

 

CAS table loaded from S3 image and video data files.

 

uk_11_CAS_Accesing_S3_Datafiles_11.png

 

CAS table with S3 image data, for each image file there is a row .

 

uk_12_CAS_Accesing_S3_Datafiles_12.png

 

   

Important Link: S3 Data Source Caslibs    

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎04-07-2022 02:59 PM
Updated by:
Contributors

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags