In Part 1 of this article, I talked about mounting an S3 bucket to the CAS and SAS servers as a Network File System. Mounting an AWS S3 bucket as a filesystem means you can use all your existing tools to perform read and write operations on the bucket's files and folders. This article describes another method to mount S3 as NFS, offered by Amazon as the AWS Storage Gateway service. With the Storage Gateway file share mounted on the CAS and SAS servers, you can load data files from an S3 bucket to CAS, save a CAS table to an S3 bucket, and access S3 data files in Base SAS.
AWS Storage Gateway is a hybrid storage service that enables your on-premises applications to seamlessly use AWS cloud storage. You can use the service for backup, cloud data processing, storage tiering, disaster recovery, and more. Your applications connect to the service through a virtual machine or hardware gateway appliance using standard storage protocols such as NFS, SMB, and iSCSI. The gateway connects to AWS storage services such as Amazon S3, Amazon Glacier, Amazon EBS, and AWS Backup, providing storage for files, volumes, snapshots, and virtual tapes in AWS. The service includes a highly optimized data transfer mechanism, along with a local cache for low-latency on-premises access to the S3 bucket.
AWS Storage Gateway is a chargeable service. You can host it on an on-premises server, or you can use an EC2 instance. If you host it on EC2, AWS charges for the EC2 instance, its disk space (minimum 170 GB), and the network traffic from the S3 bucket through the gateway server to the on-premises server.
The following steps describe how to start an AWS Storage Gateway, configure an AWS file share backed by an S3 bucket, and mount it on the SAS and CAS servers to access the S3 bucket.
The AWS Storage Gateway instance can be created from the AWS console with the required (admin) permissions. On the AWS services tab, under the Storage section, look for “Storage Gateway”. Selecting “Storage Gateway” takes you to the “get started” page, and then to a “create a Gateway type” step. Select “File Gateway” and move on to the next step, where you have the option to select a host platform. You can download the VM image for a listed host platform to host the service on an on-premises server, or you can choose to host it on an AWS EC2 server. When you select the EC2 server, you can launch the EC2 instance from the predefined image provided by Amazon. Once the EC2 server is up and running, return to the same page with the IP address of the EC2 server for the next step.
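If you prefer scripting over the console workflow, the same activation can be sketched with the AWS CLI. The values below (activation key, gateway name, region) are hypothetical placeholders, not taken from the article; the activation key is obtained from the running gateway host:

```shell
# Activate a file gateway once its host instance is reachable.
# All parameter values here are placeholders for illustration only.
aws storagegateway activate-gateway \
    --activation-key "ABCDE-12345-FGHIJ-67890-KLMNO" \
    --gateway-name "sas-s3-file-gateway" \
    --gateway-timezone "GMT" \
    --gateway-region "us-east-1" \
    --gateway-type "FILE_S3"
```

The command returns the gateway ARN, which later calls (such as creating a file share) refer to.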
The following screenshots describe the creation of an AWS Storage Gateway.
The Storage Gateway service must be hosted on a high-end server. The minimum requirement is a general-purpose m4.xlarge EC2 instance with an additional 150 GB of disk space (SSD gp2) for the cache. The EC2 instance must be started from the Gateway service window itself so that it starts from the predefined image. You can keep the default network and subnet selection, or select a network as per your location. Create a network security group that allows traffic from your web browser and from your corporate domain to the Gateway EC2 server. You can configure the security as per your policy. Once the instance is up and running, note down the public IP address and return to the create AWS Storage Gateway window.
AWS requirements for Storage Gateway
The following screenshots describe the starting of an EC2 server with the required components. These are not the complete steps, only the ones that need your attention.
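For scripted provisioning, the console steps above can be sketched with an AWS CLI call. The AMI ID, key pair, and security group below are hypothetical placeholders (the real gateway AMI ID varies per region); the block-device mapping attaches the extra 150 GB gp2 cache volume the gateway requires:

```shell
# Launch the gateway host from the AWS-provided gateway image
# (placeholder IDs; look up the actual gateway AMI for your region).
aws ec2 run-instances \
    --image-id "ami-0123456789abcdef0" \
    --instance-type "m4.xlarge" \
    --key-name "my-keypair" \
    --security-group-ids "sg-0123456789abcdef0" \
    --block-device-mappings \
        '[{"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":150,"VolumeType":"gp2"}}]'
```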
Once the AWS Storage Gateway service is up and running, configure an AWS file share against an S3 bucket containing data files. An AWS file share points to an S3 bucket, and objects in the bucket can be accessed using NFS or SMB. The AWS file share supplies NFS mount statements for the different operating systems (Linux, macOS, and Windows). The following steps describe the configuration of an AWS file share using an S3 bucket and the AWS Gateway.
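The file share can also be created from the command line. A minimal sketch, assuming a bucket named gelsas and using hypothetical placeholder ARNs, account ID, and client CIDR (the IAM role must grant the gateway access to the bucket):

```shell
# Create an NFS file share exposing the S3 bucket through the gateway.
# All ARNs, the account ID, and the client CIDR are placeholders.
aws storagegateway create-nfs-file-share \
    --client-token "$(uuidgen)" \
    --gateway-arn "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12345678" \
    --role "arn:aws:iam::111122223333:role/StorageGatewayS3Access" \
    --location-arn "arn:aws:s3:::gelsas" \
    --client-list "172.31.0.0/16" \
    --default-storage-class "S3_STANDARD"
```

The --client-list restricts which client IP ranges may mount the share; the mount commands that follow use the resulting NFS export.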
$ mkdir /opt/sas/s3nfsmnt
$ chown sas:sas /opt/sas/s3nfsmnt
$ chmod 755 /opt/sas/s3nfsmnt
$ mount -t nfs -o nolock,hard 172.31.91.0:/gelsas /opt/sas/s3nfsmnt
[ec2-user@ip-172-31-32-37 s3nfsmnt]$ pwd
/opt/sas/s3nfsmnt
[ec2-user@ip-172-31-32-37 s3nfsmnt]$ ls -l
total 4979
-rw-rw-rw-. 1 nfsnobody nfsnobody 968368 Feb 1 15:58 order_fact.sas7bdat
-rw-rw-rw-. 1 nfsnobody nfsnobody 917504 Feb 1 15:58 customers.sas7bdat
-rw-rw-rw-. 1 nfsnobody nfsnobody 262144 Feb 1 15:58 prdsale.sas7bdat
[ec2-user@ip-172-31-32-37 s3nfsmnt]$
After mounting the S3 bucket to the SAS compute server, the data files can be accessed from Base SAS using LIBNAME and FILENAME statements. The following example describes S3 bucket data file access from Base SAS using a LIBNAME statement.
libname mylib "/opt/sas/s3nfsmnt" ;
proc print data=mylib.customers ;
run;
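A FILENAME statement works the same way against the mounted path. A minimal sketch, assuming a CSV file (the file name class.csv and its columns are hypothetical, not from the bucket listing above):

```sas
/* Point a fileref at a CSV file on the mounted S3 path (hypothetical file). */
filename s3file "/opt/sas/s3nfsmnt/class.csv";

data work.class;
   infile s3file dsd firstobs=2;   /* comma-delimited, skip the header row */
   input name :$20. age height weight;
run;

proc print data=work.class;
run;
```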
After mounting the S3 bucket to the CAS controller, the data files can be accessed using a path-based CASLIB. The following code example describes loading data into CAS from the S3 bucket and saving a CAS table back to the S3 bucket via the AWS Storage Gateway server. The newly saved objects/files in the S3 bucket can be accessed using the AWS UI and CLI. Other SAS applications can also use the newly saved .sas7bdat and .csv files.
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US");
caslib caslibs3 datasource=(srctype="path") path="/opt/sas/s3nfsmnt" ;
/* load a S3 data file to CAS */
PROC CASUTIL incaslib="caslibs3" outcaslib="caslibs3";
droptable casdata="prdsale" quiet;
LOAD casdata="prdsale.sas7bdat" CASOUT="prdsale" copies=0
importoptions=(filetype="basesas", dtm="auto", debug="dmsglvli");
RUN;
quit;
/* Save a CAS table to S3 with .sashdat extension */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
save casdata="prdsale" casout= "prdsale_new" replace ;
run;
quit;
/* Save a CAS table to S3 with .sas7bdat extension */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
save casdata="prdsale" casout= "prdsale_new.sas7bdat" replace ;
run;
quit;
/* Save a CAS table to S3 with .csv extension */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
save casdata="prdsale" casout= "prdsale_new.csv" replace ;
run;
quit;
/* load a .sashdat file from S3 to CAS */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
droptable casdata="prdsale_new_hdat" quiet;
load casdata="prdsale_new.sashdat" casout="prdsale_new_hdat" ;
run;
quit;
/* load a .csv file from S3 to CAS */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
droptable casdata="prdsale_new_csv" quiet;
load casdata="prdsale_new.csv" casout="prdsale_new_csv" ;
run;
quit;
proc casutil;
list tables incaslib="caslibs3";
list files incaslib="caslibs3";
run;
/* Shutdown CAS Session */
CAS mySession TERMINATE;
The following results are extracts from the above code execution.
The data files newly saved from CAS to S3 are also accessible from Base SAS.
Important link: Creating an AWS Gateway and File Share