In Part 1 of this article, I talked about mounting an S3 bucket to the CAS and SAS servers as a Network File System. Mounting an AWS S3 bucket as a filesystem means you can use all your existing tools to perform read and write operations on the bucket's files and folders. This article describes another method to mount S3 as NFS, offered by Amazon as the AWS Storage Gateway service. With the Storage Gateway file share mounted on the CAS and SAS servers, you can load data files from an S3 bucket to CAS, save a CAS table to an S3 bucket, and access S3 data files in Base SAS.
AWS Storage Gateway is a hybrid storage service that enables your on-premises applications to seamlessly use AWS cloud storage. You can use the service for backup, cloud data processing, storage tiering, disaster recovery, and more. Your applications connect to the service through a virtual machine or hardware gateway appliance using standard storage protocols such as NFS, SMB, and iSCSI. The gateway connects to AWS storage services such as Amazon S3, Amazon Glacier, Amazon EBS, and AWS Backup, providing storage for files, volumes, snapshots, and virtual tapes in AWS. The service includes a highly optimized data transfer mechanism, along with a local cache for low-latency on-premises access to the S3 bucket.
AWS Storage Gateway is a chargeable service. You can host it on an on-premises server, or you can use an EC2 instance. If you host it on EC2, AWS charges for the EC2 instance, its disk space (minimum 170 GB), and the network traffic from the S3 bucket through the gateway server to the on-premises server.
The following steps describe how to start an AWS Storage Gateway, configure an AWS file share backed by an S3 bucket, and mount it on the SAS and CAS servers to access the S3 bucket.
The AWS Storage Gateway instance can be created from the AWS console with the required (admin) permissions. On the AWS services tab, under the Storage section, look for “Storage Gateway”. Selecting “Storage Gateway” takes you to the “get started” page, and then to a “create a Gateway type” step. Select “File Gateway” and move on to the next step, where you have the option to select a host platform. You can download the VM image for a listed host platform to host the service on an on-premises server, or you can choose to host it on an AWS EC2 server. When you select the EC2 server, you can launch the EC2 instance from the predefined image provided by Amazon. Once the EC2 server is up and running, return to the same page with the IP address of the EC2 server for the next step.
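If you prefer scripting over the console workflow, the same activation can be sketched with the AWS CLI. The values below (activation key, gateway name, region) are hypothetical placeholders, not taken from the article; the activation key is obtained from the running gateway host:

```shell
# Activate a file gateway once its host instance is reachable.
# All parameter values here are placeholders for illustration only.
aws storagegateway activate-gateway \
    --activation-key "ABCDE-12345-FGHIJ-67890-KLMNO" \
    --gateway-name "sas-s3-file-gateway" \
    --gateway-timezone "GMT" \
    --gateway-region "us-east-1" \
    --gateway-type "FILE_S3"
```

The command returns the gateway ARN, which later calls (such as creating a file share) refer to.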
The following screenshots describe the creation of an AWS Storage Gateway.
The Storage Gateway service must be hosted on a high-end server. The minimum requirement is a general-purpose m4.xlarge EC2 instance with an additional 150 GB of disk space (SSD gp2) for the cache. The EC2 instance must be started from the Gateway service window itself so that it starts from the predefined image. You can keep the default network and subnet selection, or select a network as per your location. Create a network security group that allows traffic from your web browser and from your corporate domain to the Gateway EC2 server. You can configure the security as per your policy. Once the instance is up and running, note down the public IP address and return to the create AWS Storage Gateway window.
AWS requirements for Storage Gateway
The following screenshots describe the starting of an EC2 server with the required components. These are not the complete steps, only the ones that need your attention.
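For scripted provisioning, the console steps above can be sketched with an AWS CLI call. The AMI ID, key pair, and security group below are hypothetical placeholders (the real gateway AMI ID varies per region); the block-device mapping attaches the extra 150 GB gp2 cache volume the gateway requires:

```shell
# Launch the gateway host from the AWS-provided gateway image
# (placeholder IDs; look up the actual gateway AMI for your region).
aws ec2 run-instances \
    --image-id "ami-0123456789abcdef0" \
    --instance-type "m4.xlarge" \
    --key-name "my-keypair" \
    --security-group-ids "sg-0123456789abcdef0" \
    --block-device-mappings \
        '[{"DeviceName":"/dev/sdb","Ebs":{"VolumeSize":150,"VolumeType":"gp2"}}]'
```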
Once the AWS Storage Gateway service is up and running, configure an AWS file share against an S3 bucket containing data files. An AWS file share points to an S3 bucket, and objects in the bucket can be accessed using NFS or SMB. The AWS file share supplies NFS mount statements for the different operating systems (Linux, macOS, and Windows). The following steps describe the configuration of an AWS file share using an S3 bucket and the AWS Gateway.
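The file share can also be created from the command line. A minimal sketch, assuming a bucket named gelsas and using hypothetical placeholder ARNs, account ID, and client CIDR (the IAM role must grant the gateway access to the bucket):

```shell
# Create an NFS file share exposing the S3 bucket through the gateway.
# All ARNs, the account ID, and the client CIDR are placeholders.
aws storagegateway create-nfs-file-share \
    --client-token "$(uuidgen)" \
    --gateway-arn "arn:aws:storagegateway:us-east-1:111122223333:gateway/sgw-12345678" \
    --role "arn:aws:iam::111122223333:role/StorageGatewayS3Access" \
    --location-arn "arn:aws:s3:::gelsas" \
    --client-list "172.31.0.0/16" \
    --default-storage-class "S3_STANDARD"
```

The --client-list restricts which client IP ranges may mount the share; the mount commands that follow use the resulting NFS export.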
$ mkdir /opt/sas/s3nfsmnt
$ chown sas:sas /opt/sas/s3nfsmnt
$ chmod 755 /opt/sas/s3nfsmnt
$ mount -t nfs -o nolock,hard 172.31.91.0:/gelsas /opt/sas/s3nfsmnt
[ec2-user@ip-172-31-32-37 s3nfsmnt]$ pwd
/opt/sas/s3nfsmnt
[ec2-user@ip-172-31-32-37 s3nfsmnt]$ ls -l
total 4979
-rw-rw-rw-. 1 nfsnobody nfsnobody 968368 Feb 1 15:58 order_fact.sas7bdat
-rw-rw-rw-. 1 nfsnobody nfsnobody 917504 Feb 1 15:58 customers.sas7bdat
-rw-rw-rw-. 1 nfsnobody nfsnobody 262144 Feb 1 15:58 prdsale.sas7bdat
[ec2-user@ip-172-31-32-37 s3nfsmnt]$
After mounting the S3 bucket to the SAS compute server, the data files can be accessed from Base SAS using LIBNAME and FILENAME statements. The following example describes S3 bucket data file access from Base SAS using a LIBNAME statement.
libname mylib "/opt/sas/s3nfsmnt" ;
proc print data=mylib.customers ;
run;
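A FILENAME statement works the same way against the mounted path. A minimal sketch, assuming a CSV file (the file name class.csv and its columns are hypothetical, not from the bucket listing above):

```sas
/* Point a fileref at a CSV file on the mounted S3 path (hypothetical file). */
filename s3file "/opt/sas/s3nfsmnt/class.csv";

data work.class;
   infile s3file dsd firstobs=2;   /* comma-delimited, skip the header row */
   input name :$20. age height weight;
run;

proc print data=work.class;
run;
```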
After mounting the S3 bucket to the CAS controller, the data files can be accessed using a path-based CASLIB. The following code example describes loading data into CAS from the S3 bucket and saving a CAS table back to the S3 bucket via the AWS Storage Gateway server. The newly saved objects/files in the S3 bucket can be accessed using the AWS UI and CLI. Other SAS applications can also use the newly saved .sas7bdat and .csv files.
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US");
caslib caslibs3 datasource=(srctype="path") path="/opt/sas/s3nfsmnt" ;
/* load a S3 data file to CAS */
PROC CASUTIL incaslib="caslibs3" outcaslib="caslibs3";
droptable casdata="prdsale" quiet;
LOAD casdata="prdsale.sas7bdat" CASOUT="prdsale" copies=0
importoptions=(filetype="basesas", dtm="auto", debug="dmsglvli");
RUN;
quit;
/* Save a CAS table to S3 with .sashdat extension */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
save casdata="prdsale" casout= "prdsale_new" replace ;
run;
quit;
/* Save a CAS table to S3 with .sas7bdat extension */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
save casdata="prdsale" casout= "prdsale_new.sas7bdat" replace ;
run;
quit;
/* Save a CAS table to S3 with .csv extension */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
save casdata="prdsale" casout= "prdsale_new.csv" replace ;
run;
quit;
/* load a .sashdat file from S3 to CAS */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
droptable casdata="prdsale_new_hdat" quiet;
load casdata="prdsale_new.sashdat" casout="prdsale_new_hdat" ;
run;
quit;
/* load a .csv file from S3 to CAS */
proc casutil incaslib="caslibs3" outcaslib="caslibs3";
droptable casdata="prdsale_new_csv" quiet;
load casdata="prdsale_new.csv" casout="prdsale_new_csv" ;
run;
quit;
proc casutil;
list tables incaslib="caslibs3";
list files incaslib="caslibs3";
run;
/* Shutdown CAS Session */
CAS mySession TERMINATE;
The following results are extracts from the above code execution.
The data files newly saved from CAS to S3 are also accessible from Base SAS.
Important link: Creating an AWS Gateway and File Share