Important Performance Considerations When Moving
SAS Applications to the Amazon Cloud
Last Updated: 07JUL2018
Executive Summary
Any architecture that is chosen by a SAS customer to run their SAS applications requires:
a good understanding of all layers and components of the infrastructure,
an administrator to configure and manage the infrastructure, and
the ability to meet SAS’ requirements not just to run the software, but to also allow it to perform well.
UPDATE: Margaret presented a paper at SAS Global Forum 2018 with updated information on this subject. Where the two differ, please rely on the information in that paper rather than what is in this post. Important Performance Considerations When Moving SAS to a Public Cloud (https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/1866-2018.pdf)
This paper will talk about important performance considerations for SAS 9 (both SAS Foundation and SAS Grid) and SAS Viya in the Amazon Cloud.
Many companies are deciding to move their data centers, including their SAS applications, from on-premises to a public cloud. Because of this decision, SAS customers are asking if SAS can run in a public cloud. The short answer to that question is yes: SAS has been tested without errors in many of the public clouds. The more in-depth answer is that customers need to understand what their computing resource needs are before committing to move a SAS application that is performing ideally on-premises to any public cloud.
A detailed assessment of the computing resources needed to support a customer’s SAS applications is required before they decide to go into production in the public cloud. Existing customers should use the information from the assessment to set up instances and storage as a proof of concept in the public cloud before making the decision to move. It is important that they work closely with their IT team to examine all compute resources, including IO throughput, number of cores, amount of memory, and amount of physical disk space. It is also important to see how close their existing compute infrastructure is to meeting their SAS users’ current SLAs. The compute resource data can be gathered by monitoring the existing computer systems in their SAS infrastructure with external hardware monitoring tools such as nmon for Linux and AIX and perfmon for Windows. For new customers, this decision is more difficult, since neither they nor we may know exactly how they will be using SAS and what their workload demands will be.
Once the above information is gathered, it should be used to determine which AWS EC2 instance is the best fit for each of the SAS tiers. The AWS EC2 instance types that are a good fit for SAS are listed below. One thing that is not listed in the table below is that the maximum IO throughput you can get between the AWS EC2 instance and EBS storage is 50 MB/second/core. This is why using AWS EC2 instances without ephemeral storage is not a good idea: your customer would have to put SAS WORK/UTILLOC on this slow storage.
| AWS Instance type | I3 | I2 | R4 | R3 | D2 |
| --- | --- | --- | --- | --- | --- |
| Description | High I/O | High I/O | Memory Optimized | Memory Optimized | Dense-Storage |
| Intel Processor | E5-2686 v4 (Broadwell) | E5-2670 v2 (Ivy Bridge) | E5-2686 v4 (Broadwell) | E5-2670 v2 (Ivy Bridge) | E5-2676 v3 (Haswell) |
| Adequate Memory | yes | yes | yes | yes | yes |
| Network | up to 20 Gbit | up to 10 Gbit | up to 20 Gbit | up to 10 Gbit | up to 10 Gbit |
| Internal Storage | NVMe SSDs | SSDs | No (EBS-only) | SSD (limited) | HDD |
| Other | Only supported with RHEL 7.3 and you need to run "yum update kernel" to get the NVMe fix. | | | | |
It must be determined that the instance has enough physical cores (vCPUs/2, because the vCPUs are actually hyper threads). Also, the AWS-specific SAS SETINIT must be applied so that the SAS deployment can access the number of physical cores that have been licensed. SAS runs well on physical cores, but not on hyper threads, because of floating point unit sharing issues. In addition to provisioning cores (dividing the number of vCPUs by two), it must be determined that there is enough memory AND IO throughput for the SAS applications. Here are some considerations for various SAS applications that should be well understood before choosing AWS EC2 instances:
SAS WORK and UTILLOC file systems need the most IO throughput, and the requirement is typically for much more IO throughput than can be achieved with EBS storage. EBS storage is limited to a maximum of 50 MB/sec/core. SAS WORK and UTILLOC file systems require a minimum of 100 to 150 MB/sec/core, depending on which SAS procedures are used. Therefore, they should be placed on the ephemeral storage internal to the instance, which is described in more detail below.
SAS Viya applications require robust IO throughput as well as large amounts of memory. This is because CAS will page data to the storage device if there is not enough physical RAM to hold all the data files in memory. Slow IO throughput will greatly impact the performance of SAS Viya. Please note that SAS Viya is only available on Red Hat Enterprise Linux (RHEL) 6.7 or higher and 7.1 or higher, and on the same releases of Oracle Enterprise Linux (OEL).
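The core and throughput arithmetic above can be sketched in a few lines. This is only an illustration under the figures quoted in this paper (vCPUs/2 physical cores, a 50 MB/sec/core EBS ceiling, and a 100 to 150 MB/sec/core requirement for SAS WORK and UTILLOC); the function and key names are hypothetical and are not part of any SAS or AWS tool.

```python
# Figures quoted in this paper; adjust them if newer guidance applies.
EBS_MB_PER_SEC_PER_CORE = 50        # EBS throughput ceiling per physical core
WORK_MIN_MB_PER_SEC_PER_CORE = 100  # low end of the SAS WORK/UTILLOC requirement
WORK_MAX_MB_PER_SEC_PER_CORE = 150  # high end of the SAS WORK/UTILLOC requirement


def physical_cores(vcpus: int) -> int:
    """Each vCPU is a hyper thread, so physical cores = vCPUs / 2."""
    if vcpus < 2:
        raise ValueError("expected at least 2 vCPUs")
    return vcpus // 2


def throughput_summary(vcpus: int) -> dict:
    """Compare what EBS can deliver with what SAS WORK/UTILLOC needs (MB/sec)."""
    cores = physical_cores(vcpus)
    return {
        "physical_cores": cores,
        "ebs_ceiling_mb_s": cores * EBS_MB_PER_SEC_PER_CORE,
        "work_min_mb_s": cores * WORK_MIN_MB_PER_SEC_PER_CORE,
        "work_max_mb_s": cores * WORK_MAX_MB_PER_SEC_PER_CORE,
    }


if __name__ == "__main__":
    # Hypothetical 16-vCPU instance used only as an example input.
    for key, value in throughput_summary(16).items():
        print(f"{key}: {value}")
```

For a hypothetical 16-vCPU instance this gives 8 physical cores, an EBS ceiling of 400 MB/sec, and a SAS WORK requirement of 800 to 1200 MB/sec, which shows why EBS alone falls short for SAS WORK and UTILLOC.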
There are limited storage options available at Amazon. Here is a list of what works and doesn’t work.
Ephemeral storage consists of disks internal to the AWS EC2 instance. It must be configured as a local file system (on RHEL, XFS or EXT4). All data on this storage will disappear with a reboot or restart of the AWS EC2 instance. We strongly suggest that you stripe all the ephemeral disks together in a RAID0.
EBS storage is network-attached storage in the AWS infrastructure. Data on it will persist after a reboot or restart, but there is limited IO throughput to this storage. We strongly suggest that you use at least 4 (preferably 8) ST1 EBS volumes striped together.
EFS storage is available from AWS. Unfortunately, this storage does not have the file locks required by SAS, so it cannot be used for any SAS files or binaries.
Intel Cloud Edition for Lustre File System storage is the only shared file system in Amazon that has been tested with SAS Grid Manager; however, the future of Lustre is uncertain, as Intel has contributed the Lustre code base to the open source community and no longer provides Intel-branded releases. Intel will provide support for Lustre for the next two years.
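As an illustration of the RAID0 striping suggested above for ephemeral disks, the following sketch builds (but does not run) the commands an operating system administrator might use on RHEL. The device names, the /dev/md0 array name, and the /saswork mount point are assumptions for the example, not values from this paper; check the actual device names on the instance before running anything. Because ephemeral data does not survive a restart, a script along these lines would have to be rerun after each restart of the instance.

```python
import shlex


def raid0_commands(devices, md_device="/dev/md0", mount_point="/saswork"):
    """Return the commands to stripe `devices` into one RAID0 array,
    create an XFS file system on it, and mount it.

    Nothing is executed here; the caller decides whether to run them.
    """
    if len(devices) < 2:
        raise ValueError("striping needs at least two devices")
    return [
        # Stripe all ephemeral disks together (RAID level 0).
        ["mdadm", "--create", md_device, "--level=0",
         f"--raid-devices={len(devices)}", *devices],
        # Local file system on the striped array (XFS; -f overwrites old signatures).
        ["mkfs.xfs", "-f", md_device],
        ["mkdir", "-p", mount_point],
        ["mount", md_device, mount_point],
    ]


if __name__ == "__main__":
    # Hypothetical NVMe device names; list the real ones with `lsblk`.
    example_devices = ["/dev/nvme0n1", "/dev/nvme1n1"]
    for command in raid0_commands(example_devices):
        print(shlex.join(command))
```

Printing the commands first, rather than executing them directly, lets the administrator review them, since mdadm and mkfs.xfs destroy any existing data on the listed devices.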
IO throughput is crucial for SAS Foundation and SAS Grid deployments. Be sure to choose AWS EC2 instances that are designed for large sequential IO rather than for IOPS.
Currently, the best AWS EC2 instance to use with SAS Foundation, SAS Grid and SAS Viya is the I2 instance family. Please note the cost of I2 instances (http://www.ec2instances.info/), and recognize that some customers may want to go with a cheaper AWS EC2 instance. Before you suggest that they go to a different AWS EC2 instance, please make sure its ephemeral storage can deliver IO throughput for SAS WORK or Viya CASCACHE equivalent to that of the AWS EC2 I2 instances.
While we are talking about Amazon, there are several potentially inaccurate perceptions regarding how SAS will run in AWS. Let’s discuss several of the more popular ones:
It will be cheaper to run SAS in the public cloud. If your customer stands up only the bare minimum number of cores and physical disk space to save money, there is a strong possibility that they will not be able to obtain the throughput needed to keep their SAS users happy. More expensive EC2 instances with higher core counts may be required to provision adequate IO bandwidth, depending on the SLAs with their SAS users.
SAS will run faster in the public cloud than on-premises. Public clouds have not built their infrastructure, particularly network interconnects, to support high volumes of large sequential IO. In order to get the IO throughput needed, SAS applications will need to be spread across multiple EC2 instances that are connected to a shared file system that is designed to spread its data locations across multiple volumes – like Lustre does. Adequate IO throughput can be achieved with this infrastructure, but it can come with a larger price tag.
Administrators will not be needed with a move to the public cloud. It is true that public cloud does away with management of physical infrastructure; however, this “void” is replaced with management of the cloud architecture. Cloud Architects understand how to interact with the cloud infrastructure (IaaS) provider, including provisioning infrastructure services, integrating cloud services with on-premises systems, and securing the applications, data, and systems running on that IaaS. It is also a myth that HA and DR are provided for free in the cloud. Implementing High Availability and Disaster Recovery for critical systems is an additional responsibility of a Cloud Architect. Administrators are also needed for the instance’s host operating system and relational databases. For example, your customer needs an operating system administrator to configure the file systems needed for use with SAS as well as to tune the operating system for ideal performance. Along with the initial setup of the file systems, additional scripting needs to be done both to reconnect their permanent SAS data files and to recreate their ephemeral storage after a restart/reboot of an instance.
SAS can take advantage of the bursting features of AWS. Deploying your SAS applications in Amazon does not mean that they are elastic and can leverage the AWS auto scaling capabilities. You should refer to SAS product documentation to determine if bursting is supported and how it is implemented.
Bottom line:
SAS customers may have to stand up more compute resources in the public cloud than their EEC sizing suggests and/or than they have in their on-premises system (more cores, more instances, and more physical disk space) in order to meet their SAS applications’ needs, especially from an IO throughput perspective.
References:
Performance and Tuning Considerations for SAS Grid Manager 9.4 on Amazon (AWS) Cloud using Intel Cloud Edition for Lustre File System (April 2015) http://support.sas.com/rnd/scalability/grid/SGMonAWS.pdf
Implementing SAS 9.4 Software in Cloud Infrastructures (July 2016)
https://support.sas.com/resources/papers/Implementing-SAS9-4-Software-Cloud-Infrastructures.pdf
Until RHEL is supported on AWS EC2 I3 instances (April 2017) https://communities.sas.com/t5/SAS-Communities-Library/Until-RHEL-is-supported-on-AWS-EC2-I3-Instances/ta-p/354441
Contacts:
Margaret Crevar
Margaret.Crevar@sas.com
+1 919.531.7095
Ande Stelk
Ande.Stelk@sas.com
+1 919.531.9984