Aim of the post
Thanks to its great connectivity features and licensing that provides many different Data Connectors out of the box, SAS CAS gives users a plethora of storage options, especially in the cloud.
In this post I’d like to share some experiences in selecting a storage option for a SAS solution. Firstly, I’m going to cover the use case, outline which options were considered and then go through the most important criteria.
Use case
My use case was an environment used for supply chain optimization for a retail chain. From the technical point of view, SAS was running a complex workload nearly round the clock in three main areas. A lot of data preparation was done in SAS to relieve the strain on the Data Warehouse and minimize the impact of analytical calculations. Apart from that, we were running both prediction and optimization model scoring as well as training.
The server is on a very tight schedule: once the data is ready, it is immediately loaded and transformed, followed by a relatively quick forecast and a long and arduous optimization to meet all of the client's criteria. The hours left before the next batch are used to retrain or fine-tune the models.
The whole solution was to be put on Amazon Web Services (AWS) infrastructure, so SAS Viya would run on AWS EKS. Calculations were done in CAS on multiple workers and orchestrated using 4GL in SAS Compute.
Criteria
We need a storage solution that is first and foremost performant while also being cost efficient. Other criteria that can’t be overlooked are durability and security.
Option I – EFS
The first and easiest option is to use AWS Elastic File System. Its advantages lie mainly in the simple management: it practically takes care of itself and has many ways to connect it to your EKS cluster.
Setup
To set up EFS you need to perform three steps:
Provision cloud resources
resource "aws_efs_file_system" "viya-rwx" {
creation_token = "sas-viya-rwx"
}
resource "aws_efs_mount_target" "rwx-to-workers" {
file_system_id = aws_efs_file_system.viya-rwx.id
subnet_id = aws_subnet.workers.id
}
Add CSI Driver
eksctl create addon --name aws-efs-csi-driver --cluster $cluster --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEFSCSIDriverPolicy --force
While sufficient for a testing scenario, for production-grade environments you might want to write your own policy, restricting the "Resource" section of the JSON.
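For illustration, a minimal sketch of such a policy in Terraform, scoped to the file system created above. The action list is an assumption based on what the CSI driver needs for dynamic provisioning and may need adjusting for your driver version:

resource "aws_iam_policy" "efs_csi_restricted" {
  name = "efs-csi-restricted"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = [
        "elasticfilesystem:DescribeFileSystems",
        "elasticfilesystem:DescribeAccessPoints",
        "elasticfilesystem:CreateAccessPoint",
        "elasticfilesystem:DeleteAccessPoint",
        "elasticfilesystem:TagResource"
      ]
      # restrict "Resource" to our file system instead of "*"
      Resource = aws_efs_file_system.viya-rwx.arn
    }]
  })
}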
Create a StorageClass
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap
  fileSystemId: {{ efs_filesystem_id }}
  directoryPerms: "700"
  gidRangeStart: "1000" # optional
  gidRangeEnd: "2000" # optional
  basePath: "/dynamic_provisioning" # optional
Durability
Data durability is high and easy to determine by simply checking the AWS documentation (Amazon EFS FAQs). The cloud provider guarantees 11 nines of durability over a given year.
Security
From the security point of view EFS also delivers. It provides encryption in transit, which is turned on when the filesystem is mounted through the CSI driver with the default configuration. It also provides encryption at rest with both AWS and customer managed keys (in that scenario, keep in mind the additional permissions you have to include for the service account).
Access to the disk is granted using IAM permissions.
Cost efficiency
Standard storage costs roughly $300 per TB per month. The cost can be greatly reduced using Infrequent Access, which is less than 1/10th of the cost of Standard. What's more, we'd have to pay around $600 per month for 100 MB/s of provisioned throughput, or use Bursting Throughput instead (see the Performance chapter).
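To take advantage of Infrequent Access, the file system can be given a lifecycle policy. A sketch extending the resource from the Setup section; the 30-day threshold is an assumption about how long our data stays hot:

resource "aws_efs_file_system" "viya-rwx" {
  creation_token = "sas-viya-rwx"

  # move files not touched for 30 days to the cheaper IA class
  lifecycle_policy {
    transition_to_ia = "AFTER_30_DAYS"
  }
}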
Management
This is a managed service, so no additional management is required.
Performance
EFS is capable of providing 3 GiB/s throughput for a cluster of 6 CAS Workers. This is more than enough for our purposes.
What also looks promising is the Bursting Throughput mode, which scales with the amount of storage and is capable of providing extra performance when the filesystem is underutilized. This sounds very appealing, as some of our calculations take place in memory, and during those hours burst credits can accumulate. Bursting Throughput delivers a baseline of 50 MiB/s per TiB of storage, with bursts up to twice that. With our 100 TB volume that works out to roughly 4.5 GiB/s of baseline throughput and around 9 GiB/s in bursts, which is truly impressive performance.
Option II – S3
Setup
In this section I'd like to walk you through creating an S3 bucket with no public connectivity, used mainly by batch processes, so no user auditing in S3 logs. As in the previous example, for a production environment it is advisable to restrict permissions for the sas-cas-server service account to specific buckets (a sketch follows the credential step below). To set it up in the simplest way you need to perform these steps:
Provision S3 bucket with access control
resource "aws_s3_bucket" "viya_object_storage" {
bucket = "viya-object-storage"
force_destroy = true
}
resource "aws_s3_bucket_ownership_controls" "viya-ownctr" {
bucket = aws_s3_bucket.viya_object_storage.id
rule {
object_ownership = "ObjectWriter"
}
}
resource "aws_s3_bucket_acl" "viya-s3acl" {
depends_on = [aws_s3_bucket_ownership_controls.viya-ownctr]
bucket = aws_s3_bucket.viya_object_storage.id
acl = "private"
}
Create private connectivity
resource "aws_vpc_endpoint" "s3endp" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.${local.region}.s3"
}
resource "aws_vpc_endpoint_route_table_association" "to_s3_vpc_endpoint" {
route_table_id = aws_route_table.workers_subn_rtbl.id
vpc_endpoint_id = aws_vpc_endpoint.s3endp.id
}
Provide credentials to CAS server
eksctl create iamserviceaccount --cluster=sas-viya --namespace viya --name=sas-cas-server --role-only --role-name=s3fa-for-cas --attach-policy-arn=arn:aws:iam::aws:policy/AmazonS3FullAccess
kubectl annotate sa sas-cas-server eks.amazonaws.com/role-arn=arn:aws:iam::<ACCOUNT_NO>:role/s3fa-for-cas
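As promised, a sketch of a restricted alternative to AmazonS3FullAccess, scoped to the bucket created above; you would attach its ARN in the eksctl call instead of the managed policy:

resource "aws_iam_policy" "cas_s3_restricted" {
  name = "cas-s3-restricted"
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        # listing the bucket itself
        Effect   = "Allow"
        Action   = ["s3:ListBucket", "s3:GetBucketLocation"]
        Resource = aws_s3_bucket.viya_object_storage.arn
      },
      {
        # reading and writing objects
        Effect   = "Allow"
        Action   = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
        Resource = "${aws_s3_bucket.viya_object_storage.arn}/*"
      }
    ]
  })
}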
Restart CAS and create a CASLIB
caslib S3D datasource=(srctype="s3",
  region='eu-north-1',
  bucket='viya-object-storage',
  objectpath='/'
) subdirs;
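With the caslib in place, CAS can read from and write to the bucket like any other data source. A quick sketch with PROC CASUTIL; the table and file names are hypothetical:

proc casutil;
  /* load a file from the bucket into memory */
  load casdata="orders.csv" incaslib="S3D" outcaslib="casuser" casout="orders";
  /* save an in-memory table back to the bucket */
  save casdata="orders" incaslib="casuser" outcaslib="S3D" casout="orders.csv" replace;
quit;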
Durability
AWS guarantees 11 nines of durability over a given year, so the same value as EFS.
Security
S3 provides encryption both in transit and at rest, using AWS and customer managed keys. The same remarks as for EFS apply.
Access to the disk is granted using IAM permissions.
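For completeness, a sketch of enforcing at-rest encryption on the bucket with a customer managed key; aws_kms_key.viya is a hypothetical CMK defined elsewhere:

resource "aws_s3_bucket_server_side_encryption_configuration" "viya-sse" {
  bucket = aws_s3_bucket.viya_object_storage.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = aws_kms_key.viya.arn # hypothetical CMK
    }
  }
}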
Cost efficiency
AWS charges $0.023 per GB per month in its Standard tier, dropping to around $0.01 for Infrequent Access. This is significantly cheaper than EFS. It has to be noted, though, that with S3 you also pay per request and, in some tiers, per GB retrieved.
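Analogously to EFS, the transition can be automated with a lifecycle rule. A sketch, with the 30-day threshold again being an assumption:

resource "aws_s3_bucket_lifecycle_configuration" "viya-lc" {
  bucket = aws_s3_bucket.viya_object_storage.id

  rule {
    id     = "to-infrequent-access"
    status = "Enabled"
    filter {} # apply to all objects

    # move objects older than 30 days to Standard-IA
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }
  }
}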
Management
This is a managed service, so no additional management is required.
Option III – NFS
Setup
The NFS setup will not be shown here, as it is provided (in kernel space) by the Viya4-IAC scripts. It has to be noted, however, that in that configuration it has to be put on an external EC2 instance, or at least on an AMI that has all the required kernel modules loaded. This not only bumps up the management effort but also the cost, as instance size determines storage throughput, so we might be forced to pay for computing power we can't really use.
To circumvent that, I've created a userspace in-cluster NFS solution that can still use a fast RAID0 disk array and serve NFS from within the cluster. The implementation of this solution is beyond the scope of this post, but don't hesitate to contact me for details.
Durability
In this option durability has to be provided by the filesystem layer, and it might be costly. Since we are aiming for maximum performance we used RAID0 for parallel writes and reads, which means a discrepancy between disks might lead to data corruption. Disk snapshots also have to take place when no pod is mounting the disks, so that all stripes are captured in a consistent state.
Security
While we can use data encryption at rest and rely on customer managed keys, it is much more complex for data in transit. We would need another layer to encrypt the NFS traffic transmitted between the pods. For many scenarios that might not be a problem, but it has to be taken into consideration when evaluating compliance.
Cost efficiency
With EBS you pay about $0.10 per gigabyte per month while it's attached to an instance. Consequently, to provision 100 TB of storage, not only would we need 64 disks of 16 TB each, but for a 24/7 installation it would cost us north of $100k a month! And we haven't even considered the cost of any additional instances for an external NFS...
Conclusion
As of now, we have definitely ruled out NFS for our storage: while in theory it might provide higher throughput, that would in fact be more than our instances need, and it would come at the price of more management effort and higher cost.
With the current state of affairs it seems reasonable to put the more frequently accessed data on EFS while archiving colder data to EFS Infrequent Access or S3.