Last Updated: 10DEC2020
When considering deploying SAS 9.4 (SAS GRID or SAS Analytics Pro) in the Microsoft (MS) Azure Cloud, Azure NetApp Files (ANF) is a viable primary storage option for SAS GRID clusters of limited size. Given SAS's recommendation of 100MiB/s of throughput per physical core, SAS GRID clusters using an ANF volume for SASDATA (persistent SAS data files) are scalable to 32 physical cores across two or more MS Azure machine instances. Cluster sizes are predicated upon the architectural constraint of a single SASDATA namespace per SAS cluster and the available bandwidth of a single Azure NetApp Files volume. This core count guidance will be revisited as Azure infrastructure (compute, network and per-file-system storage bandwidth) increases over time.
Testing was completed using Azure NetApp Files volumes accessed via NFSv3. SAS does not currently recommend NFSv4.1 for SAS Grid deployments.
It is the intent of this paper to provide sizing guidance and set proper expectations to be successful when deploying Azure NetApp Files as the storage behind SAS 9.4.
It has been tested and documented that a single Azure NetApp Files volume can deliver up to 4,500MiB/s of reads and 1,500MiB/s of writes. Given an Azure instance type with sufficient egress bandwidth, a single virtual machine can consume all the write bandwidth of a single Azure NetApp Files volume. That said, no single virtual machine, regardless of virtual machine SKU, can consume all the read bandwidth of a single volume.
The main shared workload of SAS 9.4 – SASDATA – has an 80:20 read:write ratio and as such the important per volume numbers to know are:
The throughput numbers quoted above can be seen in the aforementioned documentation under NFS scale out workloads – 80:20.
Please note, SASWORK (temporary SAS data files) that have a 50:50 read:write ratio should not be placed on Azure NetApp Files volumes at this time.
As SAS stated in the Best Practices for Using MS Azure with SAS paper, the E64-16ds_v4 and E64-32ds_v4 MS Azure instances are recommended for SAS 9 providing the best overall SAS experience. Based on this, the following Azure NetApp Files relevant performance guidance is provided at a high level:
Red Hat Enterprise Linux (RHEL) is the distribution of choice for SAS customers running SAS 9 on Linux. Each kernel supported by Red Hat has its own unique bandwidth constraints when using NFS.
Testing has shown that a single RHEL 7 instance is expected to achieve no more than roughly 750-800MiB/s of read throughput against a single storage endpoint (i.e. against a network socket), while 1500MiB/s of writes is achievable against the same, using 64KiB rsize and wsize mount options. There is evidence the aforementioned read throughput ceiling is an artifact of the 3.10 kernel. Refer to RHEL CVE-2019-11477 for detail.
Testing has shown that a single RHEL 8.2 instance, with its 4.18 kernel, is free of the limitations found in the 3.10 kernel above; as such, 1200-1300MiB/s of read traffic using 64KiB rsize and wsize mount options is achievable. That said, expect the same 1500MiB/s of achievable throughput as seen in RHEL 7 for large sequential writes.
A single RHEL 8.3 instance has not yet been tested by SAS for SAS 9.4 workloads. That said, with the nconnect mount option new to the RHEL 8.3 distribution, roughly 3,000MiB/s of read throughput from a single Azure NetApp Files volume is likely achievable. Expect no more than 1,500MiB/s of writes to a single volume even with nconnect.
SAS recommends the following NFS mount commands for NFS shared file systems being used for permanent SAS DATA files:
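As an illustrative sketch only (the server address and paths below are placeholders, and the options — NFSv3 over TCP with hard mounts and the 64KiB rsize/wsize discussed above — should be confirmed against current SAS documentation), such a mount expressed as an /etc/fstab entry might look like:

```
# Illustrative /etc/fstab entry; 10.0.0.4:/sasdata and /sasdata are placeholders
10.0.0.4:/sasdata  /sasdata  nfs  rw,hard,vers=3,proto=tcp,rsize=65536,wsize=65536  0 0
```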
Please see the following Azure NetApp Files performance calculator for guidance when sizing SASDATA volumes. As volume bandwidth is based upon volume capacity, and as capacity cost is based upon which service level is selected, and as service level selection is based upon capacity versus bandwidth needs, determining which service level can be somewhat complicated on your own. Using this calculator, enter data as follows:
The readout at the bottom of the screen advises capacity requirements at each service level and the cost per month thereof.
Note that the user experience will be the same regardless of which service level is selected.
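To illustrate the capacity-versus-bandwidth tradeoff the calculator automates, here is a back-of-the-envelope sketch. The per-TiB throughput figures (Standard = 16 MiB/s, Premium = 64 MiB/s, Ultra = 128 MiB/s per provisioned TiB) are assumptions drawn from Azure NetApp Files documentation and should be verified against current published values:

```shell
# Back-of-the-envelope sizing: a 32-core grid at SAS's 100 MiB/s per core.
CORES=32
REQUIRED_MIBS=$(( CORES * 100 ))   # 3200 MiB/s required from the volume

# Assumed ANF throughput per TiB of provisioned capacity (check current docs):
# Standard = 16 MiB/s, Premium = 64 MiB/s, Ultra = 128 MiB/s.
for LEVEL in "Standard 16" "Premium 64" "Ultra 128"; do
  set -- $LEVEL
  NAME=$1
  PER_TIB=$2
  TIB=$(( (REQUIRED_MIBS + PER_TIB - 1) / PER_TIB ))  # round up to whole TiB
  echo "$NAME: ${TIB} TiB of capacity needed"
done
```

The higher the service level, the less capacity you must provision to reach the same bandwidth, which is why service-level selection comes down to capacity versus bandwidth needs.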
Costs can be further controlled using the concept of volume shaping with Azure NetApp Files. Two dynamic options are available to customers to influence performance and cost.
Mellanox drivers are used for accelerated networking on Azure virtual machines. Mellanox recommends pinning network interfaces to the lowest-numbered NUMA node to get the best possible user experience. In general, NICs should be pinned to NUMA node 0; in Azure, however, considering hypervisor logic, that may not be the ideal configuration. The Eds_v4 SKU has been found to be only nominally susceptible to this issue, but benefits may still be found by pinning the accelerated networking interface to the most appropriate NUMA node.
In addition to remapping the accelerated networking interface, additional benefit may be found in setting the number of tx/rx queues for the accelerated networking interface to no more than the number of logical cores associated with a single NUMA node. By setting the tx/rx queue count equal to or less than the core count per NUMA node, you avoid queue selection wherein the queue is not resident to the most appropriate NUMA node. On some systems, the maximum tx/rx queue count is less than the core count for a single NUMA node – only on these systems should the tx/rx queue count be set less than the logical core count per NUMA node.
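The queue-count rule above can be sketched as follows. The core and queue counts here are assumed example values; on a real VM you would derive them from lscpu (cores per NUMA node) and `ethtool -l` (the NIC's maximum combined queues):

```shell
# Illustrative queue-count selection per the rule above.
# Assumed values: 16 logical cores on NUMA node 0; NIC supports up to 32
# combined queues. Derive real values from lscpu and `ethtool -l`.
CORES_PER_NODE=16
NIC_MAX_QUEUES=32

# Use the smaller of the two so every queue stays local to the NUMA node.
QUEUES=$(( CORES_PER_NODE < NIC_MAX_QUEUES ? CORES_PER_NODE : NIC_MAX_QUEUES ))
echo "tx/rx combined queues: $QUEUES"

# Applying it requires root; eth1 is a hypothetical interface name:
#   ethtool -L eth1 combined "$QUEUES"
```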
Please download the following code written by Azure engineering to correctly identify and set the accelerated networking interface and tx/rx queue count on each worker node. This script should be executed on VM start up. All that is required is that at least one NFS mount is already in place.
The script is run as such: ./set_best_affinity.ksh </nfsmount/file>
RPC slot tables define the maximum number of concurrent requests allowed on a single TCP connection between the NFS client and server. These values are controlled through sunrpc configuration on NFS clients. The latest NFS client versions default to a dynamic slot table value of 65536, which means that the client attempts to use as many slot tables as it can in a single TCP connection. Azure NetApp Files, however, supports 128 slot tables per TCP connection. If a client exceeds that value, Azure NetApp Files enacts NAS flow control and pauses client operations until resources are freed up. As a best practice, set the slot table values on NFS clients to a static value no higher than 128. To ensure the best possible SAS storage performance with the NFSv3 protocol, please add the following tunables to /etc/sysctl.conf and then apply them by running sysctl -p
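As a sketch of what those /etc/sysctl.conf additions look like (tunable names from the sunrpc discussion above, values per the 128-slot ceiling stated above — confirm against the current SAS guidance):

```
# Illustrative /etc/sysctl.conf additions, per the 128-slot ceiling above
sunrpc.tcp_slot_table_entries = 128
sunrpc.tcp_max_slot_table_entries = 128
```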
Run the above before mounting NFS volumes.
NOTE: The slot table values used above may not be optimal for the RHEL8.3 clients when using the nconnect NFS mount option. More information on this tuning parameter for RHEL8.3 will come later.
Use Proximity Placement Groups to co-locate your Azure instances within the same data center and under a common router to:
NOTE: At this time, Proximity Placement Groups have no bearing on the relative location of Azure instances compared to Azure NetApp Files storage; updates will be made should this change.
Thank you @MargaretC for sharing.
In the FAQ, I could read:
Azure NetApp Files can support any POSIX-compliant workload that requires shared file storage. The service is offered in three performance tiers to fit your workloads: Standard for static web content, file shares and database backups; Premium, comparable to mainstream SSD performance and suitable for databases, enterprise apps, analytics, technical applications and messaging queues; and Ultra for the most performance-intensive applications such as those in high-performance computing.
Therefore, my question would be: what is the version or versions that have been tested by SAS? What was the issue/bottleneck for the poor write performance?
@JuanS_OCS I am chasing the first question with the guys from Microsoft and NetApp that setup the storage for our test.
As for the second question, the issue with NFS has to do with how NFS was designed: to write once, read often. When you start doing lots of writes to an NFS file system, the NFS metadata cannot keep up with the writes. And it takes seconds (not nanoseconds) for the file you just wrote to become available in the file system. With SAS, that means one step creates a file, and the next step that wants to use the file is told by the file system that the file is not there. Does this help?
@JuanS_OCS This is what I got back from Microsoft and NetApp:
"SAS conducted performance testing against the Ultra Tier. We did create pools for standard and premium tier during our testing for potential lower performance use cases and/or data at rest that is not being processed, but did not test the performance. We just tested the Ultra Tier. The Ultra Tier achieves the highest throughput possible.
Note, these Tiers use all the same storage media under the hood (flash). The performance is controlled by throttling the tiers by IO/throughput density. This ensures a customer selected capacity has a reserved amount of throughput and prevents resource contention.
Another thing to note: throughput scales linearly as you increase the size of the volume, holding latency constant. As such, you can increase the capacity during production jobs during the day (or whatever timeframe SAS jobs need to run) to accommodate greater throughput, then decrease the capacity during non-production periods down to the actual data utilization to shed costs. This dynamic scale-up and scale-down capability to control performance and costs takes seconds and is non-disruptive to the SAS Grid nodes."
Hope this helps.
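The dynamic scale-up/scale-down described in the quote can be driven from the Azure CLI. As a hypothetical sketch (resource group, account, pool and volume names are placeholders, and --usage-threshold is the volume quota in GiB — verify against current az netappfiles documentation):

```
# Grow the volume (and thus its throughput ceiling) ahead of the production window
az netappfiles volume update --resource-group myRG --account-name myANF \
    --pool-name myPool --name sasdata --usage-threshold 51200

# Shrink back toward actual data utilization afterwards to shed cost
az netappfiles volume update --resource-group myRG --account-name myANF \
    --pool-name myPool --name sasdata --usage-threshold 25600
```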
Additional testing of SAS using ANF with RHEL 8.3, the nconnect mount option and an increased NFS readahead has provided enhanced performance. Here are the changes to the testing we did in 2020.
–50 TB file system
–Ultra service level
–Nconnect Mount Option for NFS volume: 8
–Set NFS readahead to 15 MB
–Kernel Tunables (via /etc/sysctl.conf)
–Kernel Tunables (via custom tuned profile)
–Accelerated network tuning (https://github.com/ANFTechTeam/accelnet_tuning)
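For reference, the nconnect=8 mount from the list above could be expressed as an /etc/fstab entry like this (the server address and paths are placeholders; confirm options against current SAS guidance):

```
# Illustrative: NFSv3 with nconnect=8 per the 2021 retest configuration
10.0.0.4:/sasdata  /sasdata  nfs  rw,hard,vers=3,proto=tcp,nconnect=8,rsize=65536,wsize=65536  0 0
```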
Mixed Analytics Workload
Putting SASDATA and SAS WORK on ANF
In the 1-node test, the ratio is 0.82.
In the 2-node test, the average ratio is 0.67.
In the 3-node test, the average ratio is 0.57.
The 1-node test gave us 98 MB per second per physical core for WRITEs and 200 MB per second per physical core for READs.
The 2-node test gave us 53 MB per second per physical core for WRITEs and 136 MB per second per physical core for READs.
The 3-node test gave us 37 MB per second per physical core for WRITEs and 86 MB per second per physical core for READs.
With the above changes, in addition to the ones in the original post, you should be able to achieve the 100MiB/s-per-physical-core SAS throughput recommendation; SAS GRID clusters using an ANF volume for SASDATA (persistent SAS data files) are now scalable to 48 physical cores across two or more MS Azure machine instances.
Here's some detailed instructions on how to configure NFS readahead (on RHEL 8.3) as this might not be obvious.
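One way to set NFS readahead persistently on RHEL 8.3 is a udev rule that applies read_ahead_kb=15360 (15 MiB) to NFS backing devices. The rule below is a sketch patterned on published Azure guidance (the filename is arbitrary) and should be validated on your system:

```
# /etc/udev/rules.d/99-nfs-readahead.rules (illustrative)
SUBSYSTEM=="bdi", ACTION=="add", PROGRAM="/usr/bin/awk -v bdi=$kernel 'BEGIN{ret=1} {if ($4 == bdi) ret=0} END{exit ret}' /proc/fs/nfsfs/volumes", ATTR{read_ahead_kb}="15360"
```

After creating the rule, reload udev rules (udevadm control --reload) so it applies to subsequently mounted NFS volumes.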
Please note that the Kernel Tunables parameter below needs to be 128, not 512 as mentioned above.
–Kernel Tunables (via /etc/sysctl.conf)
Setting this value greater than 128 will not increase performance of the system. ONTAP limits each connection to a maximum of 128 outstanding operations; the tcp_max_slot_table_entries tunable defines how many slots are available per connection. By limiting this setting to 128, matching the concurrency supported by ONTAP, we actually stand to improve performance. If you are curious, please read this study published in June 2021.
are these tuning recommendations and settings transferable to an on-prem netapp?
Is SASWORK on NFS now a supported and recommended option? In this post of yours from end 2019 that did not seem to be the case: "As a reminder, SAS WORK and SAS UTILLOC should never be placed on NFS storage."
Last, can you elaborate on why SAS does not recommend NFSv4.1 currently for SAS Grid deployments?
Hello community, I would like to ask you about the proper procedure for deleting the NetApp files configuration after testing it on SAS Viya. I am looking for a safe method to delete the NetApp account and volume on Azure without causing any errors in SAS Viya.