Azure NetApp Files: A shared file system to use with SAS Grid on MS Azure

MargaretC · Posted 08-06-2019 12:15 PM

Last Updated: 10DEC2020

When considering deploying SAS 9.4 (SAS Grid or SAS Analytics Pro) in the Microsoft (MS) Azure Cloud, Azure NetApp Files (ANF) is a viable primary storage option for SAS Grid clusters of limited size. Given the SAS recommendation of 100MiB/s of throughput per physical core, SAS Grid clusters that use an ANF volume for SASDATA (persistent SAS data files) can scale to 32 physical cores across two or more MS Azure machine instances. Cluster sizes are constrained by the architectural requirement of a single SASDATA namespace per SAS cluster and by the bandwidth available from a single Azure NetApp Files volume. This core count guidance will be revisited as Azure infrastructure bandwidth (compute, network, and per-file-system storage) increases over time.

Testing was completed using Azure NetApp Files volumes accessed via NFSv3. SAS does not currently recommend NFSv4.1 for SAS Grid deployments.

The intent of this paper is to provide sizing guidance and to set proper expectations for successfully deploying Azure NetApp Files as the storage behind SAS 9.4.

Azure NetApp Files Per Volume Expectations:

It has been tested and documented that a single Azure NetApp Files volume can deliver up to 4,500MiB/s of reads and 1,500MiB/s of writes. Given an Azure instance type with sufficient egress bandwidth, a single virtual machine can consume all of the write bandwidth of a single Azure NetApp Files volume. However, no single virtual machine, regardless of SKU, can consume all of the read bandwidth of a single volume.

The main shared workload of SAS 9.4, SASDATA, has an 80:20 read:write ratio, so the important per-volume numbers to know are:

80:20 workload with 64KiB reads/writes: 2,400MiB/s of read throughput and 600MiB/s of write throughput running concurrently (~3,000MiB/s combined)

The throughput numbers quoted above can be seen in the aforementioned documentation under NFS scale out workloads – 80:20.
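A quick way to sanity-check these per-volume numbers is to split the combined figure by the read:write mix and divide it by the 100MiB/s per physical core target. The Python sketch below does only that arithmetic; the function names are illustrative and are not part of any SAS or Azure tool.

```python
from __future__ import annotations

# Illustrative helpers (hypothetical names) for the per-volume arithmetic above.


def split_throughput(combined_mibps: float, read_fraction: float) -> tuple[float, float]:
    """Split a combined MiB/s figure into (read, write) for a given read fraction."""
    read = combined_mibps * read_fraction
    write = combined_mibps - read
    return read, write


def cores_supported(volume_mibps: float, per_core_target_mibps: float = 100.0) -> int:
    """Whole physical cores one volume can feed at the per-core target."""
    return int(volume_mibps // per_core_target_mibps)


if __name__ == "__main__":
    read, write = split_throughput(3000, 0.80)
    print(f"read={read:.0f} MiB/s, write={write:.0f} MiB/s")  # read=2400 MiB/s, write=600 MiB/s
    print(cores_supported(3000))  # 30
```

At ~3,000MiB/s combined, one volume covers roughly 30 physical cores at 100MiB/s each, the same ballpark as the 32-core guidance given earlier.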

Please note that SASWORK (temporary SAS data files), which has a 50:50 read:write ratio, should not be placed on Azure NetApp Files volumes at this time.

Which Version of RHEL should be used?

As SAS stated in the Best Practices for Using MS Azure with SAS paper, the E64-16ds_v4 and E64-32ds_v4 MS Azure instances are recommended for SAS 9 because they provide the best overall SAS experience. Based on this, the following high-level Azure NetApp Files performance guidance is provided:

SAS/ANF sizing guidance:

If using RHEL 7, the E64-16ds_v4 is the best choice given the 100MiB/s per physical core target for SASDATA.
- E64-16ds_v4: 90-100MiB/s per core
- E64-32ds_v4: 45-50MiB/s per core
If using RHEL 8.2, either the E64-16ds_v4 or the E64-32ds_v4 is viable, though the former is preferable given the 100MiB/s per core target for SASDATA.
- E64-16ds_v4: 150-160MiB/s per core
- E64-32ds_v4: 75-80MiB/s per core
If using RHEL 8.3, both the E64-16ds_v4 and the E64-32ds_v4 are likely fully acceptable given the per-core throughput target (pending SAS internal validation performance runs to fully vet this number).
- Early validation indicates approximately 3,000MiB/s of reads
- Results will be posted once validation is complete

The guidance above is based on the following:

Red Hat Enterprise Linux (RHEL) is the distribution of choice for SAS customers running SAS 9 on Linux. Each kernel supported by Red Hat has its own bandwidth constraints when using NFS.

Testing has shown that a single RHEL 7 instance can be expected to achieve no more than roughly 750-800MiB/s of read throughput against a single storage endpoint (i.e., a single network socket), while 1,500MiB/s of writes is achievable against the same endpoint, using 64KiB rsize and wsize mount options. There is evidence that this read throughput ceiling is an artifact of the 3.10 kernel; refer to RHEL CVE-2019-11477 for details.

Testing has shown that a single RHEL 8.2 instance, with its 4.18 kernel, is free of the 3.10 kernel limitation described above, so 1,200-1,300MiB/s of read traffic is achievable using 64KiB rsize and wsize mount options. For large sequential writes, however, expect the same 1,500MiB/s seen with RHEL 7.

SAS has not yet tested a single RHEL 8.3 instance with SAS 9.4 workloads. However, with the nconnect mount option, new in the RHEL 8.3 distribution, roughly 3,000MiB/s of read throughput from a single Azure NetApp Files volume is likely achievable. Expect no more than 1,500MiB/s of writes to a single volume, even with nconnect.
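The per-core figures in the sizing guidance follow from dividing each release's per-instance read ceiling by the instance's physical core count. The sketch below reproduces that division. It assumes the E64-16ds_v4 and E64-32ds_v4 expose 16 and 32 active vCPUs respectively, with two vCPUs (hyper-threads) per physical core, i.e. 8 and 16 physical cores; those core counts are an assumption rather than something stated in this post, and the published ranges are rounded.

```python
from __future__ import annotations

# Active vCPUs of the constrained-core sizes (assumption: 2 vCPUs per physical core).
VCPUS = {"E64-16ds_v4": 16, "E64-32ds_v4": 32}

# Per-instance NFS read ceilings (MiB/s) quoted in this post.
READ_CEILING_MIBPS = {
    "RHEL 7": (750, 800),
    "RHEL 8.2": (1200, 1300),
    "RHEL 8.3 (nconnect, early)": (3000, 3000),
}


def per_core_range(ceiling: tuple[int, int], vcpus: int,
                   threads_per_core: int = 2) -> tuple[float, float]:
    """Divide a (low, high) per-instance read ceiling by the physical core count."""
    physical_cores = vcpus // threads_per_core
    low, high = ceiling
    return low / physical_cores, high / physical_cores


if __name__ == "__main__":
    for release, ceiling in READ_CEILING_MIBPS.items():
        for sku, vcpus in VCPUS.items():
            low, high = per_core_range(ceiling, vcpus)
            print(f"{release:27} {sku:12} {low:6.1f}-{high:6.1f} MiB/s per core")
```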

File System Mount Options

SAS recommends the following NFS mount options for shared NFS file systems used for permanent SASDATA files:

bg,rw,hard,rsize=65536,wsize=65536,vers=3,nolock,noatime,nodiratime,rdirplus,tcp
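If you template mount options across many grid nodes, a small check can catch drift from the recommended values. This Python sketch is illustrative only: it parses an option string and flags rsize/wsize values other than 64KiB (65536 bytes), a protocol version other than NFSv3, or a missing hard option. The choice of which options to check is an assumption, not a SAS-provided validator.

```python
from __future__ import annotations

# Illustrative checker (hypothetical helper names), not a SAS or NetApp tool.
RECOMMENDED = "bg,rw,hard,rsize=65536,wsize=65536,vers=3,nolock,noatime,nodiratime,rdirplus,tcp"


def parse_options(opts: str) -> dict[str, str | None]:
    """Turn 'hard,rsize=65536' into {'hard': None, 'rsize': '65536'}."""
    parsed: dict[str, str | None] = {}
    for item in opts.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else None
    return parsed


def check_sasdata_options(opts: str) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    parsed = parse_options(opts)
    problems = []
    for size_key in ("rsize", "wsize"):
        if parsed.get(size_key) != str(64 * 1024):  # 64KiB = 65536 bytes
            problems.append(f"{size_key} should be 65536 (64KiB)")
    if parsed.get("vers") != "3":
        problems.append("vers should be 3 (NFSv3)")
    if "hard" not in parsed:
        problems.append("hard should be set")
    return problems


if __name__ == "__main__":
    print(check_sasdata_options(RECOMMENDED))  # []
```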

Capacity Recommendations

Please see the Azure NetApp Files performance calculator for guidance when sizing SASDATA volumes. Because volume bandwidth depends on volume capacity, capacity cost depends on the selected service level, and service level selection depends on capacity versus bandwidth needs, choosing a service level on your own can be complicated. In the calculator, enter data as follows:

Volume Size: <Desired Capacity>
I/O Size: 64KiB Sequential
Read Percentage: 80%
Throughput: <Desired Throughput considering 100MiB/s per core>
IOPS: 0

The readout at the bottom of the screen shows the capacity required at each service level and the monthly cost of each.

Throughput: This is the bandwidth of the volume based on the workload mixture. For an 80% 64KiB sequential read workload, 3096MiB/s is the anticipated maximum.
IOPS: This is the number of IOPS the volume will deliver at the above throughput target.
Capacity Pool Size: A volume's capacity is carved from a capacity pool. Capacity pools are sized in 1TiB increments.
Volume Size: This is the amount of capacity the volume needs at each service level to achieve the required throughput. Volume capacity (reported in GiB) may be equal to or less than the capacity pool size.
Capacity Pool Cost (USD/Month): This is the cost per month of the capacity pool at the given size.
Volume Show Back (USD/Month): This is the cost per month of the volume's capacity. Charges are based on capacity pool size; the volume show back shows the portion of that cost attributable to the volume.

Note: the user experience will be the same regardless of which service level is selected.
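Because volume bandwidth scales with volume quota, the capacity the calculator reports for each service level is essentially the throughput target divided by that level's throughput per TiB. The sketch below approximates that; the per-TiB figures (16, 64 and 128MiB/s for Standard, Premium and Ultra) are Microsoft's published values at the time of writing and are an assumption here, so confirm them against current Azure documentation and use the calculator for real sizing.

```python
from __future__ import annotations

import math

# Assumed throughput per TiB of volume quota (MiB/s) by service level; verify
# against current Azure NetApp Files documentation before relying on these.
MIBPS_PER_TIB = {"Standard": 16, "Premium": 64, "Ultra": 128}


def required_quota_tib(physical_cores: int, per_core_mibps: float = 100.0) -> dict[str, int]:
    """Approximate volume quota (rounded up to whole TiB) needed at each service level."""
    target_mibps = physical_cores * per_core_mibps
    return {level: math.ceil(target_mibps / density) for level, density in MIBPS_PER_TIB.items()}


if __name__ == "__main__":
    print(required_quota_tib(24))  # {'Standard': 150, 'Premium': 38, 'Ultra': 19}
```

For example, a hypothetical 24-core grid at 100MiB/s per core is a 2,400MiB/s target, roughly 19TiB at Ultra versus 150TiB at Standard; the calculator then shows what each option costs.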

You can further control costs by shaping volumes with Azure NetApp Files. Two dynamic options are available to customers to influence performance and cost.
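Since volume bandwidth is based on volume capacity, one way to shape a volume is to size it for throughput during the SAS batch window and shrink it back to the data footprint afterwards. The sketch below computes the two quotas under an assumed Ultra density of 128MiB/s per TiB; the schedule and function name are hypothetical, and the actual resize would be done through the Azure portal, CLI, or API.

```python
from __future__ import annotations

import math

ULTRA_MIBPS_PER_TIB = 128  # assumed Ultra density; confirm in current Azure docs


def shaping_plan(target_mibps: float, data_tib: float,
                 mibps_per_tib: float = ULTRA_MIBPS_PER_TIB) -> dict[str, int]:
    """Hypothetical helper: quota (whole TiB) for the batch window and for off-hours.

    During the batch window the quota must cover both the throughput target and
    the data itself; off-hours it only needs to hold the data.
    """
    for_throughput = math.ceil(target_mibps / mibps_per_tib)
    for_data = math.ceil(data_tib)
    return {"batch_window_tib": max(for_throughput, for_data), "off_hours_tib": for_data}


if __name__ == "__main__":
    # Hypothetical grid: 24 cores x 100MiB/s, 8TiB of SASDATA.
    print(shaping_plan(2400, 8))  # {'batch_window_tib': 19, 'off_hours_tib': 8}
```

When the data footprint alone already exceeds the throughput-driven size, both quotas are the same and shaping saves nothing.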

Other Tuning Guidance

Mellanox Driver Tuning:

Mellanox drivers are used for accelerated networking on Azure virtual machines. Mellanox recommends pinning network interfaces to the lowest-numbered NUMA node to get the best possible user experience. In general, NICs should be pinned to NUMA node 0; in Azure, however, hypervisor logic means that may not be the ideal configuration. The Eds_v4 SKU has generally been found only nominally susceptible to this issue, but benefits may still be found by pinning the accelerated networking interface to the most appropriate NUMA node.

In addition to remapping the accelerated networking interface, you may benefit from setting the number of tx/rx queues for the accelerated networking interface to no more than the number of logical cores associated with a single NUMA node. Keeping the tx/rx queue count equal to or less than the core count per NUMA node avoids selecting a queue that is not resident on the most appropriate NUMA node. On some systems, the maximum tx/rx queue count is less than the core count of a single NUMA node; only on those systems should the tx/rx queue count be set below the logical core count per NUMA node.
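The queue-count rule above amounts to taking the smaller of two numbers: the logical CPUs in the NIC's NUMA node and the hardware's maximum queue count. The sketch below counts CPUs from a Linux cpulist string (the format of /sys/devices/system/node/nodeN/cpulist) and applies that rule. The helper names are hypothetical, and the Azure engineering script mentioned in this section remains the recommended way to apply the setting.

```python
from __future__ import annotations

from pathlib import Path

# Illustrative helpers (hypothetical names) for the tx/rx queue-count rule.


def count_cpus(cpulist: str) -> int:
    """Count CPUs in a Linux cpulist string such as '0-15,32-47'."""
    total = 0
    for part in cpulist.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            total += int(end) - int(start) + 1
        else:
            total += 1
    return total


def recommended_queue_count(logical_cpus_in_node: int, hw_max_queues: int) -> int:
    """One tx/rx queue per logical CPU in the NUMA node, capped by what the NIC supports."""
    return min(logical_cpus_in_node, hw_max_queues)


if __name__ == "__main__":
    # Logical CPUs that the kernel assigns to NUMA node 0.
    node0 = Path("/sys/devices/system/node/node0/cpulist").read_text()
    print(f"node0 has {count_cpus(node0)} logical CPUs")
```

For example, count_cpus("0-7") returns 8; if the NIC supports up to 16 combined channels, recommended_queue_count(8, 16) returns 8, a value that could be applied with ethtool -L <iface> combined 8.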

Please download the script written by Azure engineering (https://github.com/ANFTechTeam/accelnet_tuning) to correctly identify the accelerated networking interface and set its NUMA affinity and tx/rx queue count on each worker node. Run this script at VM startup; it only requires that at least one NFS mount already be in place.

Run the script as follows: ./set_best_affinity.ksh </nfsmount/file>

RPC Slot Table Tuning

The RPC slot table refers to the maximum number of outstanding RPC requests (slots) allowed on a single TCP connection between the NFS client and server. These values are controlled through sunrpc configuration on NFS clients. The latest NFS client versions default to a dynamic slot table value of 65536, which means the client attempts to use as many slots as it can on a single TCP connection. Azure NetApp Files, however, supports 128 slots per TCP connection. If a client exceeds that value, Azure NetApp Files enacts NAS flow control and pauses client operations until resources are freed. As a best practice, set the slot table values on NFS clients to a static value no higher than 128. To ensure the best possible SAS storage performance with the NFSv3 protocol, add the following tunables to /etc/sysctl.conf and then load them by running sysctl -p:

sunrpc.tcp_max_slot_table_entries=128

sunrpc.tcp_slot_table_entries=128

Run the above before mounting NFS volumes.
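If the sysctl.conf entries are generated by provisioning code, a guard against values above 128 and a read-back check are cheap to add. The sketch below uses hypothetical helper names; note that /proc/sys/sunrpc/... only exists once the sunrpc kernel module is loaded, so the reader returns None otherwise.

```python
from __future__ import annotations

from pathlib import Path

ANF_SLOTS_PER_CONNECTION = 128  # per-TCP-connection limit described above


def sunrpc_sysctl_lines(slots: int = ANF_SLOTS_PER_CONNECTION) -> list[str]:
    """Hypothetical helper: lines for /etc/sysctl.conf; refuses values above the ANF limit."""
    if not 1 <= slots <= ANF_SLOTS_PER_CONNECTION:
        raise ValueError(f"slot table size must be 1..{ANF_SLOTS_PER_CONNECTION}, got {slots}")
    return [
        f"sunrpc.tcp_max_slot_table_entries={slots}",
        f"sunrpc.tcp_slot_table_entries={slots}",
    ]


def read_current(key: str) -> int | None:
    """Read a live sysctl value from /proc/sys, or None if it is not present."""
    path = Path("/proc/sys") / key.replace(".", "/")
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    print("\n".join(sunrpc_sysctl_lines()))
    # After `sysctl -p`, this should print 128 on a client with sunrpc loaded.
    print(read_current("sunrpc.tcp_max_slot_table_entries"))
```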

NOTE: The slot table values used above may not be optimal for the RHEL8.3 clients when using the nconnect NFS mount option. More information on this tuning parameter for RHEL8.3 will come later.

Machine Placement:

Use Proximity Placement Groups to co-locate your Azure instances within the same data center and under a common router to:

Reduce intra-SAS node network latency
Provide a similar network latency between each SAS node and Azure NetApp Files

NOTE: At this time, Proximity Placement Groups have no bearing on the location of Azure instances relative to Azure NetApp Files storage. This guidance will be updated should that change.

MargaretC · Posted 12-10-2020 02:37 PM

Just updated the blog with information from recent testing.

JuanS_OCS · Posted 08-07-2019 03:21 AM

Thank you @MargaretC for sharing.

In the FAQ, I could read:

Azure NetApp Files can support any POSIX-compliant workload that requires shared file storage. The service is offered in three performance tiers to fit your workloads: Standard for static web content, file shares and database backups; Premium, comparable to mainstream SSD performance and suitable for databases, enterprise apps, analytics, technical applications and messaging queues; and Ultra for most performance-intensive applications such as those in high-performance computing

Therefore, my question would be: what version or versions have been tested by SAS? What was the issue or bottleneck behind the poor write performance?

MargaretC · Posted 08-07-2019 10:12 AM

@JuanS_OCS I am chasing the first question with the guys from Microsoft and NetApp who set up the storage for our test.

As for the second question, the issue has to do with how NFS was designed: write once, read often. When you start doing lots of writes to an NFS file system, the NFS metadata cannot keep up with the writes, and it can take seconds (not nanoseconds) for a file you just wrote to become visible in the file system. With SAS, that means one step creates a file, and the next step that wants to use the file is told by the file system that the file is not there. Does this help?

Margaret

MargaretC · Posted 08-07-2019 10:49 AM

@JuanS_OCS This is what I got back from Microsoft and NetApp:

"SAS conducted performance testing against the Ultra Tier. We did create pools for standard and premium tier during our testing for potential lower performance use cases and/or data at rest that is not being processed, but did not test the performance. We just tested the Ultra Tier. The Ultra Tier achieves the highest throughput possible.

Note, these Tiers use all the same storage media under the hood (flash). The performance is controlled by throttling the tiers by IO/throughput density. This ensures a customer selected capacity has a reserved amount of throughput and prevents resource contention.

Another thing to note: throughput scales linearly as you increase the size of the volume, holding latency constant. As such, you can increase the capacity during production jobs during the day (or whatever timeframe SAS jobs need to run) to accommodate greater throughput, and decrease the capacity outside production down to the data utilization to shed costs. This dynamic scale-up and scale-down capability to control performance and costs takes seconds and is non-disruptive to the SAS Grid nodes. "

Hope this helps.

Margaret

MargaretC · Posted 11-25-2019 09:23 AM

After working with MS Azure, NetApp, and a SAS customer, we would like to strongly recommend the use of the following NFS mount options:

bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev
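Compared with the list in the original post, this one drops nolock (so NFSv3 locking is no longer disabled on the client) and adds acdirmin=0 (no minimum caching time for directory attributes) and _netdev (marks the file system as needing the network before it is mounted). The sketch below computes that difference and renders an /etc/fstab entry; the server IP and paths are hypothetical placeholders.

```python
from __future__ import annotations

# Illustrative helpers (hypothetical names); ORIGINAL and UPDATED are the two lists in this thread.
ORIGINAL = "bg,rw,hard,rsize=65536,wsize=65536,vers=3,nolock,noatime,nodiratime,rdirplus,tcp"
UPDATED = ("bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,"
           "acdirmin=0,tcp,_netdev")


def option_diff(old: str, new: str) -> tuple[list[str], list[str]]:
    """Return (removed, added) options, preserving their order of appearance."""
    old_opts, new_opts = old.split(","), new.split(",")
    removed = [opt for opt in old_opts if opt not in new_opts]
    added = [opt for opt in new_opts if opt not in old_opts]
    return removed, added


def fstab_line(server: str, export: str, mountpoint: str, options: str = UPDATED) -> str:
    """Build an /etc/fstab entry: device, mount point, type, options, dump, fsck pass."""
    return f"{server}:{export} {mountpoint} nfs {options} 0 0"


if __name__ == "__main__":
    print(option_diff(ORIGINAL, UPDATED))  # (['nolock'], ['acdirmin=0', '_netdev'])
    # Hypothetical ANF mount target IP and paths.
    print(fstab_line("10.0.0.4", "/sasdata", "/sasdata"))
```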

JuanS_OCS · Posted 11-25-2019 09:26 AM

Thank you very much for following up this topic, @MargaretC

MargaretC · Posted 02-26-2021 04:42 PM

Additional testing of SAS on ANF with RHEL 8.3, the nconnect mount option, and an increased NFS readahead has shown improved performance. Here are the changes relative to the testing we did in 2020:

–RHEL 8.3

–50 TB file system

–Ultra service level

–nconnect mount option for the NFS volume: 8

–Set NFS readahead to 15 MB

–Kernel Tunables (via /etc/sysctl.conf)

sunrpc.tcp_max_slot_table_entries=512

–Kernel Tunables (via custom tuned profile)

include = throughput-performance
vm.dirty_bytes = 31457280
vm.dirty_expire_centisecs = 100
vm.dirty_writeback_centisecs = 300
All other tunables as defined in the RHEL Tuning Guide

–Accelerated network tuning (https://github.com/ANFTechTeam/accelnet_tuning)

./set_best_affinity.ksh <NFS file system>
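Custom tuned profiles live in /etc/tuned/<profile>/tuned.conf, with the parent profile under [main] and kernel settings under [sysctl]. The sketch below renders the tunables listed above in that layout; the profile name sas-anf is hypothetical, and the "other tunables" from the RHEL Tuning Guide are not included.

```python
from __future__ import annotations

from pathlib import Path

PROFILE_NAME = "sas-anf"  # hypothetical custom profile name

SAS_SYSCTL = {
    "vm.dirty_bytes": 30 * 1024 * 1024,    # 31457280 bytes (30MiB)
    "vm.dirty_expire_centisecs": 100,      # dirty data older than 1s is eligible for writeback
    "vm.dirty_writeback_centisecs": 300,   # flusher threads wake every 3s
}


def tuned_conf(parent: str = "throughput-performance",
               sysctl: dict[str, int] | None = None) -> str:
    """Render a tuned.conf that inherits `parent` and overrides the given sysctls."""
    settings = SAS_SYSCTL if sysctl is None else sysctl
    lines = ["[main]", f"include={parent}", "", "[sysctl]"]
    lines += [f"{key}={value}" for key, value in settings.items()]
    return "\n".join(lines) + "\n"


def write_profile(root: Path = Path("/etc/tuned")) -> Path:
    """Write <root>/<PROFILE_NAME>/tuned.conf (writing under /etc/tuned needs root)."""
    path = root / PROFILE_NAME / "tuned.conf"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(tuned_conf())
    return path


if __name__ == "__main__":
    print(tuned_conf(), end="")
```

Once written, the profile would be activated with tuned-adm profile sas-anf.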

Mixed Analytics Workload

Putting SASDATA and SAS WORK on ANF

In the 1-node test, the ratio is 0.82.

In the 2-node test, the average ratio is 0.67.

In the 3-node test, the average ratio is 0.57.

rhel_iotest results:

The 1-node test gave us 98MB per second per physical core for writes and 200MB per second per physical core for reads.

The 2-node test gave us 53MB per second per physical core for writes and 136MB per second per physical core for reads.

The 3-node test gave us 37MB per second per physical core for writes and 86MB per second per physical core for reads.
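Per-core figures are easier to interpret next to the cluster-wide throughput they imply. The sketch multiplies each result by the node count and an assumed number of physical cores per node. The post does not state the instance type used in these runs, so CORES_PER_NODE = 8 (the physical cores of an E64-16ds_v4 under a two-vCPUs-per-core assumption) is purely hypothetical.

```python
from __future__ import annotations

# rhel_iotest results from this post: nodes -> (write, read) MB/s per physical core.
RESULTS = {1: (98, 200), 2: (53, 136), 3: (37, 86)}

CORES_PER_NODE = 8  # hypothetical: the post does not state the cores per node


def aggregate_mbps(per_core: float, nodes: int, cores_per_node: int = CORES_PER_NODE) -> float:
    """Cluster-wide MB/s implied by a per-core figure."""
    return per_core * nodes * cores_per_node


if __name__ == "__main__":
    for nodes, (write, read) in RESULTS.items():
        print(f"{nodes} node(s): writes ~{aggregate_mbps(write, nodes):.0f} MB/s, "
              f"reads ~{aggregate_mbps(read, nodes):.0f} MB/s")
```

Under that assumption, aggregate reads come to about 1,600, 2,176 and 2,064MB/s for 1, 2 and 3 nodes, so the falling per-core numbers are consistent with a shared ceiling rather than a per-node one; with a different core count the totals scale proportionally.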

With the above changes, in addition to those in the original post, you should be able to achieve the SAS recommendation of 100MiB/s of throughput per physical core with SAS Grid clusters that use an ANF volume for SASDATA (persistent SAS data files) and scale to 48 physical cores across two or more MS Azure machine instances.

gvanteylingen · Posted 03-03-2021 10:53 AM

Here are some detailed instructions on how to configure NFS readahead (on RHEL 8.3), as this might not be obvious.

1) Create /etc/udev/rules.d/99-nfs.rules to include:

SUBSYSTEM=="bdi", ACTION=="add", PROGRAM="/bin/awk -v bdi=$kernel 'BEGIN{ret=1} {if ($4 == bdi) {ret=0}} END{exit ret}' /proc/fs/nfsfs/volumes", ATTR{read_ahead_kb}="15380"

2) Once the file is in place, reload the udev rules so the new rule takes effect:

# udevadm control --reload

3) Note: the change is not dynamic; only volumes mounted after the udev rule is applied get the new readahead value. In other words, unmount and remount the NFS volumes (or reboot the system) for the change to take effect.
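To confirm the rule took effect after a remount, you can read back the readahead of each NFS backing device. The read-only Python sketch below mirrors the rule's awk match by taking the fourth (DEV) column of /proc/fs/nfsfs/volumes, which the rule compares against the bdi device name, and then reads /sys/class/bdi/<DEV>/read_ahead_kb. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Illustrative read-only check (hypothetical helper names).


def nfs_bdi_ids(volumes_text: str) -> list[str]:
    """Return the DEV column (4th field) of /proc/fs/nfsfs/volumes, skipping the header.

    This mirrors the awk test in the udev rule: ($4 == bdi).
    """
    ids = []
    for line in volumes_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            ids.append(fields[3])
    return ids


def read_ahead_kb(bdi: str, sysfs_bdi: Path = Path("/sys/class/bdi")) -> int:
    """Current readahead (KiB) of one backing device."""
    return int((sysfs_bdi / bdi / "read_ahead_kb").read_text().strip())


if __name__ == "__main__":
    text = Path("/proc/fs/nfsfs/volumes").read_text()
    for bdi in nfs_bdi_ids(text):
        # Volumes mounted after the rule was loaded should report 15380.
        print(bdi, read_ahead_kb(bdi))
```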

MargaretC · Posted 03-08-2022 07:15 PM

Please note that the kernel tunable below needs to be 128, not 512 as mentioned above.

–Kernel Tunables (via /etc/sysctl.conf)

sunrpc.tcp_max_slot_table_entries=128

Setting this value greater than 128 will not increase the performance of the system. ONTAP limits each connection to a maximum of 128 outstanding operations, and the tcp_max_slot_table_entries tunable defines how many slots are available per connection. By limiting this setting to 128, matching the concurrency supported by ONTAP, we actually stand to improve performance. If you are curious, please read this study published in June 2021.

https://docs.microsoft.com/en-us/azure/azure-netapp-files/performance-linux-concurrency-session-slot...
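Back-of-the-envelope concurrency arithmetic shows why values above 128 buy nothing. The sketch assumes each of the nconnect TCP connections has its own slot table and that ONTAP accepts at most 128 outstanding operations per connection; with nconnect=8 (the value used in the 2021 tests), 128 slots already allow up to 1,024 outstanding operations, and 512 slots allow no more. This is an illustrative model, not a measured result.

```python
from __future__ import annotations

ONTAP_OPS_PER_CONNECTION = 128  # per-connection limit stated above


def max_outstanding_ops(nconnect: int, slots_per_connection: int = 128) -> int:
    """Rough upper bound on in-flight NFS RPCs from one client to one ANF endpoint.

    Assumes each of the nconnect TCP connections has its own slot table and that
    the server never accepts more than ONTAP_OPS_PER_CONNECTION per connection.
    """
    if nconnect < 1:
        raise ValueError("nconnect must be at least 1")
    per_connection = min(slots_per_connection, ONTAP_OPS_PER_CONNECTION)
    return nconnect * per_connection


if __name__ == "__main__":
    for slots in (128, 512):
        print(f"nconnect=8, slots={slots}: {max_outstanding_ops(8, slots)}")  # 1024 both times
```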

thomasstahl · Posted 07-07-2023 09:41 AM

Hi Margaret,

Are these tuning recommendations and settings transferable to an on-prem NetApp?

Is SASWORK on NFS now a supported and recommended option? In this post of yours from end 2019 that did not seem to be the case: "As a reminder, SAS WORK and SAS UTILLOC should never be placed on NFS storage."

Last, can you elaborate on why SAS does not recommend NFSv4.1 currently for SAS Grid deployments?

Regards

Thomas

souhirdaadouch · Posted 06-12-2023 12:41 PM

Hello community, I would like to ask you about the proper procedure for deleting the NetApp files configuration after testing it on SAS Viya. I am looking for a safe method to delete the NetApp account and volume on Azure without causing any errors in SAS Viya.

MargaretC · Posted 06-13-2023 10:10 AM

@souhirdaadouch This is a question for MS Azure, not SAS.

souhirdaadouch · Posted 06-13-2023 01:55 PM

Hello @MargaretC, thank you for your response. I wanted to know if there will be any risk to the SAS Viya configuration or if it will only lose connection to the NetApp file volume without any risk.

MargaretC · Posted 06-13-2023 02:02 PM

@souhirdaadouch There will only be a risk to Viya if you have installed any configuration files on the NetApp.
