Last Updated: 12APR2022

Information added on 12APR2022:  Information about new Ebds_v5 instances.

 

This post discusses specifics for running SAS® (either SAS 9.4 or Viya 3.x) in the Microsoft (MS) Azure public cloud.  Please review the SAS® Global Forum 2020 paper “Important Performance Considerations When Moving SAS® to a Public Cloud” for critical information that we will not cover in this post.

 

To get the most from the guidelines in this post, you need to understand the compute resource needs (cores, memory, IO throughput, and network bandwidth) of your SAS applications.  If you know this information, then you can override the generic IO throughput recommendations discussed in this post.

 

Please remember that most public cloud instances list CPUs as virtual CPU(s). These CPUs might be hyperthreaded (two threads per physical core). You need to understand if the vCPU includes hyperthreads so that you can ensure you have the correct number of physical cores for SAS.  To convert Intel vCPUs to physical cores, divide the number of hyperthreaded vCPUs by 2.
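
The conversion described above can be sketched as a small helper. This is only an illustration: the function name and the `threads_per_core` parameter are assumptions made for the example, and you still need to confirm from the instance documentation whether the vCPUs are hyperthreaded.

```python
def physical_cores(vcpus: int, hyperthreaded: bool = True, threads_per_core: int = 2) -> int:
    """Convert a vCPU count to physical cores.

    Assumes two threads per physical core when the vCPUs are hyperthreaded,
    as described in the post for Intel-based instances.
    """
    if vcpus <= 0:
        raise ValueError("vcpus must be a positive integer")
    if not hyperthreaded:
        # Each vCPU is a full physical core.
        return vcpus
    if vcpus % threads_per_core != 0:
        raise ValueError("vcpus is not a multiple of threads_per_core")
    return vcpus // threads_per_core


if __name__ == "__main__":
    # A Standard_E32s_v4 exposes 32 hyperthreaded vCPUs.
    print("Standard_E32s_v4 physical cores:", physical_cores(32))
```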

 

In addition to the information about Azure instance types, storage, and networking, please follow the best practices in the “Optimizing SAS on RHEL (April 2019, V 1.3.1 or later)” tuning guide.  The information in the “2.4.4.4 Virtual Memory Dirty Page Tuning for SAS 9” section on page 17 is essential.

 

Azure instance types.  This link brings you to the list of instance types.  Read the description carefully to thoroughly understand what compute resources are available with each instance. 

 

If the instance type contains multiple processor models (such as the Esv3 series, which can be either Broadwell, Skylake or Cascade Lake processors), you need to confirm that each instance is using the same Intel processor, because you cannot select the chipset for the VM from the Portal.  After an instance is created, the lscpu command lists the CPU model name for the system.

 

For SAS Grid compute nodes and CAS Controller/Workers, we recommend that these systems all be the same CPU generation. This gives you consistent performance across nodes, rather than performance limited by the slowest and oldest CPU.  Please work with your Microsoft account team to determine how to make this happen.  Also, we strongly suggest investing in Unified (a.k.a. Premier) Support when deploying SAS in Azure.
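
As a rough sketch of the check described above, the following parses saved lscpu output from each node and reports whether every node shows the same CPU model. The function names, node names, and sample text are illustrative assumptions; how you collect the lscpu output from each node (ssh, a deployment tool, and so on) is left out.

```python
from typing import Mapping


def cpu_model(lscpu_output: str) -> str:
    """Return the value of the 'Model name' field from lscpu output."""
    for line in lscpu_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "model name":
            return value.strip()
    raise ValueError("no 'Model name' field found in lscpu output")


def same_cpu_model(outputs_by_node: Mapping[str, str]) -> bool:
    """True when every node reports the same CPU model name."""
    models = {cpu_model(text) for text in outputs_by_node.values()}
    return len(models) == 1


if __name__ == "__main__":
    # Illustrative lscpu excerpts, not captured from a real deployment.
    node_a = "Architecture: x86_64\nModel name: Intel(R) Xeon(R) Platinum 8272CL CPU @ 2.60GHz\n"
    node_b = "Architecture: x86_64\nModel name: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz\n"
    print("Same CPU model on all nodes:", same_cpu_model({"grid1": node_a, "grid2": node_b}))
```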

 

General Tuning Guidance

  • Review “Max uncached disk throughput IOPS/MBps” to see the maximum IO throughput (in MB per second) available between the instance you are considering and Premium Storage.  For a Standard_E32s_v4 instance (one of the most popular MS Azure instances used for SAS compute systems), the maximum IO throughput (instance total, not per physical core) is 768 MB per second.  For a 16 physical core system, this means 48 MB/sec/physical core of IO bandwidth for all the data stored on external Premium Storage.  If you need more IO throughput per physical core to the external Premium Storage, you can constrain the number of cores in the instance.  There are more details on “constraining cores” later in this post.  UPDATE (added 12APR2022): With the new Ebds_v5 instances, the maximum IO throughput has been increased significantly.

 

  • Review “Max NICs/Expected network bandwidth (Mbps)” to see the maximum network egress bandwidth.  For a Standard_E32s_v4 instance, the maximum network egress bandwidth is constrained to 16 Gigabit/s, whereas ingress is constrained by network card speed and number of network connections alone. Refer to this page for detail, where the first 4 paragraphs are a must read. Please note, SAS recommends a network bandwidth of at least 10 Gigabit between systems within a SAS infrastructure.
  • Review “Temp storage (SSD) GB” and “Max cached and temp storage throughput: IOPS/MBps (cache size in GB)” to see the size and maximum IO throughput of the local, ephemeral disk.  For a Standard_E32s_v4 instance, the maximum size of the internal SSD that could be used for temporary SAS file systems (SAS WORK/UTILLOC or CAS_DISK_CACHE) is 512 GB and the maximum IO throughput is 512 MB/sec (32 MB/sec/physical core).  This temp storage is both small and operates at a much lower IO throughput than SAS recommends, so you will probably not want to use it for temporary SAS file systems.  When the local ephemeral storage is inadequate, more IO is required from the external Premium Storage, which also has a cap on its IO throughput (see the “Max uncached disk throughput” bullet above).  UPDATE (added 12APR2022): With the new Ebds_v5 instances, the maximum IO throughput has been increased significantly.
  • Please note: You can use constrained cores with Azure instances to reduce the number of vCPUs (and thus physical cores) presented to the instance’s operating system. This would turn a Standard_E32s_v4 from a 16 physical core system into an 8 physical core system, effectively doubling the IO bandwidth per core. This brings the IO throughput per physical core closer to the minimum recommended for SAS workloads. Details on this feature, and a list of instances that can be constrained, can be found here.
  • Sporadic NMI lockups, which might hold processing while a thread waits for an available vCPU, can occur when using RHEL 7.x (3.10 kernel) with SAS compute nodes. There is a known issue in the iSCSI and SCSI drivers in this kernel that can cause CPU lockups under heavy IO load. Without going into too much technical detail, Linux has a ring buffer of IO completions that it occasionally needs to flush. In some cases, because of system defaults, the flush can take a very long time, which can cause your CPUs to lock up. This in turn may result in timeouts of SAS servers, which can cause job failures.

 

There are two workarounds to resolve the issue. Add either of the following options to Grub and reboot the machine.

 

  • Decrease the ringbuffer size and increase the vCPUs per channel (preferred solution)
    • hv_storvsc.storvsc_ringbuffer_size=131072 hv_storvsc.storvsc_vcpus_per_sub_channel=1024
  • Disable blk-mq
    • scsi_mod.use_blk_mq=n
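
One way to picture the preferred workaround is as an idempotent edit of the kernel command line. The sketch below only manipulates a string: it assumes the existing value of GRUB_CMDLINE_LINUX (from /etc/default/grub on RHEL 7) has no quoted values containing spaces, and the sample command line is hypothetical. Writing the file, regenerating the Grub configuration, and rebooting are not shown.

```python
from typing import List

# Kernel options from the preferred workaround in this post.
PREFERRED_ARGS = [
    "hv_storvsc.storvsc_ringbuffer_size=131072",
    "hv_storvsc.storvsc_vcpus_per_sub_channel=1024",
]


def add_kernel_args(cmdline: str, new_args: List[str]) -> str:
    """Return cmdline with new_args appended, replacing any older setting
    of the same option so that repeated runs give the same result."""
    new_keys = {arg.split("=", 1)[0] for arg in new_args}
    # Keep every existing option whose name is not being (re)set.
    kept = [arg for arg in cmdline.split() if arg.split("=", 1)[0] not in new_keys]
    return " ".join(kept + list(new_args))


if __name__ == "__main__":
    # Hypothetical existing command line, for illustration only.
    current = "crashkernel=auto console=ttyS0"
    print(add_kernel_args(current, PREFERRED_ARGS))
```

The same helper covers the second workaround by passing `["scsi_mod.use_blk_mq=n"]` instead of `PREFERRED_ARGS`.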

 

Network 

  • To achieve optimal network bandwidth, Azure Accelerated Networking must be enabled.  Accelerated Networking is available on any Azure VM with 2 or more physical cores. 

 

To validate that Accelerated Networking is enabled on a Linux instance, run the following commands and ensure your output looks like the output on this web site:

 

  1. lspci 
  2. ethtool -S eth0 | grep vf_
  3. uname -r
  • In addition to Accelerated Networking, SAS needs to be on an isolated cloud VNET, Resource Group, etc. This VNET should “share nothing” with other customer infrastructures.  The exception is placing the instances for your shared file system and RDBMSs dedicated to SAS on this VNET as well. 
  • DNS resolution must be verified prior to SAS installation.  The FQDNs used to communicate between Nodes within Azure should resolve to Azure internal IP addresses.  From nodes external to Azure (client desktops running SAS Enterprise Guide or the SAS Plug-in for Microsoft Office) FQDNs must resolve to the public / external IP addresses of nodes hosting the SAS server tier running inside Azure.
  • When attaching SAS clients that maintain persistent connections (for example SAS Enterprise Guide) to Azure instances from outside of Azure, we have seen the connections being dropped if they are idle for more than 4 minutes. This is a feature of Azure NSG.  To prevent this, go into the SAS Management Console, add the KeepAlive setting to the Workspace Server, and set it to a value of less than 4 minutes.
  • SAS nodes must be able to communicate directly with each other without contention. SAS Compute nodes need both high-throughput and low-latency communications.  Throughput is needed for loading data into memory. Low latency is needed to coordinate and perform complex analytics between nodes and to provide data resilience via copy redundancy. Please start your SAS deployment with an isolated VNET, using private IP addresses and private DNS.  (At minimum, SAS nodes should be in their own subnet.) If you need to deploy a SAS solution in Azure and you do not have cross-premises connectivity (e.g. ExpressRoute, VPN), then use one of the following approaches to enhance your security:
  1. Use Azure Bastion Host (Preferred) - https://docs.microsoft.com/en-us/azure/bastion/bastion-overview
  2. Create a “jump box” that is the public entry point (with a Private IP address) 
  • Azure Default VM Network MTU Size - Azure strongly recommends the default network MTU size of 1500 not be adjusted because Azure’s Virtual Network Stack will attempt to fragment a packet at 1400 bytes. To learn more, please review this “Azure and MTU” article.
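
To go with the Accelerated Networking validation commands earlier in this section, here is a minimal sketch that scans saved `ethtool -S eth0` output for `vf_` counters. Treating any non-zero `vf_` counter as a sign of virtual function activity is an assumption of this example, and the sample text is illustrative rather than captured from a real VM.

```python
from typing import Dict


def vf_counters(ethtool_stats: str) -> Dict[str, int]:
    """Collect the vf_* counters from `ethtool -S <interface>` output."""
    counters = {}
    for line in ethtool_stats.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        if sep and name.startswith("vf_"):
            try:
                counters[name] = int(value.strip())
            except ValueError:
                # Skip anything that is not a plain integer counter.
                continue
    return counters


def accelerated_networking_active(ethtool_stats: str) -> bool:
    """Heuristic: at least one vf_* counter exists and is non-zero."""
    return any(count > 0 for count in vf_counters(ethtool_stats).values())


if __name__ == "__main__":
    # Illustrative output only.
    sample = "NIC statistics:\n     tx_scattered: 0\n     vf_rx_packets: 992956\n     vf_tx_dropped: 0\n"
    print("VF activity seen:", accelerated_networking_active(sample))
```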

 

External Storage 

To achieve the most IO throughput for SAS, please make sure that you follow the best practices in the “Optimizing SAS on RHEL (April 2019, V 1.3.1 or later)” tuning guide.  The information in the “2.4.4.4 Virtual Memory Dirty Page Tuning for SAS 9” section on page 17 is essential.

 

The following architecture recommendations cover scale-up scenarios. Scale-out recommendations will follow later, pending validation.

 

  • Premium Storage: Like the instance types, each Premium Disk has a maximum IO throughput.  These values can be found on the “Throughput per disk” row of this table.  Attach enough Premium Disks to the instance to meet or exceed the instance's “Max uncached disk throughput IOPS/MBps”. These disks should be striped together using the operating system to create a single file system that can utilize the full throughput across all the disks.
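
The sizing rule above is a simple ceiling division. In the sketch below, the 200 MB/sec per-disk figure for a P30 is an assumption taken from the Azure disk tables at the time of writing and should be checked against the current “Throughput per disk” values; the instance limits are the ones quoted in this post.

```python
import math


def disks_to_saturate(instance_mbps: float, per_disk_mbps: float) -> int:
    """Smallest number of striped disks whose combined throughput meets or
    exceeds the instance's max uncached disk throughput."""
    if instance_mbps <= 0 or per_disk_mbps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(instance_mbps / per_disk_mbps)


if __name__ == "__main__":
    P30_MBPS = 200  # assumed per-disk throughput for a P30 Premium Disk
    for name, limit in [("Standard_E32s_v4", 768), ("E64 (constrained)", 1200)]:
        print(name, "needs", disks_to_saturate(limit, P30_MBPS), "P30 disks")
```

Under that assumption the results line up with the four-disk and six-disk stripes used in the reference configurations later in this post.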

 

When creating disk storage, you will be prompted for setting a Storage Caching value.  Please set the following based on the type of files that will be used by these disks:

  • ReadWrite for your operating system storage
  • None* for your persistent SAS data files
  • None* for your SAS temporary files

* This value was changed on 07DEC2021 after additional testing.

 

With the RHEL 7.x distribution and 3.x kernel, testing has shown that leaving the virtual-guest tuned profile in place (vm.dirty_ratio = 30 and vm.dirty_background_ratio = 10) achieves the best IO throughput when using Premium Storage.

 

  • Azure Disk Storage is the only shared block storage in the cloud that supports both Windows and Linux based clustered or distributed applications to run your most demanding enterprise applications (such as clustered databases, parallel file systems, stateful containers, and machine learning applications) in the cloud, without compromising on well-known deployment patterns for fast failover and high availability.   While this storage will technically function with all SAS applications, we do not believe it will deliver the IO throughput that most SAS applications need.

 

As a reminder, SAS temporary files and directories such as SAS WORK, SAS UTILLOC and CAS_DISK_CACHE should be placed on storage with the highest proven throughput possible.  Today that usually means Premium Storage or the instance’s local SSD.

 

Reference Instances for SAS Compute Nodes

To summarize the above, the following are good example configurations for SAS 9.4 or SAS Viya 3.5 compute nodes.

 

Standard_E16bds_v5 or E32bds_v5 specs for this system:  recommended instances, but they may not be available everywhere because they are newly released.

  • Ice Lake processor.
  • 8 or 16 physical cores (16 or 32 vCPUs)
  • 128/256 GB RAM
  • For persistent storage, use six P30 Premium Disks striped together for a total of 6 TB. If more disk space is needed, then add more P30 disks or larger Premium Disks.
  • The internal SSD drive can be used for SAS temporary file systems, but it cannot be increased in size.
  • 30 Gigabit egress network connectivity

 

Standard_E64-32ds_v4 or E64-16ds_v4 specs for this system:  recommended instances

  • Cascade Lake processor.
  • 8 or 16 physical cores (16 or 32 vCPUs)
  • 504 GB RAM
  • For persistent storage, use six P30 Premium Disks striped together for a total of 6 TB. If more disk space is needed, then add more P30 disks or larger Premium Disks. Remember the maximum IO bandwidth to the E64 instance is 1,200 MB/sec. With the constrained cores, this equates to 75 MB/sec/physical core for the Standard_E64-32ds_v4 and 150 MB/sec/physical core for the Standard_E64-16ds_v4.  SAS recommends at least 100 MB/sec/physical core.
  • The internal 2,400 GB SSD drive can be used for SAS temporary file systems, but it cannot be increased in size. The throughput for this storage is 1,936 MB/sec, which equates to 121 MB/sec/physical core with 16 physical cores (242 MB/sec/physical core with 8).  SAS recommends at least 150 MB/sec/physical core.
  • 30 Gigabit egress network connectivity

 

Standard_E32s_v4 - specs for this system:

  • Broadwell, Skylake or Cascade Lake processor.  Because you cannot determine which chipset you will get with this instance type, it is a poor choice for SAS Grid implementations.
  • 16 physical cores (32 vCPUs)
  • 256 GB RAM
  • For persistent storage, use four P30 Premium Disks striped together for a total of 4 TB. If more disk space is needed, then add more P30 disks or larger Premium Disks. Remember the maximum IO bandwidth to the E32 instance is 768 MB/sec. This equates to 48 MB/sec/physical core. SAS recommends at least 100 MB/sec/physical core.
  • The internal 512 GB SSD drive can be used for SAS temporary file systems, but it cannot be increased in size. The throughput for this storage is 512 MB/sec, which equates to 32 MB/sec/physical core.  SAS recommends at least 150 MB/sec/physical core.
  • 16 Gigabit egress network connectivity

 

Standard_L32s_v2 - specs for this system:

  • AMD 7551 processor.
  • If you are planning to use SAS 9.4m6 or an earlier version of SAS 9.4 on these instances, you will need to set the Linux environment variable MKL_DEBUG_CPU_TYPE to the value of 5.  Here is the command to do this:    export MKL_DEBUG_CPU_TYPE=5
  • 16 physical cores (32 vCPUs-constrained).
  • 256 GB RAM
  • For persistent storage, use four P30 Premium Disks striped together for a total of 4 TBs. If more disk space is needed, then add more P30 disks or larger Premium Disks. Remember the maximum IO bandwidth to the L32 instance is 640 MB/sec. This equates to 40 MB/sec/physical core. SAS recommends at least 100 MB/sec/physical core.
  • This system has four internal 1.92 TB NVMe drives which can be used for temporary file systems. These disks can be OS striped to create a 7.5 TB file system.  The maximum IO throughput to these drives is 8000 MB/sec! That equates to 500 MB/sec/physical core. SAS recommends at least 150 MB/sec/physical core.
  • 16 Gigabit egress network connectivity
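
The per-core figures quoted for the instances above can be reproduced with a few lines of arithmetic. This sketch uses only the throughput limits, core counts, and SAS minimums (100 MB/sec/physical core for persistent storage, 150 for temporary storage) stated in this post; the tuple layout and names are assumptions made for the example.

```python
SAS_MIN_PERSISTENT = 100  # MB/sec/physical core, from this post
SAS_MIN_TEMP = 150        # MB/sec/physical core, from this post


def mb_per_core(total_mbps: float, physical_cores: int) -> float:
    """Instance-wide throughput divided across the physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return total_mbps / physical_cores


# (instance, physical cores, external storage MB/sec, local temp MB/sec)
INSTANCES = [
    ("Standard_E64-32ds_v4", 16, 1200, 1936),
    ("Standard_E64-16ds_v4", 8, 1200, 1936),
    ("Standard_E32s_v4", 16, 768, 512),
    ("Standard_L32s_v2", 16, 640, 8000),
]

if __name__ == "__main__":
    for name, cores, external, temp in INSTANCES:
        ext_pc = mb_per_core(external, cores)
        tmp_pc = mb_per_core(temp, cores)
        print(
            f"{name}: external {ext_pc:.0f} MB/sec/core "
            f"({'meets' if ext_pc >= SAS_MIN_PERSISTENT else 'below'} minimum), "
            f"temp {tmp_pc:.0f} MB/sec/core "
            f"({'meets' if tmp_pc >= SAS_MIN_TEMP else 'below'} minimum)"
        )
```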

 

Conclusion

There are many resources, configuration settings, and constraints to check within Azure when configuring an instance to meet the needs of your SAS application.  You will likely have to provision an instance with more physical cores than you need (with or without a constrained core count) to get the IO throughput your application requires. Likewise, you may have to over-provision storage capacity to achieve the IO throughput needed for your SAS application.

 

As always, there are cost versus performance choices. These selections need to be based on your SLAs and business needs for SAS applications running in Azure versus where they are currently running.

 

Acknowledgements

Many thanks to SAS R&D, SAS Technical Support, Microsoft Azure, Azure NetApp Files, Sycomp, Veritas, and DDN experts for reviewing this post.

 

  • Margaret Crevar, SAS
  • Jim Kuell, SAS
  • Chris Marin, Microsoft
  • Jarrett Long, Microsoft
  • Gert van Teylingen, Azure NetApp Files
  • Chad Morgenstern, Azure NetApp Files
  • Dan Armistead, Azure NetApp Files
  • Greg Marino, Azure NetApp Files
  • James Cooper, DDN
  • John Zawistowski, Sycomp
  • Joseph D’Angelo, Veritas

Accepted Solution
MargaretC
SAS Employee

UPDATE to the Azure NetApp Files (ANF) section:

 

  • Please use the following NFS mount options and tunables for Azure NetApp Files NFS mounted volumes used for SAS DATA:  

 

  • vers=3,nolock,noatime,nodiratime,rdirplus,rsize=65536,wsize=65536

 

  • RPC Slot Table tunables
    The RPC slot table setting is the maximum number of threads that the NFS client and server allow on a single TCP connection. These values are controlled through sunrpc configuration on NFS clients. The latest NFS client versions default to a dynamic slot table value of 65536, which means that the client attempts to use as many slot tables as it can in a single TCP connection. Azure NetApp Files, however, supports 128 slot tables per TCP connection. If a client exceeds that value, Azure NetApp Files enacts NAS flow control and pauses client operations until resources are freed up. As a best practice, set the slot table values on NFS clients to a static value no higher than 128.  To ensure the best possible SAS storage performance with the NFSv3 protocol, please add the following tunables to /etc/sysctl.conf and then apply them by running sysctl -p.
    • sunrpc.tcp_max_slot_table_entries=128
    • sunrpc.tcp_slot_table_entries=128
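
As a hedged illustration of the slot table guidance above, the following checks the text of a sysctl.conf-style file for the two sunrpc tunables and flags values above 128. The function names and the sample text are assumptions for this example; it inspects only the file contents, not the values currently active in the kernel.

```python
from typing import Dict, List

ANF_MAX_SLOTS = 128  # slot tables per TCP connection supported by Azure NetApp Files
REQUIRED_KEYS = (
    "sunrpc.tcp_max_slot_table_entries",
    "sunrpc.tcp_slot_table_entries",
)


def parse_sysctl_conf(text: str) -> Dict[str, str]:
    """Parse key=value lines, ignoring blank lines and # or ; comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def slot_table_problems(text: str) -> List[str]:
    """Return a list of problems; an empty list means both tunables are set
    to a static value no higher than 128."""
    settings = parse_sysctl_conf(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = settings.get(key)
        if value is None:
            problems.append(f"{key} is not set")
        elif not value.isdigit():
            problems.append(f"{key} has a non-numeric value: {value}")
        elif int(value) > ANF_MAX_SLOTS:
            problems.append(f"{key}={value} is higher than {ANF_MAX_SLOTS}")
    return problems


if __name__ == "__main__":
    # Illustrative file contents only.
    sample = "# NFS tuning\nsunrpc.tcp_max_slot_table_entries=128\nsunrpc.tcp_slot_table_entries = 128\n"
    print(slot_table_problems(sample) or "slot table settings look fine")
```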

 

Replies

gvanteylingen
Calcite | Level 5

Please note as the blog states:

The(se following) architecture recommendations cover scale-up scenarios. Scale-out recommendations will follow later, pending validation.

 

MargaretC
SAS Employee

I have updated this post with our recommendation to use the E64-32ds_v4 or E64-16ds_v4 instances for SAS to get optimal IO throughput with SAS 9.4, especially SAS Grid.

 

I also updated the External storage section.

 

Will keep updating the post as we learn more about MS Azure and SAS.

MargaretC
SAS Employee

Added information today about the sporadic NMI lockups that might hold processing while a thread waits for an available vCPU when using RHEL 7.x (3.10 kernel) with SAS compute nodes.  Please review that new section to see how to overcome the issue.

 

Please note that the issue does not occur in RHEL 8.

 

Speaking of RHEL 8, please remember that SAS Viya 3.x is not supported on RHEL 8 at the current time.

 

jimbarbour
Meteorite | Level 14

@MargaretC,

 

Thank you for your post. 

 

I have a rather basic question, so I'm not offended in the least if you redirect me elsewhere.  Here's my situation and question:

Our company is currently running SAS 9.4 on a Windows 2016 (Standard) server.  Is there a way for us to access data in Azure (format to be determined but more than likely HDINSIGHT) from our current environment or must we use a cloud-based SAS instance of SAS 9.4?  I'm assuming here that a cloud-based SAS 9.4 instance in Azure is supported and that SAS Viya would not be required in order to have a cloud-based instance.  If I'm wrong on any point, please correct me.

 

Thank you,

 

Jim

MargaretC
SAS Employee

This post Access Microsoft Azure Storage & Big Data - SAS Support Communities seems to indicate that SAS/ACCESS to Hadoop can do what you are asking for.   You could post a question as a reply to the above post to clarify.

 

MargaretC
SAS Employee

We updated this section of the paper today:

 

When creating disk storage, you will be prompted for setting a Storage Caching value.  Please set the following based on the type of files that will be used by these disks:

  • ReadWrite for your operating system storage
  • None* for your persistent SAS data files
  • None* for your SAS temporary files

* this value was changed on 07DEC2021 after additional testing.

jdickson
Calcite | Level 5

Are these values still relevant now? I am finding my Read I/O in Azure with my Striped Disks is very slow compared to my Write I/O on my Work and Data drives. Enabling the Read caching helps, but I am apprehensive about leaving it that way if the recommendation is not to use it.

MargaretC
SAS Employee

Can you share with me what Azure instance you are using and what you mean by "Striped Disks"?   What disks are being used for DATA and what are being used for WORK?

jdickson
Calcite | Level 5

I just posted it to the Admin and Deployment thread.

 

Windows Azure server slow read I/O - SAS Support Communities

MargaretC
SAS Employee

I have added information to the paper regarding using Veritas InfoScale with SAS.  Details can be found here:  InfoScale by Veritas: A shared file system to use ... - SAS Support Communities 

 

anfradave
Calcite | Level 5

Hello,

 

Has anyone deployed SAS Office Analytics 9.4 using Azure Virtual Desktop?  Looking for specific feedback guidance on this Azure deployment option.  Thank you.  
