MargaretC
SAS Employee

A more updated version of this paper can be found here:  https://communities.sas.com/t5/Administration-and-Deployment/Best-Practices-for-Using-Microsoft-Azur...

 

 

If you are considering moving your SAS applications to MS Azure, please review the information below to make sure you are choosing the correct MS Azure instances and have prepared them as optimally as possible.

 

As always, a detailed understanding of how SAS is being used, where the source data resides, any time constraints on running reports/jobs, etc., will help with the decisions you make in choosing MS Azure instances and storage.

 

Let’s start off with the MS Azure instance types.  This link brings you to the list of instance types.

We have compiled the following information to help guide your decisions.  The memory optimized instances, Ev3 and Esv3 series in particular, tend to be the best for SAS.  Let’s walk through important information found on the MS Azure Instance types link above. 

 

  • The Ev3 and Esv3 series VMs can come configured with either Broadwell or Skylake processors. The Portal does not let you select which chipset will be used for the VM.  After an instance is instantiated, the lscpu command will list the CPU Model Name for the system.  For your SAS Grid compute nodes and CAS Controller/Workers, we recommend that all systems use the same CPU model. Please work with MS Azure to determine how to make this happen.

 

  • Review the information in the “Max uncached disk throughput IOPS/MBps” column to see the maximum IO throughput, in MB per second, available between the instance you are looking at and Premium Storage. For a Standard_E32s_v3 instance (one of the most popular MS Azure instances used for SAS compute systems), the maximum IO throughput (instance total, not per physical core) is 768 MB per second.  For a 16 physical core system, this means 48 MB/sec/physical core of IO bandwidth for all the data that will be stored on external Premium Storage.

 

  • Review the information in the “Max NICs/Expected network bandwidth (mbps)” column to see what the maximum network bandwidth is. For a Standard_E32s_v3 instance, the maximum network bandwidth is 16 Gigabit per second. Please note SAS recommends a network bandwidth of at least 10 Gigabit per second between the SAS systems that make up your SAS infrastructure.

 

  • Review the “Temp storage (SSD) GiB” and “Max cached and temp storage throughput: IOPS/MBps (cache size in GiB)” columns to see the size and maximum IO throughput of the local, ephemeral disk. For a Standard_E32s_v3 instance, the maximum size of the internal SSD that could be used for temporary SAS file systems (SAS WORK/UTILLOC or CAS_DISK_CACHE) is 512 GB and the maximum IO throughput is 512 MB/sec (32 MB/sec/physical core).  This is not much capacity, and the IO bandwidth is lower than SAS recommends, so you will probably not want to use it for temporary SAS file systems.  This shifts more IO pressure to the external Premium Storage, which also has a cap on its IO throughput (see the second bullet above). Note that local, ephemeral storage must be used as separate disks and cannot be striped together.

 

  • Please note you can use MS Azure’s Constrained Cores feature to reduce the number of vCPUs presented to the OS of an instance. This would turn the above Standard_E32s_v3 from a 16 physical core system into an 8 physical core system, effectively doubling the IO bandwidth per core listed above. This will, in turn, bring the IO throughput per core closer to the minimum recommended for SAS workloads. Details on this feature can be found here.
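
The per-core figures in the bullets above are simple division of the instance-wide cap by the number of physical cores. As a minimal sketch (the helper name per_core_mbps is made up for illustration, and the inputs are the Standard_E32s_v3 numbers quoted in this post), the arithmetic can be checked in the shell:

```shell
#!/bin/sh
# Sketch: IO bandwidth per physical core = instance throughput cap / physical cores.
# per_core_mbps is a hypothetical helper name; arguments are whole numbers
# (MB/sec for the cap, count of physical cores).
per_core_mbps() {
    total_mbps=$1
    physical_cores=$2
    echo $(( total_mbps / physical_cores ))
}

# Standard_E32s_v3 values quoted above, with all 16 physical cores.
echo "Premium Storage, 16 cores: $(per_core_mbps 768 16) MB/sec/core"
echo "Local SSD, 16 cores:       $(per_core_mbps 512 16) MB/sec/core"

# Same instance constrained to 8 physical cores: the cap stays the same,
# so the per-core share doubles.
echo "Premium Storage, 8 cores:  $(per_core_mbps 768 8) MB/sec/core"
```

The same helper can be pointed at any other instance type once you have read its cap and core count from the Azure instance tables.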

 

 

Let’s talk about setting up the network

 

  • To achieve optimal network bandwidth, Accelerated Networking must be enabled. Accelerated Networking requires RHEL 7.4 or higher. 

 

To validate that you have Accelerated Networking enabled, please run the following commands and ensure your output looks like the output on this web site:

 

  1. lspci
  2. ethtool -S eth0 | grep vf_
  3. uname -r
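
The ethtool check above can be wrapped in a small helper if you want to run it across several nodes. This is only a sketch: the helper name vf_counters_present is made up, eth0 is the interface name used in the commands above and may differ on your VM, and the presence of vf_ counters is treated here as a hint rather than a definitive test. Still compare the real output with the reference page.

```shell
#!/bin/sh
# Sketch: read `ethtool -S <iface>` output on stdin and report whether any
# vf_ counters are listed. vf_counters_present is a hypothetical helper name.
vf_counters_present() {
    if grep -q 'vf_'; then
        echo "vf_ counters found"
    else
        echo "no vf_ counters found"
        return 1
    fi
}

# Typical use on a VM (eth0 is an assumption; check `ip link` for the real name):
#   ethtool -S eth0 | vf_counters_present
```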

 

  • In addition to Accelerated Networking, SAS needs to be on an isolated cloud VNET, Resource Group, etc. It should “share nothing” with other customer infrastructures. The exception to this rule is that the instances for your shared file system and RDBMSs dedicated to SAS should be placed on this VNET as well.

 

Deploy on a single VNET and Subnet created specifically for this deployment. Do not use any inspection or tracing features on the VNET.

 

 

And finally, let’s discuss configuring the external Premium Storage.  Like the instance types, there is a maximum IO throughput per Premium Disk.  These values can be found on the “Throughput per disk” row of this table.  Multiple Premium Disks, enough to meet or exceed the “Max uncached disk throughput IOPS/MBps” of an instance, should be attached to the instance. These disks should be striped together to create a single file system that can then simultaneously use the full throughput across all the disks.
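
A rough way to size this is to divide the instance cap by the per-disk throughput and round up. The sketch below makes several assumptions: the helper names disks_needed and print_stripe_cmd are made up, 200 MB/sec is used as the per-disk figure for a P30 (please confirm it against the “Throughput per disk” row), and mdadm RAID 0 is just one possible way to stripe (LVM striping is another). The device names are placeholders, and the mdadm command is only printed, not run.

```shell
#!/bin/sh
# Sketch: how many Premium Disks are needed so that their combined throughput
# meets or exceeds the instance's uncached disk throughput cap.
# disks_needed is a hypothetical helper; both arguments are in MB/sec.
disks_needed() {
    instance_cap=$1
    per_disk=$2
    # Ceiling division using integer arithmetic.
    echo $(( (instance_cap + per_disk - 1) / per_disk ))
}

# Print (do not run) an mdadm RAID 0 command for the given devices.
# First argument is the md device to create; the rest are the member disks.
# All device names here are placeholders; check `lsblk` on the real VM.
print_stripe_cmd() {
    md_device=$1
    shift
    echo "mdadm --create $md_device --level=0 --raid-devices=$# $*"
}

# 768 MB/sec cap (Standard_E32s_v3) and an assumed 200 MB/sec per P30 disk.
disks_needed 768 200
print_stripe_cmd /dev/md0 /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

Printing the command first lets you review the device list before anything destructive is run; a file system would still need to be created on the striped device afterwards.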

 

When setting up your storage, you will be prompted for setting a Storage Caching value.  Please set ReadWrite for your operating system storage and ReadOnly for your SAS data disks. 

 

To summarize the above, the following would be good example configurations for a SAS compute node (for either SAS 9.4 or SAS Viya).

 

Standard_E32s_v3 with four P30 Premium Disks for a total of 4 TBs of persistent disk space.   If more disk space is needed, then look at larger Premium Disks. Remember the maximum IO bandwidth to the E32 instance is 768 MB/sec. The internal 512 GB drive can be used for temporary file systems, but it cannot be made any larger.

 

Standard_L32s_v2 with three P30 Premium Disks for a total of 3 TBs of persistent disk space.   If more disk space is needed, then look at larger Premium Disks. Remember the maximum IO bandwidth to the L32 is 640 MB/sec. This instance has four 1.92 TB NVMe drives that can be used for temporary file systems.

 

In conclusion, there are many resource and configuration settings to check within MS Azure, in order to configure an instance to meet the needs of your SAS application.  It is possible you may have to use an instance type with more cores than needed (with or without a constrained core count) in order to get the commensurate IO throughput required by your application.

 

5 REPLIES
MargaretC
SAS Employee

UPDATE:  

 

1) You are able to stripe ephemeral storage together.  Sorry for the misinformation above.

2) On the Lsv2 (AMD) systems, Read cache is not available.  For more details, please read the Note in purple on this page.  https://docs.microsoft.com/en-us/azure/virtual-machines/lsv2-series

MargaretC
SAS Employee

If you are looking to use the MS Azure Lsv2 series instances, please read on:

 

If you are planning on using SAS 9.4M6 or an earlier version of SAS 9.4 on the new AMD EPYC 7001 Series Processors (https://www.amd.com/en/products/epyc-7000-series), you will need to set the Linux environment variable MKL_DEBUG_CPU_TYPE to the value 5.

 

Here is the command to do this:    export MKL_DEBUG_CPU_TYPE=5

 

Please note that the above environment variable does not need to be set for SAS Viya 3.5, but setting it will not hurt the usage of SAS Viya.
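
If the same startup script is shared between Intel and AMD instances, one option is to test the CPU vendor before setting the variable. A minimal sketch, assuming lscpu reports the vendor as AuthenticAMD on these processors; the helper name is_amd_cpu is made up for illustration, and where you place the export (for example, a profile script read before SAS starts) depends on your deployment:

```shell
#!/bin/sh
# Sketch: read `lscpu` output on stdin and succeed if the vendor is AMD.
# is_amd_cpu is a hypothetical helper name.
is_amd_cpu() {
    grep -q '^Vendor ID:[[:space:]]*AuthenticAMD'
}

# Typical use in a script sourced before SAS starts:
#   if lscpu | is_amd_cpu; then
#       export MKL_DEBUG_CPU_TYPE=5
#   fi
```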

MargaretC
SAS Employee

Here is some additional information if you plan to move to MS Azure: it is very important to use Azure proximity placement groups. This technology makes sure all components of the SAS infrastructure are close to one another within the Azure data center.

 

More details on this technology can be found here https://azure.microsoft.com/en-us/blog/introducing-proximity-placement-groups/

 

toshal
Calcite | Level 5

The article is a really nice way to help fine-tune the Azure architecture. I have a query about using the same Availability Zone for the VMs, which may limit the use of any load balancer in the system design. If you have any use case in this regard, that would be really helpful.

