A more recent version of this paper can be found here: https://communities.sas.com/t5/Administration-and-Deployment/Best-Practices-for-Using-Microsoft-Azur...
If you are considering moving your SAS applications to MS Azure, please review the information below to make sure you are choosing the correct MS Azure instances and have prepared them as optimally as possible.
As always, a detailed understanding of how SAS is being used, where the source data resides, the time constraints on running reports/jobs, etc., will help with the decisions you make in choosing MS Azure instances and storage.
Let’s start off with the MS Azure instance types. This link brings you to the list of instance types.
We have compiled the following information to help guide your decisions. The memory optimized instances, Ev3 and Esv3 series in particular, tend to be the best for SAS. Let’s walk through important information found on the MS Azure Instance types link above.
Let’s talk about setting up the network.
To validate that you have Accelerated Networking enabled, please run the following commands and ensure your output looks like the output on this web site.
Deploy on a single VNET and subnet created specifically for this deployment. Do not use any inspection or tracing features on the VNET.
And finally, let’s discuss configuring the external Premium Storage. As with the instance types, there is a maximum IO throughput per Premium Disk. These values can be found on the “Throughput per disk” row of this table. Multiple Premium Disks, enough to meet or exceed the “Max uncached disk throughput IOPS/MBps” of an instance, should be attached to the instance. These disks should be striped together to create a single file system that can use the full throughput of all the disks simultaneously.
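As a rough illustration of the sizing rule above, the sketch below computes how many identical Premium Disks are needed to meet or exceed an instance's uncached throughput limit. The function name `disks_needed` is hypothetical, and the 200 MB/sec figure for a P30 disk is an assumption based on Azure's published disk tables, so check the current tables before relying on it. The 768 MB/sec limit is the E32 figure quoted later in this post.

```python
import math


def disks_needed(instance_mbps: float, disk_mbps: float) -> int:
    """Smallest number of identical disks whose combined throughput
    meets or exceeds the instance's uncached throughput limit."""
    if instance_mbps <= 0 or disk_mbps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(instance_mbps / disk_mbps)


# Assumed figures: 768 MB/sec instance limit, 200 MB/sec per P30 disk.
print(disks_needed(768, 200))  # 4
```

This only covers throughput. Capacity and IOPS limits may call for more or larger disks.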
When setting up your storage, you will be prompted to set a Storage Caching value. Please set ReadWrite for your operating system storage and ReadOnly for your SAS data disks.
To summarize the above, either of the following would be a good example configuration for a SAS compute node (for either SAS 9.4 or SAS Viya).
Standard_E32s_v3 with four P30 Premium Disks for a total of 4 TB of persistent disk space. If more disk space is needed, then look at larger Premium Disks. Remember that the maximum IO bandwidth to the E32 instance is 768 MB/sec. The internal 512 GB drive can be used for temporary file systems, but it cannot be made any larger.
Standard_L32s_v2 with three P30 Premium Disks for a total of 3 TB of persistent disk space. If more disk space is needed, then look at larger Premium Disks. Remember that the maximum IO bandwidth to the L32 is 640 MB/sec. This instance has four 1.92 TB NVMe drives that can be used for temporary file systems.
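To compare the two examples, the sketch below estimates the usable persistent-storage bandwidth of each configuration as the smaller of the striped disks' combined throughput and the instance's own limit. The `effective_mbps` helper is hypothetical, and the 200 MB/sec per P30 disk is an assumed figure based on Azure's disk tables. Real results also depend on stripe settings and workload.

```python
def effective_mbps(disk_count: int, disk_mbps: float, instance_mbps: float) -> float:
    """Usable bandwidth: the striped disks' total, capped by the instance limit."""
    return min(disk_count * disk_mbps, instance_mbps)


P30_MBPS = 200  # assumption: per-disk throughput of a P30 Premium Disk

# (number of P30 disks, instance IO limit in MB/sec) from the examples above
configs = {
    "E32 example": (4, 768),
    "L32 example": (3, 640),
}

for name, (count, cap) in configs.items():
    print(name, effective_mbps(count, P30_MBPS, cap))
```

Under these assumed figures, the E32 example is capped by the instance at 768 MB/sec, while three P30 disks on the L32 reach 600 MB/sec, slightly under its 640 MB/sec limit.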
In conclusion, there are many resource and configuration settings to check within MS Azure, in order to configure an instance to meet the needs of your SAS application. It is possible you may have to use an instance type with more cores than needed (with or without a constrained core count) in order to get the commensurate IO throughput required by your application.
UPDATE:
1) You are able to stripe together ephemeral storage. Sorry for the misinformation above.
2) On the Lsv2 (AMD) systems, Read cache is not available. For more details, please read the Note in purple on this page. https://docs.microsoft.com/en-us/azure/virtual-machines/lsv2-series
If you are looking to use the MS Azure Lsv2 series instances, please read on:
If you are planning to use SAS 9.4M6 or an earlier version of SAS 9.4 on the new AMD EPYC 7001 Series Processors (https://www.amd.com/en/products/epyc-7000-series), you will need to set the Linux environment variable MKL_DEBUG_CPU_TYPE to the value of 5.
Here is the command to do this: export MKL_DEBUG_CPU_TYPE=5
Please note that the above environment variable does not need to be set for SAS Viya 3.5, but setting it will not hurt the usage of SAS Viya.
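If SAS 9.4 is launched from a wrapper script rather than an interactive shell, the same variable can be set in the child process's environment. The sketch below is one hypothetical way to do that in Python. The `sas_environment` helper is illustrative only, and the command used to start SAS is deliberately left out because it depends on your installation.

```python
import os


def sas_environment(base=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE set to 5."""
    env = dict(os.environ if base is None else base)
    env["MKL_DEBUG_CPU_TYPE"] = "5"
    return env


env = sas_environment()
print(env["MKL_DEBUG_CPU_TYPE"])
# Pass `env` as the env= argument of subprocess.run(...) when starting SAS.
```

Copying the environment rather than modifying `os.environ` in place keeps the setting scoped to the processes you launch with it.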
One more point if you plan to move to MS Azure: it is very important to use Azure proximity placement groups. This technology makes sure all components of the SAS infrastructure are close to one another within the Azure data center.
More details on this technology can be found here: https://azure.microsoft.com/en-us/blog/introducing-proximity-placement-groups/
The article is a really nice way to help fine-tune the Azure architecture. I have a query about using the same Availability Zone for the VMs, which may limit the use of any load balancer in the system design. If you have any use case in this regard, that would be really helpful.