Does It Matter Where the Various Components of Your SAS Infrastructure are Installed?
With today’s emphasis on keeping SAS applications available to end users around the clock, customers and IT administrators are asking SAS whether they can place SAS servers, SAS clients, and the data used by SAS applications in different physical locations. We have even been asked whether the servers in a SAS Metadata cluster, or the nodes in a SAS Grid, can be in different physical locations. The answer: it may technically work, but geographically distributing these components can greatly degrade performance. The impact is most visible when the compute servers and the data are in separate locations, especially when SAS applications read sequentially through large volumes of data (hundreds of GB or more).
Let’s review a SAS infrastructure whose performance issues my team was recently asked to resolve:
In the example SAS 9.4 infrastructure illustrated above, the SAS clients run on a Citrix system in New England (NE) and access SAS servers running in the South. The data used for their analytics resides in a relational database in the customer’s Texas data center.
Can this work? Technically, yes. But the distribution has a performance cost. When a SAS user in the Midwest uses a SAS client in NE to review a list of available SAS tables, or the columns in a SAS table, the request travels from the SAS client in NE to the SAS servers in the South, on to the data center in Texas, back to the SAS servers in the South, and back to the SAS client in NE, which finally sends the result to the user’s monitor in the Midwest. The overhead of sending every request and response over these long distances makes the user interface slow.
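To make the cost of this path concrete, here is a back-of-envelope sketch that sums the one-way network latency of every hop a single request takes, then multiplies by the number of requests a UI action sends one after another. The per-hop latencies and request counts are hypothetical placeholders, not measurements from this customer, and processing time on each server is ignored.

```python
# Back-of-envelope model of the request path described above.
# One-way latencies are HYPOTHETICAL placeholders, not measurements.

# Each hop: (from, to, one-way latency in milliseconds)
REQUEST_PATH = [
    ("user display (Midwest)", "Citrix SAS client (NE)", 20.0),
    ("Citrix SAS client (NE)", "SAS servers (South)", 25.0),
    ("SAS servers (South)", "database (Texas)", 15.0),
    ("database (Texas)", "SAS servers (South)", 15.0),
    ("SAS servers (South)", "Citrix SAS client (NE)", 25.0),
    ("Citrix SAS client (NE)", "user display (Midwest)", 20.0),
]


def path_latency_ms(path):
    """Total network latency for one request to travel the whole path."""
    return sum(latency_ms for _src, _dst, latency_ms in path)


def ui_wait_ms(path, sequential_requests):
    """Network wait for a UI action that sends requests one after another."""
    return path_latency_ms(path) * sequential_requests


if __name__ == "__main__":
    print(f"One request: {path_latency_ms(REQUEST_PATH):.0f} ms")
    for n in (1, 50, 500):
        print(f"{n:>3} sequential requests: {ui_wait_ms(REQUEST_PATH, n) / 1000:.1f} s")
```

Replacing the placeholders with measured round-trip times between sites gives a rough floor on UI wait time; keeping the components together shrinks every term in the sum.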
In addition, because all of the source data for the SAS servers is in a relational database in Texas, data must travel across a WAN between Texas and the SAS servers every time a SAS job reads it or writes out permanent results. Moving data over a long distance every time it is needed can severely degrade performance.
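The cost of repeated WAN reads is largely a throughput question. The sketch below estimates how long one pass over a large data set takes at a given sustained throughput. The 500 GB volume and the link speeds are hypothetical examples, and effective WAN throughput can be well below the nominal link speed. Because the source data lives in Texas, a job pays this cost again on every read and again when it writes results back.

```python
# Rough time for one pass over remote source data at a sustained throughput.
# Data volume and throughputs are HYPOTHETICAL examples.


def transfer_hours(data_gb, effective_gbit_per_s):
    """Hours to move data_gb gigabytes at the given effective throughput.

    Uses decimal units (1 GB = 8 Gbit). The effective throughput is assumed
    to already account for protocol overhead.
    """
    seconds = data_gb * 8 / effective_gbit_per_s
    return seconds / 3600


if __name__ == "__main__":
    data_gb = 500  # one pass over "hundreds of GB" of source data
    links = [
        ("local 10 Gbit/s", 10.0),
        ("WAN, 1 Gbit/s effective", 1.0),
        ("WAN, 0.2 Gbit/s effective", 0.2),
    ]
    for label, gbit_per_s in links:
        print(f"{label:<26} {transfer_hours(data_gb, gbit_per_s):5.2f} h per pass")
```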
If the SAS customer wants their SAS applications to perform optimally, they need to have all of their SAS clients, SAS servers (compute, mid-tier and metadata), authentication services, and source data files as closely located as possible. The diagram below illustrates this.
The above placement of SAS infrastructure components applies to SAS 9.4, SAS 9.4 Grid Manager, and SAS Viya, whether on-premises or in a public or private cloud. Each of these infrastructures includes many SAS servers.
Let’s drill down into what it means to keep all of the SAS 9.4 infrastructure components close to one another. To optimize the communication and data paths between components, and thereby optimize performance, we need to keep all of the SAS 9.4 servers (compute, mid-tier, and metadata) physically and geographically close.
Here are two specific examples:
1) When setting up a cluster of SAS Metadata servers, make sure all of the nodes are in the same data center. Placing some metadata nodes in the US and others in Europe, for example, in an attempt to achieve a high-availability/disaster-recovery deployment, can result in dramatically worse performance. In a SAS Metadata cluster, only the master SAS Metadata server writes to the SAS Metadata repository (and the cluster would not provide a disaster recovery solution anyway). If a SAS application connects to a SAS Metadata server in the UK that must read from or write to the SAS Metadata repository in the US, every such operation incurs considerable overhead. In cases like this, we have seen certain SAS tasks time out when metadata reads and writes do not complete quickly enough, causing the SAS job or user session that initiated the task to fail.
2) Similar delays to those described in 1) can occur when you distribute SAS Grid compute nodes across data centers on multiple continents while your shared file system is physically located in only one of those data centers. When the data is physically far from the SAS Grid (or non-Grid) compute node, the process spends extra ‘wall clock’ time retrieving that data.
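Both examples come down to round trips that add up. The sketch below assumes a task that issues many metadata reads or writes one after another, each waiting for a full network round trip, and compares the total against a timeout. The operation count, round-trip times, and the 60-second timeout are hypothetical, order-of-magnitude illustrations, not SAS defaults. The same multiplication applies to a Grid node that fetches remote data in many small requests.

```python
# Sketch: per-operation round-trip time adding up for a task that performs
# many sequential metadata reads/writes. The operation count, RTT values,
# and timeout are HYPOTHETICAL; they are not SAS defaults.


def task_network_seconds(operations, rtt_ms):
    """Network time when each operation waits for one full round trip."""
    return operations * rtt_ms / 1000.0


def exceeds_timeout(operations, rtt_ms, timeout_s):
    """True if the round trips alone would exceed the task's timeout."""
    return task_network_seconds(operations, rtt_ms) > timeout_s


if __name__ == "__main__":
    ops = 2000        # hypothetical number of metadata calls in one task
    timeout_s = 60.0  # hypothetical client-side timeout
    for where, rtt_ms in [("same data center", 0.5), ("US <-> UK", 80.0)]:
        secs = task_network_seconds(ops, rtt_ms)
        status = "TIMEOUT" if exceeds_timeout(ops, rtt_ms, timeout_s) else "ok"
        print(f"{where:<18} {secs:7.1f} s  {status}")
```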
Bottom line: To ensure your SAS applications run as well as possible, keep all SAS clients, SAS servers (compute, mid-tier, and metadata), authentication services, and source data files as close to one another as possible. If, for some reason, this cannot happen, understand that the performance of your SAS applications will be degraded compared to what it would be if all of the components were physically and geographically close.
Ideally, a data needs assessment will determine which data SAS actually needs, so that a shadow copy of it (for example, in a data lake) can be placed closer to the SAS servers and users. If the applications read a lot of operational data, scheduled pulls of that data may be worth considering. Sounds like a fun project.
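One rough way to size the benefit of scheduled pulls is to compare daily WAN traffic with and without them. This sketch assumes that, without pulls, every job reads its input from the remote operational database, and that each scheduled pull copies a full snapshot of the needed tables to storage near the SAS servers. All volumes are hypothetical.

```python
# Daily WAN traffic: remote reads by every job vs. scheduled pulls into a
# copy near the SAS servers. All volumes are HYPOTHETICAL examples.


def daily_wan_gb_remote_reads(jobs_per_day, gb_read_per_job):
    """Each job reads its full input across the WAN."""
    return jobs_per_day * gb_read_per_job


def daily_wan_gb_scheduled_pulls(snapshot_gb, pulls_per_day=1):
    """Each scheduled pull copies a full snapshot across the WAN once;
    jobs then read the local copy."""
    return snapshot_gb * pulls_per_day


if __name__ == "__main__":
    remote = daily_wan_gb_remote_reads(jobs_per_day=40, gb_read_per_job=200)
    pulled = daily_wan_gb_scheduled_pulls(snapshot_gb=300, pulls_per_day=1)
    print(f"Remote reads:   {remote:,} GB/day over the WAN")
    print(f"Scheduled pull: {pulled:,} GB/day over the WAN")
    print(f"Reduction:      {remote / pulled:.1f}x")
```

The trade-off is freshness: jobs see data as of the last pull, so the schedule has to match how current the analytics need to be. An incremental pull of only changed rows would move even less data than the full snapshot assumed here.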