One of the anticipated updates in SAS Viya 3.4 is a reduction in the memory footprint. A deployment of a SAS Viya 3.3 order containing a majority of products uses approximately 75GB of memory before any tuning changes or use of the system. This consumption raised concerns among customers, and as a result, SAS R&D has responded with changes that reduce the memory footprint in SAS Viya 3.4.
Since the memory required for a deployment may impact how services are distributed across machines, it would be helpful to capture information at the level of host groups. Therefore, in this blog we take a first look at the memory consumed by SAS Viya services, broken out by Ansible host group. To be clear, this blog is not about comparing memory usage of SAS Viya 3.3 to 3.4, but more about the breakdown of memory usage in a SAS Viya 3.4 deployment.
As you might imagine, the memory footprint of a SAS Viya deployment has implications when sizing machines and determining the topology. The GEL team regularly creates collections of test machines with a limited number of machine sizes, primarily 16 and 32GB memory configurations. Therefore, it is important to understand the estimated memory footprint of services so that memory is not over-committed on any of the machines in a deployment. If a machine is over-committed, performance of the applications on that machine will likely suffer as a result of paging and swapping. So ideally we want to place services on each machine as efficiently as possible in order to minimize the number of machines we use.
This phenomenon is not limited to SAS' testing environment. Customers who deploy on virtual or physical environments with limited memory on each configured machine may need to divide the SAS Viya services across machines to use memory effectively. To do this efficiently, it is beneficial to know the memory required by the services that will be placed on each machine. A customer could assign host groups to machines by trial and error, but that could end up being very time-consuming.
The question of how to distribute services across machines originally surfaced in the early days of SAS Viya, but prior to SAS Viya 3.3 there were fewer services in a deployment, and thus the memory footprint was smaller. When SAS Viya 3.3 was released, it came with a significant increase in microservices, and consequently an increase in memory footprint. Although SAS Viya 3.4 has a modest increase in services, we'll see shortly that the memory footprint has definitely shrunk.
The concept of host groups is likely a familiar topic for SAS Viya customers, but we’ll touch briefly on them here.
The unit of configuration used to manage placement of SAS Viya software components and services is an Ansible host group. Ansible host groups are defined within the Ansible inventory file. Host groups are labels that represent one or more software components and services to be deployed to specific machine(s). The number of host groups within the inventory file is variable and determined by the products in the order. Host groups within the inventory file are generated during playbook creation. Each service within a host group will have unique memory requirements. Therefore, ascertaining the memory required for the service(s) within a host group will help with placing the services on machines so that memory is not over-committed.
A host group within the inventory file is specified in brackets. For example, the following snippet from the inventory.ini file identifies the AdvancedAnalytics host group. It is followed by a nickname, deployTarget, which relates to a specific host machine where the components of the AdvancedAnalytics host group will be installed. Each host group in the inventory file is followed by one or more unique nicknames, which correspond to unique machines.
[AdvancedAnalytics]
deployTarget
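To illustrate, a hypothetical multi-machine inventory might assign host groups to different hosts like this (the host aliases host1 and host2 are made up for illustration; a real inventory uses the aliases defined at the top of the file):

```ini
[AdvancedAnalytics]
host1

[CASServices]
host2

[AdminServices]
host1
host2
```

Here the AdminServices components would be deployed to both machines, while AdvancedAnalytics and CASServices each land on a single machine.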
If you need more information about host groups, they are described in the SAS Viya 3.4 Deployment Guide.
The key to capturing the amount of memory used by host group is obtaining the mapping of services to host groups. With that mapping in hand, we can capture per-service metrics and roll them up to calculate host group memory consumption.
Erwan Granger, a SAS colleague, made an attempt back in November 2017 to capture this information by creating a pre-production SAS Viya 3.3 order with all products and then deploying each host group on a dedicated, very small machine. He created a testbed in AWS with 37 machines. The purpose was two-fold:
His deployment was successful, and he was able to capture memory usage at the system level. And although this was a clever way to capture per-host-group memory requirements without needing a service-to-host-group mapping, dedicating a machine to every host group is obviously not practical for most deployments.
After some recent digging, it was determined that one section of the generated playbook is the key to mapping most of the services to host groups. The group_vars directory within the playbook contains files whose names match the host group names. Shown here is an abbreviated listing of the files that match host groups in the inventory file.
total 348
-r--r--r--. 1 sas sas 3472 May 25 15:12 AdminServices
-r--r--r--. 1 sas sas 2884 May 25 15:12 AdvancedAnalytics
-r--r--r--. 1 sas sas 4954 May 25 15:12 all
-r--r--r--. 1 sas sas 2400 May 25 15:12 CASServices
-r--r--r--. 1 sas sas 2866 May 25 15:12 CognitiveComputingServices
-r--r--r--. 1 sas sas 5597 May 25 15:12 CommandLine
. . .
Within each of these files are the software components associated with that host group. For example, device-management is a software component within AdminServices.
device-management:
  SERVICE_APP_NAME: deviceManagement
  SERVICE_MEMORY_NEEDED: 512
  SERVICE_PRODUCT_NAME: device-management
  SERVICE_YUM_PACKAGE: sas-device-management
After some testing, it was determined that the value of SERVICE_PRODUCT_NAME could be used to resolve the service associated with that component. And as a result, it is possible to map the service to the host group, which is the file name in the group_vars directory.
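As a minimal sketch of that extraction step (the group_vars file layout is assumed from the snippet above, and the example path is an assumption), the product names declared in a host group file can be pulled out like this:

```shell
# List the SERVICE_PRODUCT_NAME values declared in one group_vars file.
# $1 is the path to a host group file such as group_vars/AdminServices.
list_products() {
  grep 'SERVICE_PRODUCT_NAME:' "$1" | awk '{print $2}'
}

# Example usage (path is an assumption):
#   list_products sas_viya_playbook/group_vars/AdminServices
```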
A search in /etc/init.d for the string “device” reveals the following:
[root@intviya01 init.d]# ll *device*
-rwxr-xr-x. 1 sas sas 6581 May 17 20:56 sas-viya-device-management-default
Notice that the value of the field for SERVICE_PRODUCT_NAME identified earlier matches the third and fourth qualifier of the service name. And it turns out that this holds true for most services.
Based on this discovery, a script was written to scrape the value for this field, generate the service name, and associate the service name to the host group.
The files in sas_viya_playbook/group_vars contain many components, not all of which have an associated service. So it is necessary to use the SERVICE_PRODUCT_NAME value to resolve the service name and check to see if it exists in /etc/init.d.
The shell script mentioned earlier was written to step through each host group file, resolving a service name and checking to see if it is defined in /etc/init.d. Using a SAS software order that contained all but a few of the SAS Viya 3.4 products, this method resolved 155 out of 172 services. The remaining services were captured manually and appended to the generated list.
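The core of that logic might look like the following sketch (this is not the actual script; the "sas-viya-&lt;product&gt;-default" naming pattern and the paths are assumptions based on the examples in this post, and GROUP_VARS and INITD are parameterized so the logic can run against test fixtures):

```shell
# Defaults are assumptions; override GROUP_VARS/INITD as needed.
GROUP_VARS=${GROUP_VARS:-sas_viya_playbook/group_vars}
INITD=${INITD:-/etc/init.d}

# Emit "hostgroup,service" pairs for every component that resolves
# to an installed init.d service.
map_services() {
  for hg_file in "$GROUP_VARS"/*; do
    hostgroup=$(basename "$hg_file")
    grep 'SERVICE_PRODUCT_NAME:' "$hg_file" | awk '{print $2}' |
    while read -r product; do
      service="sas-viya-${product}-default"
      # Keep only components that actually have an init.d service
      if [ -e "$INITD/$service" ]; then
        printf '%s,%s\n' "$hostgroup" "$service"
      fi
    done
  done
}
```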
At this point, we have a mapping but still no memory metrics. There was still some work to do to capture memory usage.
How can we associate SAS Viya services with running processes? It turns out that it is easier than you may think. When a service is started, it captures the PID (process ID) in a file and stores it in /var/run/sas under the service name with a final qualifier of “.pid”. Shown here is the service identified earlier:
[root@intviya01 sas]# pwd
/var/run/sas
[root@intviya01 sas]# ll *device*
-rw-r--r--. 1 sas sas 6 May 31 00:27 sas-viya-device-management-default.pid
[root@intviya01 sas]# cat sas-viya-device-management-default.pid
25458
Now that we have the PID for the service, we can use the “ps” command to check the memory usage. The following command returns the memory used (the RSS, or resident set size) in kilobytes. In this case, the memory used was approximately 452MB.
[root@intviya01 sas]# PID=25458;ps -p $PID -o rss=
452524
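Putting the pid-file lookup and the ps query together, a sketch of the collection loop might look like this (RUN_DIR is parameterized for testing; the ".pid" naming convention comes from the example above):

```shell
# Default is the pid-file directory shown above.
RUN_DIR=${RUN_DIR:-/var/run/sas}

# Emit "service,rss_kb" for each running service with a pid file.
report_rss() {
  for pid_file in "$RUN_DIR"/*.pid; do
    service=$(basename "$pid_file" .pid)
    pid=$(cat "$pid_file")
    # RSS in KB; tr strips the leading padding ps adds
    rss=$(ps -p "$pid" -o rss= | tr -d ' ')
    printf '%s,%s\n' "$service" "$rss"
  done
}
```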
Finally, to generate the big picture of memory usage, this data is written to a file. The fields written are host group, service name, resident set size, and virtual memory allocated. Once the file is created, it can be dropped into Excel and a pivot table created (I know, I know...).
Before we look at the details of the spreadsheet, let’s look at the system-level view of memory using the free command:
[root@intviya01 tmp]# free -m
total used free shared buff/cache available
Mem: 96520 67456 4428 1237 24636 26810
Swap: 8063 0 8063
It appears that approximately 67GB out of 96GB of memory are used.
Since we now know how much memory is in use from a system perspective, let's look at the memory used by host group. The summary table by host group appears in the attached spreadsheet.
You may notice that the sum of the memory reported by the ps command is greater than the system-level memory in use. I believe this is because RSS, as reported by ps, counts shared memory against every process that maps it. The free command indicated 1237MB of shared memory; if that amount is subtracted from the totals shown by host group, the numbers come fairly close. And keep in mind that we are looking for ballpark numbers for planning. These values will change (in most cases, they will increase) once users are active on the system. (See additional caveats below.)
Expanding the first two host groups shows each service and how much memory it uses. The full spreadsheet is available below.
There are several caveats that need to be mentioned:
One of the SAS Viya 3.3 test environments used by GEL for workshops has a similar deployment of products to the environment that was used for SAS Viya 3.4 testing.
So there was definitely a noticeable drop, even though we aren't comparing apples to apples and new functionality has been added. Gaining more in-depth insight would require more research and time spent comparing like orders.
Deploying SAS Viya 3.4 may require spreading the services across more than one machine. Ensuring that the services are efficiently placed across machines requires understanding the memory profile of each host group. The script discussed in this blog is an initial attempt to capture data for that profile so that informed decisions can be made during deployment planning. The same script could be rerun after numerous users have logged in and used the system; the delta would then provide insight into which services are prone to memory growth.
See Excel spreadsheet attachment for a memory breakdown of host group/services.