
A preview of SAS Viya 3.4 memory usage by host group

Started 09-12-2019 by
Modified 09-12-2019 by

One of the anticipated updates for SAS Viya 3.4 is a reduction in the memory footprint.  A deployment of a SAS Viya 3.3 order containing a majority of products uses approximately 75GB of memory before any tuning changes or usage of the system.  This consumption raised concerns among customers, and as a result SAS R&D made changes that reduce the memory footprint in SAS Viya 3.4.

 

Since the memory required for a deployment may impact how services are distributed across machines, it would be helpful to capture information at the level of host groups.  Therefore, in this blog we take a first look at the memory consumed by SAS Viya services, broken out by Ansible host group.  To be clear, this blog is not about comparing memory usage of SAS Viya 3.3 to 3.4, but more about the breakdown of memory usage in a SAS Viya 3.4 deployment.

 

The Topology Impact

As you might imagine, the memory footprint of a SAS Viya deployment has implications for sizing machines and determining the topology.  The GEL team regularly creates collections of test machines from a limited set of machine sizes, primarily with 16GB and 32GB memory configurations.  Therefore, it is important to understand the estimated memory footprint of services so that memory is not over-committed on any of the machines in a deployment.  If one machine is over-committed, performance is likely to suffer for the applications on that machine as a result of paging and swapping.  Ideally, then, we want to place services on each machine as efficiently as possible so that we use the fewest machines.

 

This phenomenon is not limited to SAS’ testing environment.  Customers who deploy on virtual or physical machines with limited memory may need to spread the SAS Viya services across machines to use memory effectively.  To do this efficiently, it helps to know the memory required by the services that will be placed on each machine.  A customer could use trial and error to assign host groups to machines, but this could be very time-consuming.

 

The question of how to distribute services across machines first surfaced in the early days of SAS Viya.  However, before SAS Viya 3.3 a deployment had fewer services, and thus a smaller memory footprint.  SAS Viya 3.3 brought a significant increase in microservices and, consequently, in memory footprint.  Although SAS Viya 3.4 adds a modest number of services, we’ll see shortly that the memory footprint has definitely shrunk.

 

Host Groups

The concept of host groups is likely a familiar topic for SAS Viya customers, but we’ll touch briefly on them here.

 

The unit of configuration used to manage placement of SAS Viya software components and services is an Ansible host group.  Ansible host groups are defined within the Ansible inventory file.  Host groups are labels that represent one or more software components and services to be deployed to specific machine(s).  The number of host groups within the inventory file varies and is determined by the products in the order.  Host groups within the inventory file are generated during playbook creation.  Each service within a host group has its own memory requirements.  Therefore, knowing the memory required by the service(s) within a host group helps with placing the services on machines so that memory is not over-committed.

 

A host group within the inventory file is specified in brackets.  For example, the following snippet from the inventory.ini file identifies the AdvancedAnalytics host group.  It is followed by a nickname, deployTarget, which relates to a specific host machine where the components of the AdvancedAnalytics host group will be installed.  Each host group in the inventory file is followed by one or more unique nicknames, which correspond to unique machines.

 

[AdvancedAnalytics]
deployTarget
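The following minimal sketch lists which host groups an inventory file defines and the machine nickname under each. It scans the INI layout section by section. It is not part of the author's script. The function name list_host_groups is hypothetical. It assumes the simple layout shown above: one nickname per line under each bracketed host group, with [name:children] and [name:vars] sections ignored.

```shell
# Hypothetical helper: print "hostgroup<TAB>nickname" pairs from an Ansible INI inventory.
# Assumes the simple layout shown above; :children/:vars sections are skipped.
list_host_groups() {
  local inventory="$1"
  awk '
    /^[ \t]*([#;]|$)/ { next }          # skip comments and blank lines
    /^[[]/ {                             # section header, e.g. [AdvancedAnalytics]
      group = $0
      sub(/^[[]/, "", group)
      sub(/[]][ \t]*$/, "", group)
      if (group ~ /:/) group = ""        # ignore [name:children] and [name:vars]
      next
    }
    group != "" { print group "\t" $1 }  # first field is the machine nickname
  ' "$inventory"
}
```

For the snippet above, `list_host_groups inventory.ini` would print `AdvancedAnalytics`, a tab, and `deployTarget`. Host definitions that appear before the first section are ignored.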

 

If you need more information about host groups, they are described in the SAS Viya 3.4 Deployment Guide.

 

Determining Memory Usage per Host Group

The key to capturing memory usage by host group is obtaining a mapping of services to host groups.  With that mapping, the memory used by each service can be measured and then summed by host group.

 

Erwan Granger, a SAS colleague, made an attempt back in November 2017 to capture this information by creating a pre-production SAS Viya 3.3 order with all products and then deploying each host group on a dedicated, very small machine. He created a testbed in AWS with 37 machines.  The purpose was two-fold:

  1. To see if it would actually deploy one host group per machine.
  2. To capture the amount of memory used by each host group. 

His deployment was successful, and he was able to capture memory usage at the system level.  It was a clever way to capture memory requirements without a host group mapping.  However, it is not a practical approach to repeat, because it requires a dedicated machine for each host group.

 

After some recent digging, we determined that one section of the generated playbook is the key to mapping most of the services to host groups.  The group_vars directory within the playbook contains files named after the host groups.  Shown here is an abbreviated listing of those files, whose names match the host groups in the inventory file.

 

total 348
-r--r--r--. 1 sas sas  3472 May 25 15:12 AdminServices
-r--r--r--. 1 sas sas  2884 May 25 15:12 AdvancedAnalytics
-r--r--r--. 1 sas sas  4954 May 25 15:12 all
-r--r--r--. 1 sas sas  2400 May 25 15:12 CASServices
-r--r--r--. 1 sas sas  2866 May 25 15:12 CognitiveComputingServices
-r--r--r--. 1 sas sas  5597 May 25 15:12 CommandLine
. . .

 

 

Within each of these files are the software components associated with that host group.  For example, device-management is a software component within AdminServices.

 

   device-management:
      SERVICE_APP_NAME: deviceManagement
      SERVICE_MEMORY_NEEDED: 512
      SERVICE_PRODUCT_NAME: device-management
      SERVICE_YUM_PACKAGE: sas-device-management

 

After some testing, it was determined that the value of SERVICE_PRODUCT_NAME could be used to resolve the service associated with that component.  And as a result, it is possible to map the service to the host group, which is the file name in the group_vars directory.
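Pulling the SERVICE_PRODUCT_NAME values out of one group_vars file is essentially a one-line awk job. This is a hedged sketch rather than the author's script. The function name product_names is hypothetical. It assumes each value sits on its own `SERVICE_PRODUCT_NAME: value` line, as in the snippet above, and it strips optional quotes.

```shell
# Hypothetical helper: print every SERVICE_PRODUCT_NAME value in one group_vars file.
product_names() {
  awk -v q="'" '
    $1 == "SERVICE_PRODUCT_NAME:" {
      v = $2
      gsub(/"/, "", v)   # drop double quotes, if present
      gsub(q, "", v)     # drop single quotes, if present
      print v
    }
  ' "$1"
}
```

Run against the AdminServices file, the snippet above would contribute `device-management` to the output.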

A search in /etc/init.d for the string “device” reveals the following:

 

[root@intviya01 init.d]# ll *device*
-rwxr-xr-x. 1 sas sas 6581 May 17 20:56 sas-viya-device-management-default

 

Notice that the SERVICE_PRODUCT_NAME value identified earlier (device-management) matches the third and fourth qualifiers of the service name, between the sas-viya- prefix and the -default suffix.  It turns out that this holds true for most services.

Based on this discovery, a script was written to scrape the value for this field, generate the service name, and associate the service name to the host group.

 

The files in sas_viya_playbook/group_vars contain many components, not all of which have an associated service.  So it is necessary to use the SERVICE_PRODUCT_NAME value to resolve the service name and check to see if it exists in /etc/init.d.
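The naming rule and the /etc/init.d check can be expressed as two small shell functions. This sketch assumes the `sas-viya-<SERVICE_PRODUCT_NAME>-default` pattern which, as noted above, holds for most but not all services. The function names are hypothetical. The optional init-directory argument exists only to make the check testable.

```shell
# Hypothetical helper: derive the init script name from a SERVICE_PRODUCT_NAME value.
service_name_for() {
  printf 'sas-viya-%s-default\n' "$1"
}

# Hypothetical helper: succeed if the derived service has an init script.
# $2 defaults to /etc/init.d.
service_exists() {
  [ -e "${2:-/etc/init.d}/$(service_name_for "$1")" ]
}
```

For example, `service_exists device-management && echo yes` would print `yes` on the machine shown above.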

 

The shell script mentioned earlier was written to step through each host group file, resolving a service name and checking to see if it is defined in /etc/init.d.  Using a SAS software order that contained all but a few of the SAS Viya 3.4 products, this method resolved 155 out of 172 services.  The remaining services were captured manually and appended to the generated list.
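A loop over the group_vars directory that ties these steps together might look like the following. It is a minimal sketch, not the script described here, and the function name is hypothetical. The `all` file is skipped on the assumption that it holds settings shared by every host rather than a host group. Product names that do not resolve are written to stderr so they can be mapped manually, as was done for the remaining 17 services.

```shell
# Hypothetical helper: print "hostgroup,service" for every SERVICE_PRODUCT_NAME
# in group_vars whose derived service name exists in the init directory.
map_services_to_host_groups() {
  local group_vars_dir="$1" init_dir="${2:-/etc/init.d}"
  local f group
  for f in "$group_vars_dir"/*; do
    [ -f "$f" ] || continue
    group=$(basename "$f")                # file name == host group name
    [ "$group" = "all" ] && continue      # assumption: "all" is not a host group
    awk '$1 == "SERVICE_PRODUCT_NAME:" { print $2 }' "$f" |
    while read -r product; do
      service="sas-viya-${product}-default"
      if [ -e "$init_dir/$service" ]; then
        printf '%s,%s\n' "$group" "$service"
      else
        # Not every component has a service; report it for manual review.
        printf 'unresolved: %s (%s)\n' "$product" "$group" >&2
      fi
    done
  done
}
```

A typical call would be `map_services_to_host_groups sas_viya_playbook/group_vars > mapping.csv 2> unresolved.txt`.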

 

At this point, we have a mapping but still no memory metrics.  There was still some work to do to capture memory usage.

 

How can we associate SAS Viya services with running processes?  It turns out that it is easier than you may think.  When a service starts, it writes its PID (process ID) to a file in /var/run/sas, named after the service with a final qualifier of “.pid”.  Shown here is the pid file for the service identified earlier:

 

[root@intviya01 sas]# pwd
/var/run/sas
[root@intviya01 sas]# ll *device*
-rw-r--r--. 1 sas sas 6 May 31 00:27 sas-viya-device-management-default.pid
[root@intviya01 sas]# cat sas-viya-device-management-default.pid
25458

 

Now that we have the PID for the service, we can use the “ps” command to check the memory usage.  The following command returns the memory used (the RSS, or resident set size) in kilobytes.  In this case, the memory used was approximately 452MB.

 

[root@intviya01 sas]# PID=25458;ps -p $PID -o rss=
452524
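Combining the pid file lookup with ps gives the per-service numbers. The sketch below uses a hypothetical function name, service_memory. It takes /var/run/sas from the text, overridable for testing, and prints RSS and VSZ in KiB as a comma-separated pair. It measures only the process whose PID is in the pid file, so child processes are not included.

```shell
# Hypothetical helper: print "rss_kb,vsz_kb" for a service using its pid file.
service_memory() {
  local service="$1" pid_dir="${2:-/var/run/sas}"
  local pidfile="$pid_dir/$service.pid" pid
  [ -r "$pidfile" ] || return 1
  pid=$(tr -d '[:space:]' < "$pidfile")
  # Separate -o options: "rss=,vsz=" would make ",vsz=" the RSS column header.
  ps -p "$pid" -o rss= -o vsz= |
    awk 'NF == 2 { print $1 "," $2; found = 1 } END { exit !found }'
}
```

For example, `service_memory sas-viya-device-management-default` would print the RSS (452524 in the run above) followed by the VSZ. It fails if the pid file is missing or the process is not running.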

 

Finally, to generate the big picture of memory usage, this data is written to a file.  The fields written are host group, service name, resident set size (RSS), and virtual memory size (VSZ).  Once the file is created, it can be dropped into Excel and a PivotTable created (I know, I know. . .).
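Without Excel, awk can produce a pivot-style total per host group. This sketch assumes a headerless CSV with the four fields in the order listed above: host group, service, RSS in KiB, VSZ in KiB. The function name is hypothetical. It prints each host group's RSS total in MiB, largest first.

```shell
# Hypothetical helper: sum RSS per host group from "hostgroup,service,rss_kb,vsz_kb" rows.
summarize_by_host_group() {
  awk -F, '
    { rss[$1] += $3 }                                   # accumulate KiB per host group
    END { for (g in rss) printf "%s,%.0f\n", g, rss[g] / 1024 }
  ' "$1" | sort -t, -k2,2nr                             # largest total first
}
```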

Before we look at the details of the spreadsheet, let’s look at the system-level view of memory using the free command:

 

[root@intviya01 tmp]# free -m
              total        used        free      shared  buff/cache   available
Mem:          96520       67456        4428        1237       24636       26810
Swap:          8063           0        8063

 

It appears that approximately 67GB out of 96GB of memory is in use.
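To record the system-level figures alongside the per-service data, parse the `Mem:` row of `free -m`. A minimal sketch with a hypothetical function name; it reads `free -m` output on stdin and reports `used` and `total` in MiB, as `free -m` does.

```shell
# Hypothetical helper: summarize the "Mem:" row of `free -m` output read on stdin.
mem_used_total() {
  awk '$1 == "Mem:" { printf "used=%d total=%d pct=%.1f\n", $3, $2, 100 * $3 / $2 }'
}
```

Usage: `free -m | mem_used_total`.  For the output shown above it prints `used=67456 total=96520 pct=69.9`.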

Since we now know how much memory is in use from a system perspective, let’s look at the memory used by host group.  Here is a snippet of the summary by host group:


[Figure: summary of memory used by host group, shown as two excerpts (mdt_41_viya34_1.png, mdt_41_viya34_2.png)]

 

You may notice that the sum of the memory used as reported by the ps command is greater than the system-level memory used.  I believe this is because the memory reported by ps includes shared memory, and pages shared by several processes are counted once for each of them.  The free command indicated that there were 1237MB of shared memory.  If that amount is removed from the total shown by host group, the totals are fairly close.  Keep in mind that we are looking for ballpark numbers for planning.  These values will change (in most cases, they will increase) once users are active on the system.  (See additional caveats below.)
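One way to reduce the double counting is to sum the proportional set size (PSS) instead of RSS. This is a different technique from the one used in this blog. RSS counts every resident page a process maps, including pages shared with other processes, so summing RSS across processes counts shared pages more than once. PSS charges each process only its share of a shared page. The sketch below assumes a Linux kernel that exposes /proc/<pid>/smaps and enough privileges (typically root) to read it for SAS processes. The function name is hypothetical.

```shell
# Hypothetical helper: total PSS (KiB) for one process, summed over all its mappings.
# Matches only "Pss:" lines, not variants such as "Pss_Dirty:".
pss_kb() {
  awk '/^Pss:/ { total += $2 } END { print total + 0 }' "/proc/$1/smaps"
}
```

For example, `pss_kb 25458` would report the PSS of the device-management process identified earlier, in KiB.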

 

…and here is a snippet showing the first two host groups expanded:

 

[Figure: first two host groups expanded to show memory used by each service (mdt_41_viya34_3.png)]

 

From this view you can clearly see each service and how much memory it uses.  The full spreadsheet is available below.

 

Caveats

There are several caveats that need to be mentioned:

  • Memory usage for RabbitMQ and pgpool was captured differently from the rest of the services.
    • There are over 1000 pgpool processes created when SAS Viya starts. These are not captured as part of the services’ memory profile.  Values for these processes were captured based on the process name and then summed.  The pgpool processes are driven by the connection pool settings for the SAS Infrastructure Data Server.
    • Because there is no pid file for RabbitMQ, all processes running under the rabbitmq userid were captured.
  • As discussed earlier, not all services mapped properly, requiring manual assignment. The file containing data for those services is available below.
  • There is no simple way to determine if all PIDs related to the deployment were captured.
  • This script currently works only on a single-machine deployment. However, it could be modified to work across machines (which would take additional work).
  • The script is quick-and-dirty code used to capture basic data and could use refinement. But I believe it provides enough value to be used for planning a deployment of services across machines at this point.
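The aggregation described in the first caveat sums RSS over every process with a given name, or every process owned by a given user. It can be sketched with ps selection options. The names pgpool and rabbitmq come from the caveats above. Whether every pgpool process reports exactly `pgpool` as its command name is an assumption worth checking on your system. The function names are hypothetical. The same RSS double-counting caveat applies to these totals.

```shell
# Hypothetical helper: sum RSS (KiB) over all processes with this command name.
rss_by_command() {
  ps -C "$1" -o rss= | awk '{ s += $1 } END { print s + 0 }'
}

# Hypothetical helper: sum RSS (KiB) over all processes owned by this user (name or UID).
rss_by_user() {
  ps -u "$1" -o rss= | awk '{ s += $1 } END { print s + 0 }'
}
```

Usage would be `rss_by_command pgpool` and `rss_by_user rabbitmq`.  Each prints 0 when no process matches.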

 

Quick Note about SAS Viya 3.3 vs SAS Viya 3.4

One of the SAS Viya 3.3 test environments used by GEL for workshops has a similar deployment of products to the environment that was used for SAS Viya 3.4 testing.

  • SAS Viya 3.3 used approximately 74GB of memory
  • SAS Viya 3.4 used approximately 65GB of memory for a newly started idle system; however, this included over 3GB used for SAS Event Stream Processing services, which were not in the SAS Viya 3.3 order. It also included new host groups, such as StudioViya.

So there was definitely a noticeable drop, even though we aren’t comparing “apples with apples” and new functionality has been added.  Deeper insight would require more research and a comparison of like orders.

 

Final Thoughts

Deploying SAS Viya 3.4 may require spreading the services across more than one machine. Ensuring that the services are efficiently placed across machines requires understanding the memory profile for each host group.  The script that I discussed in this blog is an initial attempt to capture data for that profile so that informed decisions can be made during the process of planning the deployment.  The same script could be rerun to capture memory usage after numerous users are logged in and using the system.  The delta could then be calculated to gain insight into which services are prone to memory growth.
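Take two captures in the four-field CSV layout described earlier, one on an idle system and one under load. Awk can then compute the per-service delta. This sketch assumes headerless files and uses a hypothetical function name. Services that appear only in the second capture are treated as growing from zero. Services missing from the second capture are not reported.

```shell
# Hypothetical helper: compare two "hostgroup,service,rss_kb,vsz_kb" captures and print
# "hostgroup,service,before_kb,after_kb,delta_kb", largest growth first.
memory_delta() {
  awk -F, -v OFS=, '
    NR == FNR { before[$1 FS $2] = $3; next }       # first file: baseline RSS
    {
      key = $1 FS $2
      b = (key in before) ? before[key] : 0         # new services start from zero
      print $1, $2, b, $3, $3 - b
    }
  ' "$1" "$2" | sort -t, -k5,5nr
}
```

For example, `memory_delta idle.csv busy.csv | head` would list the services that grew the most.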

 

See the attached Excel spreadsheet for a memory breakdown by host group and service.

Last update: 09-12-2019 05:44 PM