
SAS Viya 3.5 Multi-Nic deployments Observations with 2 Real Life Examples

Started ‎03-30-2020 by
Modified ‎03-30-2020 by

In this post, I’d like to discuss some new features introduced with Viya 3.5, which I briefly mentioned in a previous SAS Communities article.

 

This new capability is great as it gives more flexibility to handle complex network configurations at customer sites. However, it requires careful planning, a good understanding of the concepts, and a clear view of the technical requirements and goals.

 

Rather than paraphrasing the official documentation, this post tells the story of two case studies: the challenges they presented and how they were tackled.

 

Even though I’ll do my best to explain it as clearly as possible, it is a complex topic. So, if you plan to read it end to end, grab yourself a good cup of coffee or tea and make sure you won’t be disturbed for a little while.

 

An Example of Multi-NIC Environment (Data Appliance)


What is a multi-NIC machine? Why? How do we check that?

 

multiNIC1.png


 

Multi-NIC stands for “Multiple Network Interface Cards”.

 

Increasingly, servers are coming with multiple network interfaces. The network interfaces could be there to support virtualized environments or in order to have different isolated networks for security, administration and/or performance reasons. Sometimes the term “multi-homed” environment is also used to refer to this configuration.

 

The network interfaces can be virtual or physical, but each generally comes with its own IP address, which is usually associated with a hostname. So, we end up with a single machine having multiple IP addresses and hostnames. How can Viya software be configured to work within this setup?

 

It is always better to explain a concept using an example, so let’s look at a DA (Data Appliance). In this DA, we have a collection of 5 machines where there are several different networks used for different purposes.

 

Customers with big data use data appliances to hold the data and to perform analytics on it. These appliances typically come with a version of Linux, which makes them a good fit for Viya.

 

In the DA we discuss, three different types of network are involved in the machines’ network configuration (InfiniBand, 10 GbE and 1 GbE).


Logging on to the DA we can run a command like ifconfig or ip and check the content of the /etc/hosts file to see that.

 

bondeth0  Link encap:Ethernet  HWaddr 12:6A:0A:25:98:97
          inet addr:12.35.250.251  Bcast:12.35.255.255  Mask:255.255.0.0
bondib0   Link encap:Infiniband  HWaddr 80:00:05:0A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
          inet addr:192.168.10.152  Bcast:192.168.11.255  Mask:255.255.0.0
eth0      Link encap:Ethernet  HWaddr 00:10:E0:B4:89:82
          inet addr:173.16.7.136  Bcast:173.16.255.255  Mask:255.255.0.0
eth8      Link encap:Ethernet  HWaddr 12:69:0A:25:98:97
eth9      Link encap:Ethernet  HWaddr 12:6A:0A:25:98:97
ib0       Link encap:Infiniband  HWaddr 80:00:05:0A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
ib1       Link encap:Infiniband  HWaddr 80:00:05:0B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
datapp1node01.sci.example.com>

Here’s what we see when we look at the /etc/hosts file.

 

127.0.0.1	localhost.localdomain localhost
::1	localhost6.localdomain6 localhost6
173.16.6.136 datapp1node01-adm.sci.example.com datapp1node01-adm
173.16.6.142 datapp1node01-ilom.sci.example.com datapp1node01-ilom
173.16.6.137 datapp1node02-adm.sci.example.com datapp1node02-adm
173.16.6.143 datapp1node02-ilom.sci.example.com datapp1node02-ilom
173.16.6.138 datapp1node03-adm.sci.example.com datapp1node03-adm
173.16.6.144 datapp1node03-ilom.sci.example.com datapp1node03-ilom
173.16.6.139 datapp1node04-adm.sci.example.com datapp1node04-adm
173.16.6.145 datapp1node04-ilom.sci.example.com datapp1node04-ilom
173.16.6.140 datapp1node05-adm.sci.example.com datapp1node05-adm
173.16.6.146 datapp1node05-ilom.sci.example.com datapp1node05-ilom
173.16.6.141 datapp1node06-adm.sci.example.com datapp1node06-adm
173.16.6.147 datapp1node06-ilom.sci.example.com datapp1node06-ilom
...
...
### DataApp Public Hostnames

192.168.10.152  datapp1node01.sci.example.com datapp1node01-priv.sci.example.com datapp1node01-master.sci.example.com datapp1node01 datapp1node01-priv datapp1node01-master
192.168.10.153  datapp1node02.sci.example.com datapp1node02-priv.sci.example.com datapp1node02 datapp1node02-priv
192.168.10.154  datapp1node03.sci.example.com datapp1node03-priv.sci.example.com datapp1node03 datapp1node03-priv
192.168.10.155  datapp1node04.sci.example.com datapp1node04-priv.sci.example.com datapp1node04 datapp1node04-priv
192.168.10.156  datapp1node05.sci.example.com datapp1node05-priv.sci.example.com datapp1node05 datapp1node05-priv
192.168.10.157  datapp1node06.sci.example.com datapp1node06-priv.sci.example.com datapp1node06 datapp1node06-priv
...

 

We see a lot of network interfaces because the DA uses network bonding (combining several network interfaces on one host for redundancy and/or increased throughput), but from the outputs and the DA documentation we can determine that we have:

  • A 1GbE administration network (eth0).
  • A 10GbE network for external access and communication (bondeth0).
  • A superfast Infiniband backplane network attached to the DA nodes for internal communications (bondib0).

As a reminder, 1 and 10 Gigabit Ethernet can, in theory, provide 1 and 10 gigabits per second respectively, whereas a 4-lane QDR InfiniBand link can provide up to 40 Gb/s.

Each network has different IPs and hostnames.
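
To make the idea of one address per network concrete, here is a minimal Python sketch that groups the addresses of the first DA node by subnet with the standard ipaddress module. The prefix lengths (/16, /22, /16) are the ones quoted later in this article; the /22 prefix for 192.168.10.100 is an assumption, since the article only lists that address in the hostname -I output.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(cidr_addresses):
    """Map each network (as a string) to the host addresses that fall in it."""
    groups = defaultdict(list)
    for cidr in cidr_addresses:
        iface = ipaddress.ip_interface(cidr)  # keeps both the host IP and its network
        groups[str(iface.network)].append(str(iface.ip))
    return dict(groups)


# Addresses of datapp1node01 as they appear in this article.
# The /22 on 192.168.10.100 is assumed, not stated by the article.
NODE01 = [
    "173.16.6.136/16",
    "192.168.10.152/22",
    "192.168.10.100/22",
    "12.35.150.151/16",
]

for network, hosts in group_by_subnet(NODE01).items():
    print(network, hosts)
```

Four addresses collapse into three networks, which is the same picture the interface list gives: one admin network, one InfiniBand network and one client network.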


Looking at the previous outputs and running commands like hostname -I and hostname -A will help you to determine which IP/hostname relates to our network interfaces.

 

multinic_2.png

 

Here is what we have in our DA environment:

datapp1node01.sci.example.com> hostname -I
173.16.6.136 192.168.10.152 192.168.10.100 12.37.250.251
datapp1node01.sci.example.com> hostname -A
datapp1node01-adm.sci.example.com datapp1node01-adm datapp1node01.sci.example.com datapp1node01.sci.example.com
datapp1node01.sci.example.com>

 

We notice that the hostname -A command returns only two different hostnames, even though we have three different kinds of IP addresses corresponding to different network interfaces and subnets (173.16.6.136/16, 192.168.10.152/22, 12.35.150.151/16).

 

This is a very important factor! It is because the same hostname, datapp1node01.sci.example.com, is resolved differently depending on whether we are inside or outside of the DA collection. On each of the DA nodes, the /etc/hosts file is used to resolve the hostname to the private IP address.


But if we are outside of the DA machines, we rely on the network DNS.

 

Microsoft Windows [Version 10.0.16299.1565]
(c) 2017 Microsoft Corporation. All rights reserved.

C:\Users\RaphP>ping datapp1node02.sci.example.com
Pinging datapp1node02.sci.example.com [12.35.150.152] with 32 bytes of data:
Reply from 12.35.150.152: bytes=32 time=118ms TTL=61
Reply from 12.35.150.152: bytes=32 time=118ms TTL=61
Reply from 12.35.150.152: bytes=32 time=118ms TTL=61

Ping statistics for 12.35.150.152:
    Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 117ms, Maximum = 118ms, Average = 117ms

         

 

With the help of the ping command we have confirmed the hostname/IP address associations. We can now draw a small table to describe our multi-NIC configuration for the 5 DA machines.

 

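| Network Interface | Type | Associated IP Address | Associated hostname |
| --- | --- | --- | --- |
| eth0 | 1 GbE Admin access | 173.16.6.136 | datapp1node01-adm.sci.example.com |
| eth0 | 1 GbE Admin access | 173.16.6.137 | datapp1node02-adm.sci.example.com |
| eth0 | 1 GbE Admin access | 173.16.6.138 | datapp1node03-adm.sci.example.com |
| eth0 | 1 GbE Admin access | 173.16.6.139 | datapp1node04-adm.sci.example.com |
| eth0 | 1 GbE Admin access | 173.16.6.140 | datapp1node05-adm.sci.example.com |
| bondeth0 | 10 GbE Client access | 12.35.150.151 | From the outside: datapp1node01.sci.example.com |
| bondeth0 | 10 GbE Client access | 12.35.150.152 | From the outside: datapp1node02.sci.example.com |
| bondeth0 | 10 GbE Client access | 12.35.150.153 | From the outside: datapp1node03.sci.example.com |
| bondeth0 | 10 GbE Client access | 12.35.150.154 | From the outside: datapp1node04.sci.example.com |
| bondeth0 | 10 GbE Client access | 12.35.150.155 | From the outside: datapp1node05.sci.example.com |
| bondib0 | Superfast InfiniBand network for internal communications | 192.168.10.152 | From the DA nodes: datapp1node01.sci.example.com |
| bondib0 | Superfast InfiniBand network for internal communications | 192.168.10.153 | From the DA nodes: datapp1node02.sci.example.com |
| bondib0 | Superfast InfiniBand network for internal communications | 192.168.10.154 | From the DA nodes: datapp1node03.sci.example.com |
| bondib0 | Superfast InfiniBand network for internal communications | 192.168.10.155 | From the DA nodes: datapp1node04.sci.example.com |
| bondib0 | Superfast InfiniBand network for internal communications | 192.168.10.156 | From the DA nodes: datapp1node05.sci.example.com |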

 

We recommend drawing a similar table whenever you face a multi-NIC situation at your Viya site. This single reference point helps prevent confusion and is useful in the Viya pre-configuration phase.
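
A first draft of such a table can be produced from the command outputs themselves. The following Python sketch is only an illustration: the function names are hypothetical, the regular expression targets the older net-tools ifconfig format shown earlier ("inet addr:"), and the sample inputs are abridged versions of this article's outputs.

```python
import re


def parse_ifconfig(text):
    """Return {interface: ipv4} from net-tools style `ifconfig` output."""
    result, current = {}, None
    for line in text.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # an unindented line starts a new interface
        match = re.search(r"inet addr:(\d+\.\d+\.\d+\.\d+)", line)
        if match and current:
            result[current] = match.group(1)
    return result


def parse_hosts(text):
    """Return {ipv4: [names...]} from /etc/hosts style text, ignoring comments."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            table.setdefault(fields[0], []).extend(fields[1:])
    return table


def build_rows(ifconfig_text, hosts_text):
    """Join both sources into (interface, ip, hostnames) rows, sorted by interface."""
    hosts = parse_hosts(hosts_text)
    nics = parse_ifconfig(ifconfig_text)
    return [(nic, ip, hosts.get(ip, [])) for nic, ip in sorted(nics.items())]


# Abridged sample inputs based on the outputs shown in this article.
SAMPLE_IFCONFIG = """\
bondib0   Link encap:Infiniband
          inet addr:192.168.10.152  Bcast:192.168.11.255  Mask:255.255.0.0
eth0      Link encap:Ethernet
          inet addr:173.16.6.136  Bcast:173.16.255.255  Mask:255.255.0.0
"""

SAMPLE_HOSTS = """\
173.16.6.136 datapp1node01-adm.sci.example.com datapp1node01-adm
192.168.10.152  datapp1node01.sci.example.com datapp1node01-priv.sci.example.com
"""

for row in build_rows(SAMPLE_IFCONFIG, SAMPLE_HOSTS):
    print(row)
```

Keep in mind that this only captures what the node itself sees through /etc/hosts. Names that are resolved by DNS from outside the DA still have to be checked from a client machine, as we did with ping.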

 

A New Network Configuration Paradigm in Viya

 

Here is a summary slide for the networking options introduced with the release of Viya 3.5.

multinic_3.png

 

The details of each possible parameter and the ways to implement them are available in the official Deployment Guide.

 

Another very cool thing: rather than using hard-coded IP addresses (SAS_BIND_ADDR or SAS_EXTERNAL_BIND_ADDR) to define the internal and external service bindings, you can use “…_IF” variables (such as SAS_BIND_ADDR_IF or SAS_EXTERNAL_BIND_ADDR_IF) set to a network interface name (like “eth0” or “eth1”).

 

In some (but not necessarily all) use cases, the “…_IF” variables let the services pick up, at host boot time, whatever network address the host currently has on a specific network interface. This may prove beneficial if dynamic IP addressing is present.
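
The difference between the two styles of variable can be illustrated with a small conceptual Python sketch. This is not how Viya implements the lookup: effective_bind_address is a hypothetical helper, the interface table is passed in as a plain dictionary, and rejecting a configuration that sets both variables is a choice made for this sketch only.

```python
def effective_bind_address(conf, interface_addresses, prefix="SAS_BIND_ADDR"):
    """Return the address a service would bind to, under this sketch's assumptions.

    conf                -- dict of network_conf variables
    interface_addresses -- dict {interface name: current IPv4 address}
    """
    explicit = conf.get(prefix)
    iface = conf.get(prefix + "_IF")
    if explicit and iface:
        # Sketch-only rule: refuse ambiguous input rather than guess a precedence.
        raise ValueError(f"set either {prefix} or {prefix}_IF, not both")
    if iface:
        if iface not in interface_addresses:
            raise ValueError(f"unknown interface {iface!r}")
        return interface_addresses[iface]  # looked up when the service starts
    return explicit


# Hypothetical host whose eth0 address changed after a reboot.
at_install = {"eth0": "10.0.0.5"}
after_reboot = {"eth0": "10.0.0.9"}

print(effective_bind_address({"SAS_BIND_ADDR": "10.0.0.5"}, after_reboot))   # stale value
print(effective_bind_address({"SAS_BIND_ADDR_IF": "eth0"}, after_reboot))    # follows eth0
```

The hard-coded address stays at its install-time value, while the interface-based variable follows whatever address the interface has when the lookup happens.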

 

If you are preparing for a Viya configuration, I strongly encourage you to read the official documentation and a related SAS Communities article by my colleague @RobCollum.

 

The First Case Study

 

With the DA case study, which has three network interfaces, still fresh in our minds, let’s see how we would prepare the deployment of SAS Viya 3.5 in such an environment. FYI, the DA infrastructure discussed here is used by SAS teams for internal technical exploration.

 

Based on the network interfaces table above, here is the first configuration we originally used for the first DA node (datapp1node01.yml):

---
network_conf:
  SAS_HOSTNAME: datapp1node01.sci.example.com #[internal hostname]
  SAS_BIND_ADDR: 192.168.10.152 #[internal network interface Infiniband]
  SAS_SAN_DNS: " datapp1node01.sci.example.com localhost" #[certificate DNS field]
  SAS_SAN_IP: "127.0.0.1 192.168.10.152 192.168.10.100 12.35.150.151" #[certificate IP field]
  SAS_EXTERNAL_HOSTNAME: datapp1node01.sci.example.com #[external hostname]
  SAS_EXTERNAL_BIND_ADDR_IF: "bondeth0" #[external network interface]

 

We can see that we want to use the private InfiniBand network IP address for the Viya internal communications (192.168.10.152), while the client network (bondeth0) should be used for the external access.

 

We prepared similar content (adjusting the IP address and hostname) for all the nodes in the datapp1node0<n>.yml files.

The deployment of Viya 3.5 using these settings was successful!

 

Everything was working well… except that CAS was not running…which is pretty critical!

😊

Checking the CAS controller log (caslaunch_default_controller_daemon.log), we noticed this message:


ERROR: The TCP/IP tcpSockConnect support routine failed with error 111 (The connection was refused.).

ERROR: Failed to connect to host datapp1node01.sci.example.com ', port 5570.

2019-12-17T19:34:47 datapp1node01.sci.example.com:0: ...sas/viya/home/SASFoundation/misc/casluaclnt/lua/swat.lua:193: Could not connect to
'..xfhqAXbXPJwTtIKs.' on port 5570.

 

After a CAS restart the same error message appeared again…

 

Can you see what the issue was?

 

Well…when we think about it, the error message makes perfect sense.

 

As we can see below, while the CAS nodes try to access the CAS service using hostnames (defined in /opt/sas/viya/config/etc/cas/default/cas.hosts) that resolve to a 192.168.x.x IP address, the CAS service (as instructed via the  SAS_EXTERNAL_BIND_ADDR_IF parameter) is binding port 5570 on the external network  bondeth0 (IP Address: 10.37.150.151).

 

[root@datapp1node01 default]# ping datapp1node01.sci.example.com
PING datapp1node01.sci.example.com (192.168.10.152) 56(84) bytes of data.
64 bytes from datapp1node01.sci.example.com (192.168.10.152): icmp_seq=1 ttl=64 time=0.024 ms
64 bytes from datapp1node01.sci.example.com (192.168.10.152): icmp_seq=2 ttl=64 time=0.025 ms
^C
--- datapp1node01.sci.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3311ms
rtt min/avg/max/mdev = 0.022/0.023/0.025/0.003 ms
[root@datapp1node01 default]# netstat -anp | grep 5570
tcp        0      0 10.37.150.151:5570          0.0.0.0:*       LISTEN      12918/cas

 

Remember that in this specific configuration, the same hostname (datapp1node01.sci.example.com) is used as both the internal hostname (resolved locally through /etc/hosts) and the external hostname (resolved by DNS). This is why our initial configuration was not working!
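
The mismatch can be checked mechanically. Below is a minimal Python sketch, with hypothetical function names, that reads netstat -anp style text, finds the local addresses listening on a port, and tests whether a client connecting to a given IP would find a listener. It relies on the general rule that a socket bound to 0.0.0.0 accepts connections on any local address, while a socket bound to one address only accepts connections addressed to it.

```python
def listening_addresses(netstat_text, port):
    """Local IPv4 addresses in LISTEN state on `port`, from `netstat -anp` text."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) >= 6 and fields[0] == "tcp" and fields[5] == "LISTEN":
            addr, _, listen_port = fields[3].rpartition(":")
            if listen_port == str(port):
                found.append(addr)
    return found


def can_connect(netstat_text, port, target_ip):
    """True if some listener on `port` would accept a connection sent to `target_ip`."""
    return any(addr in ("0.0.0.0", target_ip)
               for addr in listening_addresses(netstat_text, port))


# The listener seen in the failing deployment above.
FAILING = "tcp        0      0 10.37.150.151:5570          0.0.0.0:*       LISTEN      12918/cas"
# A listener bound to all interfaces, for comparison.
WILDCARD = "tcp        0      0 0.0.0.0:5570            0.0.0.0:*               LISTEN      22893/cas"

# Inside the DA, the hostname resolves to 192.168.10.152 (see the ping output above).
print(can_connect(FAILING, 5570, "192.168.10.152"))
print(can_connect(WILDCARD, 5570, "192.168.10.152"))
```

With the failing configuration, nothing listens on the address that the internal hostname resolves to, which matches the "connection was refused" error in the CAS log.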

 

Three approaches that could be useful in this situation are outlined below. The first approach is generally preferred/recommended in many situations.

 

  • If we’d like the CAS front-door service to listen on all network interfaces, the recommended/default option is to set the SAS_EXTERNAL_BIND_ADDR variable to 0.0.0.0. That way the CAS port (5570) is reachable whatever network interface (admin, Ethernet, or InfiniBand) is used:
[cloud-user@intcas01 ~]$ sudo netstat -anp | grep cas
tcp        0      0 192.168.2.1:43806       0.0.0.0:*               LISTEN      22893/cas
tcp        0      0 0.0.0.0:5570            0.0.0.0:*               LISTEN      22893/cas
tcp        0      0 127.0.0.1:41222         0.0.0.0:*               LISTEN      22893/cas
tcp        0      0 192.168.2.1:43806       192.168.2.2:35838       ESTABLISHED 22893/cas
tcp        0      0 192.168.2.1:43806       192.168.2.2:35840       ESTABLISHED 22893/cas

 

  • Remove the SAS_EXTERNAL_* parameters altogether from the network.conf files and redeploy Viya. In that case CAS can start, but the CAS service binds port 5570 to the IP address corresponding to the SAS_BIND_ADDR value (and can therefore only accept connections from the inside, for example from another DA node). This may not be appropriate when client applications on a different subnet, such as 3rd party programming interfaces, need to connect to CAS.
  • Another, and probably better, option is to use one of the internal hostname aliases defined in the /etc/hosts files to identify the internal hostname uniquely (so that it does not conflict with the external hostname). In our DA use case there was a uniquely defined internal name: datapp1node01-priv.sci.example.com

Example for the first machine (datapp1node01):

 

---
network_conf:
  SAS_HOSTNAME: datapp1node01-priv.sci.example.com #[internal hostname]
  SAS_BIND_ADDR: 192.168.10.152 #[internal network interface Infiniband]
  SAS_SAN_DNS: " datapp1node01.sci.example.com localhost datapp1node01-priv.sci.example.com " #[certificate DNS field]
  SAS_SAN_IP: "127.0.0.1 192.168.10.152 192.168.10.100 12.35.150.151" #[certificate IP field]
  SAS_EXTERNAL_HOSTNAME: datapp1node01.sci.example.com #[external hostname]
  SAS_EXTERNAL_BIND_ADDR_IF: "bondeth0" #[external network interface]

 

That way we could keep CAS listening on the external network interface and accept connections from CAS clients such as 3rd party programming interfaces.
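
Since every node needs its own file, generating them from one template reduces the risk of copy and paste errors. The Python sketch below is one possible way to do it, under stated assumptions: the numbering scheme in node_values is inferred from the interface table earlier in this article, the extra 192.168.10.100 address from the hand-written example is left out because its role is not described here, and the output directory is whatever you pass in.

```python
from pathlib import Path

DOMAIN = "sci.example.com"

TEMPLATE = """\
---
network_conf:
  SAS_HOSTNAME: {internal_name}
  SAS_BIND_ADDR: {internal_ip}
  SAS_SAN_DNS: "{external_name} localhost {internal_name}"
  SAS_SAN_IP: "127.0.0.1 {internal_ip} {external_ip}"
  SAS_EXTERNAL_HOSTNAME: {external_name}
  SAS_EXTERNAL_BIND_ADDR_IF: "{external_if}"
"""


def node_values(n):
    """Values for DA node n (1..5); numbering inferred from the article's table."""
    short = f"datapp1node{n:02d}"
    return {
        "internal_name": f"{short}-priv.{DOMAIN}",
        "external_name": f"{short}.{DOMAIN}",
        "internal_ip": f"192.168.10.{151 + n}",
        "external_ip": f"12.35.150.{150 + n}",
        "external_if": "bondeth0",
    }


def render(n):
    """Return the YAML text for node n."""
    return TEMPLATE.format(**node_values(n))


def write_all(directory, count=5):
    """Write one <node>.yml file per node into `directory`."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for n in range(1, count + 1):
        (out / f"datapp1node{n:02d}.yml").write_text(render(n))


print(render(2))
```

Calling write_all with the directory that holds your per-host files would then produce all five files in one pass; review the generated values against your own table before deploying.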

 

The Second Case Study

 

In this second case study, the topology is pretty simple: 1 host machine with CAS and 1 host machine with the Viya services layer (microservices and Infrastructure servers) and SPRE.

 

In the lead up to the installation it was noted that each of the customer’s host machines had 2 NICs (internal and external).

 

The diagram below shows what the customer wanted to achieve:

 

rp_case2_1.png

 

Here is the table corresponding to this specific customer situation, showing the network interfaces, IP addresses and hostnames corresponding to the 2 Viya servers.

 

| Network Interface | Type | Associated IP Address | Associated hostname |
| --- | --- | --- | --- |
| eth1 | 10 GbE Internal network | 192.168.254.100 | int2005.gs.example.com |
| eth1 | 10 GbE Internal network | 192.168.254.101 | int2008.gs.example.com |
| eth0 | External Client access | 123.123.1.190 | ext71.gs.example.com |
| eth0 | External Client access | 123.123.1.191 | ext76.gs.example.com |

 

It’s recommended to document as much detail regarding the networking configuration as possible to avoid any misunderstanding.

From this information we prepared the network.conf files as below (for the first machine):

---
network_conf:
  SAS_HOSTNAME: int2005.gs.example.com #[internal hostname]
  SAS_BIND_ADDR_IF: eth1 #[internal network interface 10GB]
  SAS_SAN_DNS: " int2005.gs.example.com ext71.gs.example.com localhost" #[internal hostname, external hostname, localhost]
  SAS_SAN_IP: " 192.168.254.100 123.123.1.190 127.0.0.1" #[internal IP, external IP, 127.0.0.1]
  SAS_EXTERNAL_HOSTNAME: ext71.gs.example.com #[external hostname]
  SAS_EXTERNAL_BIND_ADDR_IF: eth0 #[external network interface 1GB]

 

Once again, the deployment was successful, but there was an issue: the SASLogon microservice was failing with error messages related to the TLS certificates:

 

Certificate for <...> doesn't match any of the subject alternative names [...]

 

After a bit of digging, and running some openssl commands, we discovered that on the machine where the Viya services and infrastructure servers (int2005) had been deployed, the Apache httpd had been installed with the certificates already configured.

 

But unfortunately, the certificate’s SAN (Subject Alternative Name) did NOT include the internal name (see the extract below).

 

Hostname :CN= ext71.gs.example.com

SAN:     X509v3 Subject Alternative Name:

         DNS: ext71.gs.example.com, DNS: www.ext71.gs.example.com

 

There was no reference to the internal int2005.gs.example.com name in the certificate.

 

Now the SASLogon error message made sense: the microservices were not able to contact the Apache HTTPD server using the internal hostname because the HTTPD TLS certificate was only valid for the external hostname.
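
The check that failed can be approximated in a few lines of Python. This is a simplified sketch of DNS name matching against a SAN list (exact match, or a wildcard covering a single left-most label); it is not the code SASLogon or its TLS library runs, and san_matches is a hypothetical helper.

```python
def san_matches(hostname, san_dns_names):
    """True if `hostname` is covered by one of the SAN DNS entries (simplified rules)."""
    host = hostname.lower().rstrip(".")
    for pattern in san_dns_names:
        pattern = pattern.lower().rstrip(".")
        if pattern == host:
            return True
        if pattern.startswith("*."):
            # A wildcard stands for exactly one left-most label.
            head, _, tail = host.partition(".")
            if head and tail == pattern[2:]:
                return True
    return False


# SAN entries from the certificate extract above.
CERT_SAN = ["ext71.gs.example.com", "www.ext71.gs.example.com"]

print(san_matches("ext71.gs.example.com", CERT_SAN))    # external name
print(san_matches("int2005.gs.example.com", CERT_SAN))  # internal name
```

The external name is covered and the internal one is not, which is the situation reported by the "doesn't match any of the subject alternative names" error.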

 

So, at this point, there were two possible solutions:

  • Either have the admin add the internal hostnames to the certificate (because internal services connect to the same HTTP server to communicate with each other)
  • Or use a front-end proxy where the custom cert can be used, and leave the default certs in our own Viya HTTPD server (see this post from @EdoardoRiva for instructions).

Finally, after some discussions, the customer’s system administrator was able to add the internal hostname in the TLS certificate SAN.

 

After running the deployment a second time with the updated TLS certificates in place, everything was working as expected in the Viya environment.

 

Limits and Potential

 

While the new network configuration options provide a lot of flexibility, it’s important to note the caveats and provisos:

  • Currently there are no automated checks for validating the network.conf files
    • If you want to configure your multi-NIC deployment for 10 machines, then you must prepare 10 different YAML files and place them in the host_vars sub-folder of the playbook directory, adjusting the values for each of them. If you are not careful it is very easy to make a mistake. In our case study, during one of the attempts there was a typo: SAS_EXTERNAL_BIND_ADD_IF, instead of SAS_EXTERNAL_BIND_ADDR_IF (!)

      The installation ran fine, no errors, no warnings, nothing. But the Viya environment was not functioning as expected.

  • Configuration values not kept across multiple attempts
    • You might have noticed that when you run multiple deployments from the same playbook folder, the log file (deployment.log) and common configuration files (like inventory.ini and vars.yml) are kept in the snapshot folder of the playbook directory. However, the network configuration files used for a specific deployment are not kept. So, it is important to keep track of the values you use for a specific deployment in case of multiple attempts.
  • SAS_EXTERNAL_* variables only used for binding services
    • SAS_EXTERNAL_* variables allow us to configure the public Viya access points to bind on the specified network interfaces, but they don’t replace the CAS and HTTP services configuration variables used to deal with an external proxy or load balancer. Since an additional 3rd party reverse proxy placed in front of Viya’s Apache HTTP proxy service is not managed by these Viya network.conf variables, you still need to use configuration parameters such as SERVICE_BASE_URL, CAS_VIRTUAL_HOST or sas.httpproxy.external.hostname/port (as explained in this post by @EdoardoRiva). Refer to the official documentation for details.
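
Regarding the first caveat, a small home-made check can catch misspelled variable names before a deployment. The Python sketch below is an assumption-laden example, not a SAS tool: KNOWN_KEYS only contains the variable names that appear in this article (the official documentation may define others), and the parser handles only the simple "key: value" layout used in the examples above rather than full YAML.

```python
import difflib

# Variable names mentioned in this article; extend from the official documentation.
KNOWN_KEYS = {
    "SAS_HOSTNAME", "SAS_BIND_ADDR", "SAS_BIND_ADDR_IF",
    "SAS_SAN_DNS", "SAS_SAN_IP",
    "SAS_EXTERNAL_HOSTNAME", "SAS_EXTERNAL_BIND_ADDR", "SAS_EXTERNAL_BIND_ADDR_IF",
}


def check_network_conf(yaml_text):
    """Return [(unknown_key, closest_known_key_or_None)] for the network_conf block."""
    problems = []
    in_block = False
    for raw in yaml_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip() or line.strip() == "---":
            continue
        if not line[0].isspace():
            # A top-level key opens or closes the block of interest.
            in_block = line.strip() == "network_conf:"
            continue
        if in_block and ":" in line:
            key = line.split(":", 1)[0].strip()
            if key not in KNOWN_KEYS:
                close = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
                problems.append((key, close[0] if close else None))
    return problems


SAMPLE = """\
---
network_conf:
  SAS_HOSTNAME: int2005.gs.example.com #[internal hostname]
  SAS_BIND_ADDR_IF: eth1
  SAS_EXTERNAL_BIND_ADD_IF: eth0 #[typo from the case study]
"""

for key, hint in check_network_conf(SAMPLE):
    print(f"unknown variable {key}; did you mean {hint}?")
```

Running such a check over every per-host file takes seconds and would have flagged the SAS_EXTERNAL_BIND_ADD_IF typo described above.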

 

Even with these caveats, this entirely new paradigm for network configuration, with its support for network interfaces and Classless Inter-Domain Routing (CIDR) address ranges, is likely to help in some scenarios to deal with planned or unplanned changes in IP addresses. Note that, at this time (March 2020), the Viya deployment guides still refer to the requirement for static IP addresses.

 

Conclusion

As we’ve seen, many factors (multiple IP addresses and hostname resolution, HTTPD TLS certificates, etc…) might impact the way a multi-NIC Viya deployment should be prepared. Make sure you have a good understanding of them.

 

Hopefully the 2 case studies give an idea of the challenges you may face and how the new networking configuration options can be used to navigate them successfully.

 

A “Big Thanks” to my colleagues at SAS whose expertise and discussions helped shape the content of this post.

 

 

 

Version history
Last update:
‎03-30-2020 11:11 AM