In this post, I’d like to discuss the new network configuration options introduced with Viya 3.5, which I briefly mentioned in a previous SAS Communities article.
This new capability is great because it gives more flexibility to handle complex network configurations at customer sites. However, it requires careful planning, a good understanding of the concepts, and a clear view of the technical requirements and goals.
Rather than paraphrasing the official documentation, this post tells the story of two case studies: the challenges they presented and how they were tackled.
Even though I’ll do my best to explain it as clearly as possible, it is a complex topic. So, if you plan to read it end to end, grab yourself a good cup of coffee or tea and make sure you won’t be disturbed for a little while.
What is a multi-NIC machine? Why? How do we check that?
Multi-NIC stands for “Multiple Network Interface Cards”.
Increasingly, servers are coming with multiple network interfaces. The network interfaces could be there to support virtualized environments or in order to have different isolated networks for security, administration and/or performance reasons. Sometimes the term “multi-homed” environment is also used to refer to this configuration.
The network interfaces could be virtual or physical, but each generally comes with its own IP address, which is usually associated with a hostname. So, we end up with a single machine having multiple IP addresses and hostnames. How can Viya software be configured to work within this setup?
It is always better to explain a concept using an example, so let’s look at a DA (Data Appliance). In this DA, we have a collection of 5 machines where there are several different networks used for different purposes.
Customers with big data use data appliances to hold the data and run analytics on it. These appliances typically come with a version of Linux, which makes them a good fit for Viya.
In the DA we discuss, three different types of network are involved in the machines' network configuration: InfiniBand, 10 GbE and 1 GbE.
Logging on to the DA we can run a command like ifconfig or ip and check the content of the /etc/hosts file to see that.
bondeth0 Link encap:Ethernet HWaddr 12:6A:0A:25:98:97
inet addr:12.35.250.251 Bcast:12.35.255.255 Mask:255.255.0.0
bondib0 Link encap:Infiniband HWaddr 80:00:05:0A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.10.152 Bcast:192.168.11.255 Mask:255.255.0.0
eth0 Link encap:Ethernet HWaddr 00:10:E0:B4:89:82
inet addr:173.16.7.136 Bcast:173.16.255.255 Mask:255.255.0.0
eth8 Link encap:Ethernet HWaddr 12:69:0A:25:98:97
eth9 Link encap:Ethernet HWaddr 12:6A:0A:25:98:97
ib0 Link encap:Infiniband HWaddr 80:00:05:0A:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
ib1 Link encap:Infiniband HWaddr 80:00:05:0B:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
datapp1node01.sci.example.com>
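When the same inventory has to be collected on several nodes, it can help to parse the listing rather than read it by eye. Here is a minimal Python sketch, assuming a Linux host where the `ip -o -4 addr show` command is available. The function names and the `SAMPLE` text are illustrative only: the sample reuses addresses quoted in this article, with made-up interface index numbers, and is not a real capture.

```python
import subprocess


def parse_ip_oneline(output):
    """Map interface name -> list of IPv4 'address/prefix' strings.

    Expects the one-line-per-address format printed by `ip -o -4 addr show`,
    i.e. "<index>: <ifname> inet <addr>/<prefix> ...".
    """
    interfaces = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "inet":
            interfaces.setdefault(fields[1], []).append(fields[3])
    return interfaces


def local_ipv4_addresses():
    """Run `ip -o -4 addr show` on the current host and parse its output."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return parse_ip_oneline(result.stdout)


# Illustrative sample, not a real capture (index numbers are invented).
SAMPLE = """\
1: lo    inet 127.0.0.1/8 scope host lo
2: eth0    inet 173.16.6.136/16 brd 173.16.255.255 scope global eth0
8: bondeth0    inet 12.35.150.151/16 brd 12.35.255.255 scope global bondeth0
9: bondib0    inet 192.168.10.152/22 brd 192.168.11.255 scope global bondib0
"""

for name, addresses in parse_ip_oneline(SAMPLE).items():
    print(name, addresses)
```

On a real node, `local_ipv4_addresses()` would replace the sample text. Bonded interfaces show up under the bond name, which is usually the name you want to note down.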
Here’s what we see when we look at the /etc/hosts file.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
173.16.6.136 datapp1node01-adm.sci.example.com datapp1node01-adm
173.16.6.142 datapp1node01-ilom.sci.example.com datapp1node01-ilom
173.16.6.137 datapp1node02-adm.sci.example.com datapp1node02-adm
173.16.6.143 datapp1node02-ilom.sci.example.com datapp1node02-ilom
173.16.6.138 datapp1node03-adm.sci.example.com datapp1node03-adm
173.16.6.144 datapp1node03-ilom.sci.example.com datapp1node03-ilom
173.16.6.139 datapp1node04-adm.sci.example.com datapp1node04-adm
173.16.6.145 datapp1node04-ilom.sci.example.com datapp1node04-ilom
173.16.6.140 datapp1node05-adm.sci.example.com datapp1node05-adm
173.16.6.146 datapp1node05-ilom.sci.example.com datapp1node05-ilom
173.16.6.141 datapp1node06-adm.sci.example.com datapp1node06-adm
173.16.6.147 datapp1node06-ilom.sci.example.com datapp1node06-ilom
...
...
### DataApp Public Hostnames
192.168.10.152 datapp1node01.sci.example.com datapp1node01-priv.sci.example.com datapp1node01-master.sci.example.com datapp1node01 datapp1node01-priv datapp1node01-master
192.168.10.153 datapp1node02.sci.example.com datapp1node02-priv.sci.example.com datapp1node02 datapp1node01-priv
192.168.10.154 datapp1node03.sci.example.com datapp1node03-priv.sci.example.com datapp1node03 datapp1node03-priv
192.168.10.155 datapp1node04.sci.example.com datapp1node04-priv.sci.example.com datapp1node04 datapp1node04-priv
192.168.10.156 datapp1node05.sci.example.com datapp1node05-priv.sci.example.com datapp1node05 datapp1node05-priv
192.168.10.157 datapp1node06.sci.example.com datapp1node06-priv.sci.example.com datapp1node06 datapp1node06-priv
...
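A hosts file like this one can also be turned into a lookup table, which makes it easier to spot a hostname that is attached to more than one address. The following Python sketch is only an illustration: `parse_hosts` and `ips_by_name` are hypothetical helper names, and the sample text is a two-entry extract of the listing above.

```python
def parse_hosts(text):
    """Return {ip: [names, ...]} from /etc/hosts-style text."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        table.setdefault(ip, []).extend(names)
    return table


def ips_by_name(table):
    """Invert the table: hostname -> set of IP addresses it is listed under."""
    inverted = {}
    for ip, names in table.items():
        for name in names:
            inverted.setdefault(name, set()).add(ip)
    return inverted


# Extract of the listing above (first node only).
SAMPLE_HOSTS = """\
127.0.0.1 localhost.localdomain localhost
173.16.6.136 datapp1node01-adm.sci.example.com datapp1node01-adm
### DataApp Public Hostnames
192.168.10.152 datapp1node01.sci.example.com datapp1node01-priv.sci.example.com datapp1node01 datapp1node01-priv
"""

hosts = parse_hosts(SAMPLE_HOSTS)
for name, ips in sorted(ips_by_name(hosts).items()):
    print(name, sorted(ips))
```

Keep in mind that this only reflects the view from inside a node. A DNS server may give a different answer for the same name.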
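We see a lot of network interfaces because the DA uses network bonding (combining network interfaces on one host for redundancy and/or increased throughput), but from the outputs and the DA documentation we can determine that we have three networks: a 1 GbE network for admin access (eth0), a 10 GbE network for client access (bondeth0), and an InfiniBand network for internal communications (bondib0).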
As a reminder, 1 and 10 Gigabit Ethernet can, in theory, provide speeds of 1 and 10 gigabits per second respectively, whereas 4-lane QDR InfiniBand can provide up to 40 Gb/s throughput.
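To put these nominal figures in perspective, here is a back-of-the-envelope calculation in Python. It assumes the full nominal line rate is available and ignores protocol and encoding overhead, so real transfers will be slower. The 100 GB payload is an arbitrary example size, not a figure from this environment.

```python
def transfer_seconds(size_bytes, link_gbps):
    """Ideal transfer time: payload size in bits divided by the line rate."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


PAYLOAD_BYTES = 100e9  # 100 GB, arbitrary example size

for label, gbps in [("1 GbE", 1), ("10 GbE", 10), ("4-lane QDR InfiniBand", 40)]:
    print(f"{label}: {transfer_seconds(PAYLOAD_BYTES, gbps):.0f} s")
```

With these assumptions, the same payload takes 800 seconds on the 1 GbE network, 80 seconds on 10 GbE and 20 seconds on InfiniBand, which is why it matters which network the Viya internal traffic ends up on.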
Each network has different IPs and hostnames.
Looking at the previous outputs and running commands like hostname -I and hostname -A will help you determine which IP address and hostname relate to each network interface.
Here is what we have in our DA environment:
datapp1node01.sci.example.com> hostname -I
173.16.6.136 192.168.10.152 192.168.10.100 12.37.250.251
datapp1node01.sci.example.com> hostname -A
datapp1node01-adm.sci.example.com datapp1node01-adm datapp1node01.sci.example.com datapp1node01.sci.example.com
datapp1node01.sci.example.com>
We notice that the hostname -A command returns only two different hostnames, even though we have three different kinds of IP addresses corresponding to different network interfaces and subnets (173.16.6.136/16, 192.168.10.152/22, 12.35.150.151/16).
This is a very important factor! It is because the same datapp1node01.sci.example.com hostname is resolved differently depending on whether we are inside or outside of the DA collection. On each of the DA nodes, the /etc/hosts file is used to resolve the hostname to the private IP address.
But if we are outside of the DA machines, we rely on the network DNS.
Microsoft Windows [Version 10.0.16299.1565]
(c) 2017 Microsoft Corporation. All rights reserved.
C:\Users\RaphP>ping datapp1node02.sci.example.com
Pinging datapp1node02.sci.example.com [12.35.150.152] with 32 bytes of data:
Reply from 12.35.150.152: bytes=32 time=118ms TTL=61
Reply from 12.35.150.152: bytes=32 time=118ms TTL=61
Reply from 12.35.150.152: bytes=32 time=118ms TTL=61
Ping statistics for 12.35.150.152:
Packets: Sent = 3, Received = 3, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 117ms, Maximum = 118ms, Average = 117ms
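Once an address comes back from a ping or a lookup, the next question is which of the three networks it belongs to. Python's standard `ipaddress` module can answer that. The sketch below builds the subnets from the three address/prefix pairs quoted earlier in this post; the labels and the `network_for` helper are my own illustrative names.

```python
import ipaddress

# Subnets derived from the address/prefix pairs quoted in this post.
NETWORKS = {
    "eth0 (1 GbE admin)": ipaddress.ip_interface("173.16.6.136/16").network,
    "bondeth0 (10 GbE client)": ipaddress.ip_interface("12.35.150.151/16").network,
    "bondib0 (InfiniBand)": ipaddress.ip_interface("192.168.10.152/22").network,
}


def network_for(address):
    """Return the label of the network containing this address, or None."""
    ip = ipaddress.ip_address(address)
    for label, network in NETWORKS.items():
        if ip in network:
            return label
    return None


# The address returned by the ping from the Windows client:
print(network_for("12.35.150.152"))
# The address the same hostname resolves to on a DA node:
print(network_for("192.168.10.153"))
```

The first lookup lands on the client network and the second on the InfiniBand network, which is exactly the inside/outside difference described above. As a side check, the /22 prefix gives the network 192.168.8.0/22, whose broadcast address 192.168.11.255 matches the Bcast value shown for bondib0.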
With the help of the ping command we have confirmed the hostname/IP address associations. We can now draw a small table to describe our multi-NIC configuration for the 5 DA machines.
| Network Interface | Type | Associated IP Address | Associated hostnames |
| --- | --- | --- | --- |
| eth0 | 1 GbE Admin access | 173.16.6.136 | datapp1node01-adm.sci.example.com |
| bondeth0 | 10 GbE Client access | 12.35.150.151 | From the outside: datapp1node02.sci.example.com datapp1node03.sci.example.com datapp1node04.sci.example.com datapp1node05.sci.example.com |
| bondib0 | Superfast InfiniBand network for internal communications | 192.168.10.152 | From the DA nodes: datapp1node02.sci.example.com datapp1node03.sci.example.com datapp1node04.sci.example.com datapp1node05.sci.example.com |
It is recommended to draw a similar table when you face such a multi-NIC situation at your Viya site. This single reference point can help prevent confusion, and it will be helpful in the Viya pre-configuration phase.
Here is a summary slide for the networking options introduced with the release of Viya 3.5.
The details of each possible parameter and how to implement it are available in the official Deployment Guide.
Another very cool thing: rather than using hard-coded IP addresses (SAS_BIND_ADDR or SAS_EXTERNAL_BIND_ADDR) to define the internal and external service bindings, you can use “…_IF” variables (such as SAS_BIND_ADDR_IF or SAS_EXTERNAL_BIND_ADDR_IF) set to a network interface name (like “eth0” or “eth1”).
In some (but not necessarily all) use cases, the “…_IF” variables let the services pick up, at host machine boot time, the network address of the host that corresponds to a specific network interface. This may prove beneficial if dynamic IP addressing is present.
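To make the idea of binding by interface name more concrete, here is a small Python sketch of a late lookup: the address is read from the interface when the program starts, instead of being written into a file. This is NOT how the Viya services implement it. It is only an illustration, it works on Linux only (it relies on the `SIOCGIFADDR` ioctl), and the order in which the fixed address and the interface name are considered is an arbitrary choice of mine. The Deployment Guide is the reference for the real behaviour.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl request: get the IPv4 address of an interface


def interface_ipv4(ifname):
    """Return the current IPv4 address of a Linux network interface."""
    request = struct.pack("256s", ifname.encode()[:15])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
    # Bytes 20 to 23 of the reply hold the IPv4 address.
    return socket.inet_ntoa(reply[20:24])


def resolve_bind_address(conf, name, lookup=interface_ipv4):
    """Pick a bind address from a fixed value or from an interface name.

    `conf` is a plain dict. `name` is e.g. "SAS_BIND_ADDR"; the matching
    interface variable is assumed to be `name + "_IF"`.
    Illustrative precedence only: a fixed address wins over an interface name.
    """
    if conf.get(name):
        return conf[name]
    ifname = conf.get(name + "_IF")
    return lookup(ifname) if ifname else None
```

For example, `resolve_bind_address({"SAS_BIND_ADDR_IF": "eth1"}, "SAS_BIND_ADDR")` would return whatever address eth1 holds at that moment, so a changed IP address is picked up at the next start without editing any file.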
If you are preparing for a Viya configuration, I strongly encourage you to read the official documentation and a related SAS Communities article by my colleague @RobCollum.
With the DA case study, which has three network interfaces, still fresh in our minds, let’s see how we would prepare the deployment of SAS Viya 3.5 in such an environment. FYI, the DA infrastructure discussed here is used by SAS teams for internal technical exploration.
Based on the network interfaces table above, here is the configuration we originally used for the first DA node (datapp1node01.yml):
---
network_conf:
  SAS_HOSTNAME: datapp1node01.sci.example.com #[internal hostname]
  SAS_BIND_ADDR: 192.168.10.152 #[internal network interface Infiniband]
  SAS_SAN_DNS: "datapp1node01.sci.example.com localhost" #[certificate DNS field]
  SAS_SAN_IP: "127.0.0.1 192.168.10.152 192.168.10.100 12.35.150.151" #[certificate IP field]
  SAS_EXTERNAL_HOSTNAME: datapp1node01.sci.example.com #[external hostname]
  SAS_EXTERNAL_BIND_ADDR_IF: "bondeth0" #[external network interface]
We can see that we want to use the private InfiniBand network IP address for the Viya internal communications (192.168.10.152), while the client network (bondeth0) should be used for the external access.
We prepared similar content (adjusting the IP address and hostname) for all the nodes in the datapp1node0<n>.yml files.
The deployment of Viya 3.5 using these settings was successful!
Everything was working well… except that CAS was not running…which is pretty critical!
😊
Checking the CAS controller log (caslaunch_default_controller_daemon.log), we noticed this message:
ERROR: The TCP/IP tcpSockConnect support routine failed with error 111 (The connection was refused.).
ERROR: Failed to connect to host datapp1node01.sci.example.com ', port 5570.
2019-12-17T19:34:47 datapp1node01.sci.example.com:0: ...sas/viya/home/SASFoundation/misc/casluaclnt/lua/swat.lua:193: Could not connect to
'..xfhqAXbXPJwTtIKs.' on port 5570.
After a CAS restart the same error message appeared again…
Can you see what the issue was?
Well…when we think about it, the error message makes perfect sense.
As we can see below, while the CAS nodes try to access the CAS service using hostnames (defined in /opt/sas/viya/config/etc/cas/default/cas.hosts) that resolve to a 192.168.x.x IP address, the CAS service (as instructed via the SAS_EXTERNAL_BIND_ADDR_IF parameter) is binding port 5570 on the external network bondeth0 (IP Address: 10.37.150.151).
[root@ datapp1node01 default]# ping datapp1node01.sci.example.com
PING datapp1node01.sci.example.com (192.168.10.152) 56(84) bytes of data.
64 bytes from datapp1node01.sci.example.com (192.168.10.152): icmp_seq=1 ttl=64 time=0.024 ms
64 bytes from datapp1node01.sci.example.com (192.168.10.152): icmp_seq=2 ttl=64 time=0.025 ms
^C
--- datapp1node01.sci.example.com ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3311ms
rtt min/avg/max/mdev = 0.022/0.023/0.025/0.003 ms
[root@ datapp1node01 default]# netstat -anp | grep 5570
tcp 0 0 10.37.150.151:5570 0.0.0.0:* LISTEN 12918/cas
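The "connection refused" symptom (error 111 on Linux) is easy to reproduce outside of Viya. The Python sketch below is a generic TCP check, not a SAS tool: `can_connect` and `demo` are illustrative names, and the loopback addresses are stand-ins for the internal and external addresses of the node. A listener bound to one specific address does not accept connections made to another address of the same machine, even when the port is right.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


def demo():
    # Listener bound to ONE address only, like CAS bound to the external NIC.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    try:
        print("same address :", can_connect("127.0.0.1", port))
        # On Linux, 127.0.0.2 is also a loopback address, but nothing
        # listens on it here, so this attempt is expected to fail.
        print("other address:", can_connect("127.0.0.2", port, timeout=1.0))
    finally:
        server.close()


if __name__ == "__main__":
    demo()
```

On a DA node, the equivalent check would be `can_connect("datapp1node01.sci.example.com", 5570)`, which tests the address that the hostname resolves to locally rather than the address CAS is actually listening on.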
Remember that in this specific configuration, the same hostname, datapp1node01.sci.example.com, is used for both the internal hostname (resolved locally through /etc/hosts) and the external hostname (resolved by the DNS). This is the reason why our initial configuration was not working!
There are three approaches outlined below that could be considered useful in this situation. However, the first approach is generally preferred/recommended in many situations.
[cloud-user@intcas01 ~]$ sudo netstat -anp | grep cas
tcp 0 0 192.168.2.1:43806 0.0.0.0:* LISTEN 22893/cas
tcp 0 0 0.0.0.0:5570 0.0.0.0:* LISTEN 22893/cas
tcp 0 0 127.0.0.1:41222 0.0.0.0:* LISTEN 22893/cas
tcp 0 0 192.168.2.1:43806 192.168.2.2:35838 ESTABLISHED 22893/cas
tcp 0 0 192.168.2.1:43806 192.168.2.2:35840 ESTABLISHED 22893/cas
Example for the first machine (datapp1):
---
network_conf:
  SAS_HOSTNAME: datapp1node01-priv.sci.example.com #[internal hostname]
  SAS_BIND_ADDR: 192.168.10.152 #[internal network interface Infiniband]
  SAS_SAN_DNS: "datapp1node01.sci.example.com localhost datapp1node01-priv.sci.example.com" #[certificate DNS field]
  SAS_SAN_IP: "127.0.0.1 192.168.10.152 192.168.10.100 12.35.150.151" #[certificate IP field]
  SAS_EXTERNAL_HOSTNAME: datapp1node01.sci.example.com #[external hostname]
  SAS_EXTERNAL_BIND_ADDR_IF: "bondeth0" #[external network interface]
That way we could keep CAS listening on the external network interface and accept connections from CAS clients such as 3rd party programming interfaces.
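The lesson from this first case study can be turned into a small pre-deployment sanity check. The Python sketch below is my own illustration and not a SAS-provided validation: the three rules it applies are assumptions drawn from this case study, the configuration is represented as a plain dict, and `LOCAL_VIEW` stands in for what /etc/hosts returns on a DA node.

```python
import socket


def check_network_conf(conf, resolve=socket.gethostbyname):
    """Return a list of warnings for a network_conf-style dict (illustrative rules)."""
    issues = []
    internal = conf.get("SAS_HOSTNAME")
    external = conf.get("SAS_EXTERNAL_HOSTNAME")
    external_binding = (conf.get("SAS_EXTERNAL_BIND_ADDR")
                        or conf.get("SAS_EXTERNAL_BIND_ADDR_IF"))

    # Rule 1: one name for both sides cannot resolve to two different networks.
    if internal and internal == external and external_binding:
        issues.append("internal and external hostnames are identical")

    # Rule 2: the internal hostname should resolve to the internal bind address.
    bind_addr = conf.get("SAS_BIND_ADDR")
    if internal and bind_addr and resolve(internal) != bind_addr:
        issues.append("internal hostname does not resolve to SAS_BIND_ADDR")

    # Rule 3: both hostnames should appear in the certificate DNS field.
    san_names = conf.get("SAS_SAN_DNS", "").split()
    for name in (internal, external):
        if name and name not in san_names:
            issues.append(f"{name} is missing from SAS_SAN_DNS")
    return issues


# What the hostnames resolve to on a DA node (from its /etc/hosts file).
LOCAL_VIEW = {
    "datapp1node01.sci.example.com": "192.168.10.152",
    "datapp1node01-priv.sci.example.com": "192.168.10.152",
}

ORIGINAL = {
    "SAS_HOSTNAME": "datapp1node01.sci.example.com",
    "SAS_BIND_ADDR": "192.168.10.152",
    "SAS_SAN_DNS": "datapp1node01.sci.example.com localhost",
    "SAS_EXTERNAL_HOSTNAME": "datapp1node01.sci.example.com",
    "SAS_EXTERNAL_BIND_ADDR_IF": "bondeth0",
}
FIXED = dict(
    ORIGINAL,
    SAS_HOSTNAME="datapp1node01-priv.sci.example.com",
    SAS_SAN_DNS="datapp1node01.sci.example.com localhost "
                "datapp1node01-priv.sci.example.com",
)

print(check_network_conf(ORIGINAL, resolve=LOCAL_VIEW.__getitem__))
print(check_network_conf(FIXED, resolve=LOCAL_VIEW.__getitem__))
```

The original settings trigger the first rule only, while the settings with the "-priv" internal hostname pass all three. On a real node, leaving `resolve` at its default uses the node's own name resolution.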
In this second case study, the topology is pretty simple: 1 host machine with CAS and 1 host machine with the Viya services layer (microservices and Infrastructure servers) and SPRE.
In the lead up to the installation it was noted that each of the customer’s host machines had 2 NICs (internal and external).
The diagram below shows what the customer wanted to achieve:
Here is the table corresponding to this specific customer situation, showing the network interfaces, IP addresses and hostnames corresponding to the 2 Viya servers.
| Network Interface | Type | Associated IP Address | Associated hostnames |
| --- | --- | --- | --- |
| eth1 | 10 GbE Internal network | 192.168.254.100 | int2005.gs.example.com int2008.gs.example.com |
| eth0 | External Client access | 123.123.1.190 123.123.1.191 | ext71.gs.example.com ext76.gs.example.com |
It’s recommended to document as much detail regarding the networking configuration as possible to avoid any misunderstanding.
From this information we prepared the network.conf files as below (for the first machine):
---
network_conf:
  SAS_HOSTNAME: int2005.gs.example.com #[internal hostname]
  SAS_BIND_ADDR_IF: eth1 #[internal network interface 10GB]
  SAS_SAN_DNS: "int2005.gs.example.com ext71.gs.example.com localhost" #[internal hostname, external hostname, localhost]
  SAS_SAN_IP: "192.168.254.100 123.123.1.190 127.0.0.1" #[internal IP, external IP, 127.0.0.1]
  SAS_EXTERNAL_HOSTNAME: ext71.gs.example.com #[external hostname]
  SAS_EXTERNAL_BIND_ADDR_IF: eth0 #[external network interface 1GB]
Once again, the deployment was successful, but there was an issue: the SASLogon microservice was failing with error messages related to the TLS certificates:
Certificate for <...> doesn't match any of the subject alternative names [...]
After a bit of digging, and running some openssl commands, we discovered that on the machine where the Viya services and infrastructure servers (int2005) had been deployed, the Apache httpd had been installed with the certificates already configured.
But unfortunately, the certificate’s SAN (Subject Alternative Name) did NOT include the internal name (see the extract below).
Hostname :CN= ext71.gs.example.com
SAN: X509v3 Subject Alternative Name:
DNS: ext71.gs.example.com, DNS: www.ext71.gs.example.com
There was no reference to the internal int2005.gs.example.com name in the certificate.
Now the SASLogon error message made sense: the microservices were not able to contact the Apache HTTPD server using the internal hostname because the HTTPD TLS certificate was only valid for the external hostname.
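This kind of mismatch can be checked before the deployment. Here is a minimal Python sketch, assuming the certificate details are available as a dictionary with the shape that Python's `ssl.SSLSocket.getpeercert()` returns for a validated certificate. The sample dictionary simply mirrors the extract above, the helper names are hypothetical, and wildcard entries are deliberately not handled.

```python
def san_dns_names(cert):
    """Return the DNS entries of the certificate's Subject Alternative Name."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def covers(cert, hostname):
    """True if hostname is listed as a DNS SAN (exact match, case-insensitive)."""
    wanted = hostname.lower()
    return any(name.lower() == wanted for name in san_dns_names(cert))


# Mirrors the certificate extract shown above.
HTTPD_CERT = {
    "subject": ((("commonName", "ext71.gs.example.com"),),),
    "subjectAltName": (
        ("DNS", "ext71.gs.example.com"),
        ("DNS", "www.ext71.gs.example.com"),
    ),
}

for host in ("ext71.gs.example.com", "int2005.gs.example.com"):
    print(host, "covered" if covers(HTTPD_CERT, host) else "NOT covered")
```

Running the check for every hostname the services will use, internal as well as external, would have flagged int2005.gs.example.com before the first deployment attempt.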
So, at this point, there were two possible solutions:
Finally, after some discussions, the customer’s system administrator was able to add the internal hostname to the TLS certificate SAN.
After running the deployment a second time with the updated TLS certificates in place, everything was working as expected in the Viya environment.
While the new network configuration options provide a lot of flexibility, it’s important to note the caveats and provisos:
The installation ran fine, no errors, no warnings, nothing. But the Viya environment was not functioning as expected.
Even with these caveats, this entirely new paradigm for network configuration, with its support for network interfaces and Classless Inter-Domain Routing (CIDR) address ranges, is likely to help in some scenarios to deal with planned or unplanned changes in IP addresses. At this time (March 2020), it is important to note that the Viya deployment guides still refer to the requirement for static IP addresses.
As we’ve seen, many factors (multiple IP addresses and hostname resolution, HTTPD TLS certificates, etc…) might impact the way a multi-NIC Viya deployment should be prepared. Make sure you have a good understanding of them.
Hopefully the two case studies give an idea of the challenges you may face and how the new networking configuration options can be used to navigate them successfully.
A “Big Thanks” to my colleagues at SAS who helped with their expertise and interactions to provide the content of this post.