We already know that the IaC (Infrastructure as Code) tool for Open-Source Kubernetes can be applied to a bunch of on-premise Linux machines and automatically create a Kubernetes cluster using the Upstream Opensource Kubernetes distribution. The Kubernetes cluster will have a control plane plus Kubernetes worker nodes.
But the IaC tool can also be used to provision other "on-prem" infrastructure components such as the Jump Server, the NFS Server or an external PostgreSQL Server.
In the first part of this series, we’ll focus on why and how to provision and an external PostgreSQL Server for a SAS® Viya deployment in Opensource Kubernetes using the IaC tool. As recommended by SAS® we will secure the connection between SAS® Viya and the external database using TLS encryption.
Then in the second part of the blog series, we'll discuss how to connect SAS® Viya to the external PostgreSQL server when we deploy with the viya4-deployment GitHub tool (aka DaC), then open up new perspectives and discuss some additional considerations (HA, Resources consumption impact, PGBouncer, etc…).
In the SAS® Viya platform, an instance of PostgreSQL is required for the SAS® Infrastructure Data Server (Platform PostgreSQL). Several SAS® Viya platform offerings also require a second Postgres instance referred to as "SAS® Common Data Store" (or CDS). The CDS cluster and the SAS Infrastructure Data Server cluster must be of the same type: internal or.... In other words, the two clusters must match.
But first let’s clarify what we mean by "Internal" and "External" PostgreSQL (which can be confusing terms).
As noted in the official documentation "Internal" is when you let the PostgreSQL instance be automatically deployed by SAS® (A containerized Postgres implementation is deployed inside the same Cluster’s namespace as SAS® Viya and managed by the Crunchy Data Operator ). “External” is when you provide your own PostgreSQL instance.
The decision between using an external versus internal PostgreSQL instance is one of the key Architecture decision to be taken during the Architecture design phase of the project.
While using internal Postgres, which is currently the OOTB (or default) choice, seems to be the choice of simplicity, several reasons could lead to chose the external PostgreSQL option:
One key benefit of running the PostgreSQL instance outside of the Kubernetes cluster is the removal of the version dependencies between PostgreSQL and Kubernetes, each of these components having their own release schedules.
So let’s see how to use an external PostgreSQL instance for a SAS® Viya deployment in open-source Kubernetes. 😊
Each IaC tool has an option to automate the provisioning of an external PostgreSQL System.
But while the IaC tool for the cloud providers (IaC for Azure, IaC for GCP, IaC for AWS) provision it as a Managed Cloud Service (Azure PostgreSQL Flexible Server, Cloud SQL for PostgreSQL, Amazon RDS for PostgreSQL), the IaC tool for Upstream Opensource Kubernetes simply installs the Ubuntu packages of the Open-Source PostgreSQL Server on a Stand-alone machine or a VM that you have to reference in your inventory file.
The variables used by the IaC Tool for Opensource Kubernetes to install and configure an external PostgreSQL Server are documented in the CONFIG-VARS.md file.
Here is a screenshot of the variables name and descriptions associated to the external PostgreSQL (for the "bareOS" mode in the viya4-iac-k8s version 3.0.0 GitHub project).
Notes :
Here is an example of the required section, that you need in your inventory file, if you want the IaC for K8s tool to provision the PostgreSQL server for you :
[test_k8s_default_pgsql]
rext03-0038.race.sas.com
[test_k8s_default_pgsql:vars]
postgres_server_name="default"
postgres_server_version="14"
postgres_server_ssl="on"
postgres_administrator_login="postgres"
postgres_administrator_password="admin123"
#postgres_server_ssl_cert_file=""
#postgres_server_ssl_key_file=""
postgres_system_setting_max_prepared_transactions="1024"
postgres_system_setting_max_connections="1024"
[postgres:children]
test_k8s_default_pgsql
Note that for some Viya offerings/solutions, you also need an extra PostgreSQL server instance for the SAS® Common Data Store (CDS).
You can also provision it (as an external instance) with the IaC tool. You would simply need to add a new host group, like [cds_k8s_pgsql] with the same associated variables in the [cds_k8s_pgsql:vars] section and finally you would reference it as a second item in the [postgres:children] host group.
If you deploy PostgreSQL on an Ubuntu machine/VM, there are some important system requirements for a successful deployment of the opensource PostgreSQL.
I’ve learned some useful lessons, sometimes the hard way 😉.
When I tested the PostgreSQL server provisioning with the IaC tool on Ubuntu 22.04, I noted two issues :
To understand the second issue, we need to look at the IaC code of the ansible tasks installing PostgreSQL.
In one of the very first task of the playbook, the apt install command initially installs PostgreSQL with SSL enabled, because it is the default even if you set the variable postgres_server_ssl to OFF in your inventory file.
Then the next task tries to update the postgres administrative user's password, but the assumption is that the PostgreSQL service is already running (with SSL) and is ready for connections at this point...
It was not our case. The PostgreSQL service was always failing with the same error message :
2023-05-16 16:25:44.885 UTC [26549] FATAL: could not load server certificate file "/etc/ssl/certs/ssl-cert-snakeoil.pem": SSL error code 214748366
The log message indicates that the service fails because the file "/etc/ssl/certs/ssl-cert-snakeoil.pem" could not be loaded.
The ssl-cert-snakeoil.pem certificate is the default system certificate file on Ubuntu and it appears to be required in order to have a successful startup of the PostgreSQL service (when installing it as a debian package with the apt install command). But unfortunately, it might not have been installed on your Ubuntu system!
The default system certificate can be installed as part of the ssl-certpackage, the default “snake-oil” certificate also needs to be generated and made readable for the account used to start the PostgreSQL service (“postgres” in my case).
Finally, we were able to get the PostgreSQL server successfully provisioned, when running the following commands (before running the IaC playbook):
# Update the sources and install some required debian packages on the sasnode01
sudo apt update
sudo apt install sysstat -y
wget http://mirrors.kernel.org/ubuntu/pool/main/l/llvm-toolchain-14/libllvm14_14.0.0-1ubuntu1_amd64.deb
sudo apt install ./libllvm14_14.0.0-1ubuntu1_amd64.deb# Install and Generate the default snakeoil certificate
sudo apt install ssl-cert -y
sudo make-ssl-cert generate-default-snakeoil
sudo ls -l /etc/ssl/certs/ssl-cert-snakeoil.pem
sudo chgrp ssl-cert /etc/ssl/certs/ssl-cert-snakeoil.pem
sudo chmod 640 /etc/ssl/certs/ssl-cert-snakeoil.pem# check
sudo ls -l /etc/ssl/certs/ssl-cert-snakeoil.pem
Of course, your mileage may vary and you may not hit this particular issue since every Ubuntu machine could have different setup...
But be aware that you may have to deal with this kind of issues and other system configuration. Being able to look into the IaC code to see exactly what is done is usually very useful to troubleshoot and resolve your issue.
After the deployment, you can see a PostgreSQL process running on the VM specified in the inventory file:
There is also a PostgreSQL service (run "sudo systemctl status postgresql" to check).
If you are curious, you might open the configuration file that appears in the process description above ("/etc/postgresql/14/main/postgresql.conf").
If you do so, you’ll probably notice something wrong !
Wait a minute…the parameters do not seem to reflect what we've asked for in our inventory file ! remember we had sepcified:
postgres_system_setting_max_prepared_transactions="1024"
postgres_system_setting_max_connections="1024"
Well…actually there is no problem in the IaC tool but there is a trick 😊:
The max_connections and max_prepared_transactions are set by the IaC code using the ALTER SYSTEM SQL command (here is a link to the ansible code).
According to the PostgreSQL doc : "ALTER SYSTEM is used for changing server configuration parameters across the entire database cluster. It can be more convenient than the traditional method of manually editing the postgresql.conf file. ALTER SYSTEM writes the given parameter setting to the postgresql.auto.conf file, which is read in addition to postgresql.conf."
Ok so that explains why we still had the initial values in the postgresql.conf file 😊.
But, really the best way to know the truth is to ask directly to the PostgreSQL Server.
Here is how you can do it: from the machine/VM where postgres is installed, run the psql command line tool to connect to your PostgreSQL server:
psql -h localhost -U postgres
Then you have to provide the password (the one that was specified in the inventory file).
You should then see something like that :
Now, you can run a command to look into the PostgreSQL settings and confirm the values for the "max_connections" and "max_prepared_connections" parameters.
SELECT Name,Setting from pg_settings WHERE name='max_prepared_transactions' OR name='max_connections';
Now, you can see that the number of maximum connections and maximum prepared transactions corresponds to what we wanted.
While we are there, we can have a look at the SSL parameters.
SELECT Name,Setting from pg_settings WHERE category='Connections and Authentication / SSL';
You should see something like that :
We’ll see, in the second part of the blog, that this information is useful when we need to configure SAS® Viya to use this external PostgreSQL Server instance for the SAS® Infrastructure Data Server.
I hope the information and this configuration first time experience feedback (shared in this first part of the series) was helpful.
Note that the SAS Global Enablement and Learning team is developing specific SAS® Viya 4 deployment hands-on content within workshop style content. Contact your local SAS Education team for more details if this is of interest to you.
In this blog we only looked at the provisioning of the PostgreSQL server.
But stay tuned for the second part of the series, where we go a little bit further: discuss what is needed to connect securely (with TLS) our SAS® Viya to the PostgreSQL server and discuss additional architecture considerations for the external PostgreSQL around HA, tuning and resource utilization.
Find more articles from SAS Global Enablement and Learning here.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.