BookmarkSubscribeRSS Feed

Configure an external PostgreSQL for SAS Viya in your Open-Source Kubernetes deployments – part 1

Started ‎10-12-2023 by
Modified ‎10-11-2023 by
Views 899

We already know that the IaC (Infrastructure as Code) tool for Open-Source Kubernetes can be applied to a bunch of on-premise Linux machines and automatically create a Kubernetes cluster using the Upstream Opensource Kubernetes distribution. The Kubernetes cluster will have a control plane plus Kubernetes worker nodes.

 

But the IaC tool can also be used to provision other "on-prem" infrastructure components such as the Jump Server, the NFS Server or an external PostgreSQL Server.

01_RP_IACDIAGRAM.png

 

In the first part of this  series, we’ll focus on why and how to provision and an external PostgreSQL Server for a SAS®  Viya deployment in Opensource Kubernetes using the IaC tool. As recommended by SAS® we will secure the connection between SAS®  Viya and the external database using TLS encryption.

 

Then in the second part of the blog series, we'll discuss how to connect SAS® Viya to the external PostgreSQL server when we deploy with the viya4-deployment GitHub tool (aka DaC), then open up new perspectives and discuss some additional considerations (HA, Resources consumption impact, PGBouncer, etc…).

 

Why using an External PostgreSQL ?

In the SAS®  Viya platform, an instance of PostgreSQL is required for the SAS® Infrastructure Data Server (Platform PostgreSQL). Several SAS®  Viya platform offerings also require a second Postgres instance referred to as "SAS®  Common Data Store" (or CDS). The CDS cluster and the SAS Infrastructure Data Server cluster must be of the same type: internal or.... In other words, the two clusters must match. 

But first let’s clarify what we mean by "Internal" and "External" PostgreSQL (which can be confusing terms).

 

As noted in the official documentation "Internal" is when you let the PostgreSQL instance be automatically deployed by SAS®  (A containerized Postgres implementation is deployed inside the same Cluster’s namespace as SAS®  Viya and managed by the Crunchy Data Operator ). “External” is when you provide your own PostgreSQL instance.

 

The decision between using an external versus internal PostgreSQL instance is one of the key Architecture decision to be taken during the Architecture design phase of the project.

While using internal Postgres, which is currently the OOTB (or default) choice, seems to be the choice of simplicity, several reasons could lead to chose the external PostgreSQL option:

  • Lack of full support for newer versions of Kubernetes (while SAS®  Viya stable 2023.08 is supported in Kubernetes 1.27, SAS®  recommends using an external instance of PostgreSQL if you want to deploy with Kubernetes 1.27)
  • Automated tuning of some postgres parameters by Cloud providers (Azure, AWS)
  • Advanced PostgreSQL monitoring and log analysis with Cloud Providers services or PostgreSQL community tools 
  • Enhanced backup, disaster recovery features (like automated snapshot backups in Azure, Barman, pgBackRest and other tools designed by the PostgreSQL community to provide data protection
  • etc...

One key benefit of running the PostgreSQL instance outside of the Kubernetes cluster  is the removal of the version dependencies between PostgreSQL and Kubernetes, each of these components having their own release schedules.

 

So let’s see how to use an external PostgreSQL instance for a SAS® Viya deployment in open-source Kubernetes.  😊

 

Provision the external PostgreSQL with the IaC tool

 

PostgreSQL installation variables

 

Each IaC tool has an option to automate the provisioning of an external PostgreSQL System.

 

But while the IaC tool for the cloud providers (IaC for Azure, IaC for GCP, IaC for AWS) provision it as a Managed Cloud Service (Azure PostgreSQL Flexible Server, Cloud SQL for PostgreSQL, Amazon RDS for PostgreSQL), the IaC tool for Upstream Opensource Kubernetes simply installs the Ubuntu packages of the Open-Source PostgreSQL Server on a Stand-alone machine or a VM that you have to reference in your inventory file.

 

The variables used by the IaC Tool for Opensource Kubernetes to install and configure an external PostgreSQL Server are documented in the CONFIG-VARS.md file.

 

Here is a screenshot of the variables name and descriptions associated to the external PostgreSQL (for the "bareOS" mode in the viya4-iac-k8s version 3.0.0 GitHub project).

 

01a_RP_IaCVars.png

 

Notes :

 

  • The postgres_ip is not required, you can use the postgres_server_name value instead and use a FQDN.
  • It is highly recommended to secure your communications with the PostgreSQL Server through TLS encryption (postgres_server_ssl="on"). You can set your own SSL certificate and SSL key file with the postgres_server_ssl_cert_file and postgres_server_ssl_key_file variables, but if you don’t specify them, the system default certificate is used by the IaC tool.

 

Example

 

Here is an example of the required section, that you need in your inventory file, if you want the IaC for K8s tool to provision the PostgreSQL server for you :

 

[test_k8s_default_pgsql]
rext03-0038.race.sas.com
[test_k8s_default_pgsql:vars]
postgres_server_name="default"
postgres_server_version="14"
postgres_server_ssl="on"
postgres_administrator_login="postgres"
postgres_administrator_password="admin123"
#postgres_server_ssl_cert_file=""
#postgres_server_ssl_key_file=""
postgres_system_setting_max_prepared_transactions="1024"
postgres_system_setting_max_connections="1024"
[postgres:children]
test_k8s_default_pgsql

 

Note that for some Viya offerings/solutions, you also need an extra PostgreSQL server instance for the SAS® Common Data Store (CDS).

You can also provision it (as an external instance) with the IaC tool. You would simply need to add a new host group, like [cds_k8s_pgsql] with the same associated variables in the [cds_k8s_pgsql:vars] section and finally you would reference it as a second item in the [postgres:children] host group.

 

Challenges and requirements

 

If you deploy PostgreSQL on an Ubuntu machine/VM, there are some important system requirements for a successful deployment of the opensource PostgreSQL.

 

I’ve learned some useful lessons, sometimes the  hard way 😉.

 

When I tested the PostgreSQL server provisioning with the IaC tool on Ubuntu 22.04,  I noted two issues :

 

  • The first one was a failure of the apt install command because of missing packages and dependencies issues. We had to install two additional apt packages (and one of them directly from the .deb file).
  • Then after fixing this first problem, while the installation of the PostgreSQL package by the playbook seemed successful, the next Ansible task of the IaC playbook ("update postgres admin user's password") was always failing. ☹️

 

To understand the second issue, we need to look at the IaC code of the ansible tasks installing PostgreSQL.

 

In one of the very first task of the playbook, the apt install command initially installs PostgreSQL with SSL enabled, because it is the default even if you set the variable postgres_server_ssl to OFF in your inventory file.

 

Then the next task tries to update the postgres administrative user's password, but the assumption is that the PostgreSQL service is already running (with SSL) and is ready for connections at this point...

 

It was not our case. The PostgreSQL service was always failing with the same error message :

 

2023-05-16 16:25:44.885 UTC [26549] FATAL:  could not load server certificate file "/etc/ssl/certs/ssl-cert-snakeoil.pem": SSL error code 214748366

 

The log message indicates that the service fails because the file "/etc/ssl/certs/ssl-cert-snakeoil.pem" could not be loaded.

 

The ssl-cert-snakeoil.pem certificate is the default system certificate file on Ubuntu and it appears to be required in order to have a successful startup of the PostgreSQL service (when installing it as a debian package with the apt install command). But unfortunately, it might not have been installed on your Ubuntu system!

 

The default system certificate can be installed as part of the ssl-certpackage, the default “snake-oil” certificate also needs to be generated and made readable for the account used to start the PostgreSQL service (“postgres” in my case).

 

Finally, we were able to get the PostgreSQL server successfully provisioned, when running the following commands (before running the IaC playbook):

 

# Update the sources and install some required debian packages on the sasnode01
sudo apt update
sudo apt install sysstat -y
wget http://mirrors.kernel.org/ubuntu/pool/main/l/llvm-toolchain-14/libllvm14_14.0.0-1ubuntu1_amd64.deb
sudo apt install ./libllvm14_14.0.0-1ubuntu1_amd64.deb# Install and Generate the default snakeoil certificate
sudo apt install ssl-cert -y
sudo make-ssl-cert generate-default-snakeoil
sudo ls -l /etc/ssl/certs/ssl-cert-snakeoil.pem
sudo chgrp ssl-cert /etc/ssl/certs/ssl-cert-snakeoil.pem
sudo chmod 640 /etc/ssl/certs/ssl-cert-snakeoil.pem# check
sudo ls -l /etc/ssl/certs/ssl-cert-snakeoil.pem

 

Of course, your mileage may vary and you may not hit this particular issue since every Ubuntu machine could have different setup...

But be aware that you may have to deal with this kind of issues and other system configuration. Being able to look into the IaC code to see exactly what is done is usually very useful to troubleshoot and resolve your issue.

 

Check the PostgreSQL server system settings

 

After the deployment, you can see a PostgreSQL process running on the VM specified in the inventory file:

 

02_RP_postgresql-service-process-1024x100.png

 

There is also a PostgreSQL service (run "sudo systemctl status postgresql" to check).

 

If you are curious, you might open the configuration file that appears in the process description above ("/etc/postgresql/14/main/postgresql.conf").

 

If you do so, you’ll probably notice something wrong !

 

03_RP_pg-config-1024x652.png

 

 

Wait a minute…the parameters do not seem to reflect what we've asked for in our inventory file ! remember we had sepcified:

 

postgres_system_setting_max_prepared_transactions="1024"
postgres_system_setting_max_connections="1024"

 

Well…actually there is no problem in the IaC tool but there is a trick 😊:

 

The max_connections and max_prepared_transactions are set by the IaC code using the ALTER SYSTEM SQL command (here is a link to the ansible code).

 

According to the PostgreSQL doc  : "ALTER SYSTEM is used for changing server configuration parameters across the entire database cluster. It can be more convenient than the traditional method of manually editing the postgresql.conf file. ALTER SYSTEM writes the given parameter setting to the postgresql.auto.conf file, which is read in addition to postgresql.conf."

 

Ok so that explains why we still had the initial values in the postgresql.conf file 😊.

But, really the best way to know the truth is to ask directly to the PostgreSQL Server.

Here is how you can do it: from the machine/VM where postgres is installed, run the psql command line tool to connect to your PostgreSQL server:

 

psql -h localhost -U postgres

 

Then you have to provide the password (the one that was specified in the inventory file).

 

You should then see something like that :

 

05_RP_psql-prompt.png

 

Now, you can run a command to look into the PostgreSQL settings and confirm the values for the "max_connections" and "max_prepared_connections" parameters.

 

SELECT Name,Setting from pg_settings WHERE name='max_prepared_transactions' OR name='max_connections';

 

Now, you can see that the number of maximum connections and maximum prepared transactions corresponds to what we wanted.

 

06_RP_max-connections-result.png

 

While we are there, we can have a look at the SSL parameters.

 

SELECT Name,Setting from pg_settings WHERE category='Connections and Authentication / SSL';

 

You should see something like that :

 

07_RP_sslcertfile-1024x499.png

 

We’ll see, in the second part of the blog, that this information is useful when we need to configure SAS®  Viya to use this external PostgreSQL Server instance for the SAS®  Infrastructure  Data Server.

Conclusion

 

I hope the information and this configuration first time experience feedback (shared in this first part of the series) was helpful.

Note that the SAS Global Enablement and Learning team is developing specific SAS® Viya 4 deployment hands-on content within workshop style content. Contact your local SAS Education team for more details if this is of interest to you.

 

In this blog we only looked at the provisioning of the PostgreSQL server.

But stay tuned for the second part of the series, where we go a little bit further: discuss what is needed to connect securely (with TLS) our SAS® Viya to the PostgreSQL server and discuss additional architecture considerations for the external PostgreSQL around HA, tuning and resource utilization.

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎10-11-2023 04:34 PM
Updated by:

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags