Restore a Viya Backup with SAS Viya Backup and Database Point in Time Restore

1 Like

Traditionally SAS Viya has relied on internal processes to perform backup and restore. A recent change has added support for using third-party functionality to backup the SAS Infrastructure Data Server portion of the Viya deployment. In this post I will introduce this new feature and demonstrate how you can use it, using a SAS Viya deployment on Azure Kubernetes Service (AKS).

It is important to note that this is an addition of functionality, not a replacement. Instead of a single workflow that uses SAS Viya application backup tools to perform a backup and restore, there are now two workflows available:

SAS Viya Application Backup Only
SAS Viya Application Backup and Cloud-Provider for external PostgreSQL

To use the second workflow the PostgreSQL provider must support Point-In-Time-Recovery (PITR) of the external PostgreSQL server to a custom restore point based on date and time. Many cloud-provider databases do provide this functionality. For example, the Azure Doc states:

“Azure Database for PostgreSQL - Flexible Server automatically performs regular backups of your server. You can then do a point-in-time recovery (PITR) within a retention period that you specify.”

Check out this page for more details on Azure Database for PostgreSQL - Flexible Server

The benefit of using PITR is that it can be significantly faster to perform a restore than the traditional SAS Viya Backup which uses pg_dump and pg_restore The traditional approach can be very slow when there is a large volume of data in the PostgreSQL database.

Let's look at how to backup and restore using SAS Viya application backup and Azure Database for PostgreSQL - Flexible Server PITR.

Backup

To rely on the Azure Flexible PostgreSQL server point-in-time restore functionality for the SAS Infrastructure Server backup we will set a parameter that will exclude the Infrastructure Data Server from the SAS Viya Application backup. To exclude the SAS Infrastructure Data Server from the backup add the parameter INCLUDE_POSTGRES='false' to the sas-backup-job-parameters configmap. Add the following code to the configMapGenerator section of kustomization.yaml and perform a build and apply.

configMapGenerator:  
[... previous resource items ...]
	- name: sas-backup-job-parameters
	behavior: merge
	literals:
	   - INCLUDE_POSTGRES="false"
[...]

With the backup configMap updated, we can now run an ad-hoc backup. In this step, we start the backup job from the scheduled backup cronjob and give it the name sas-scheduled-backup-job-adhoc-001.

kubectl create job --from=cronjob/sas-scheduled-backup-job sas-scheduled-backup-job-adhoc-001

Follow the log of the backup job until it is completed.

kubectl logs -f job/sas-scheduled-backup-job-adhoc-001 -c sas-backup-job | gel_log

Check the job log to make sure the backup ran successfully.

Select any image to see a larger version.

Mobile users: To view the images, select the "Full" version at the bottom of the page.

NOTE: in the previous command we pipe to a custom function(gel_log) that uses JQ to reformat the log from JSON to a more human-readable format.

If we review the backup package in sas-common-backup-data PVC there will be no /sas-common-backup-data/${backupid}/default/postgres directory because the SAS Viya Infrastructure data server PostgreSQL database was not included in the backup. The diagram below shows the overall process and result.

Restore

Prepare for Restore

The first step in the restore is to get the backup ID of the backup to restore. Each backup is identified using a timestamp value called the backup ID. When performing a restore that includes PITR we use the timestamp value to identify :

The SAS Viya backup package AND
Restore "point-in-time" within the PosgtGresSQL backup.

In this step, retrieve the backup ID from the job that created the backup.

backupid=$(yq4 eval '(.metadata.labels."sas.com/sas-backup-id")' <(kubectl get job sas-scheduled-backup-job-adhoc-001 -o yaml))
echo Backup Id is: $backupid

Output:

Backup Id is: 20240104-150710

Recently the format of the SAS Viya backup ID changed. The new format of the backup ID is YYYYMMDD-HHMMSSF (the F can be ignored as it signifies that this is a full backup) In YYYYMMDD, YYYY is an integer that represents the year, MM is an integer that represents the month of the year, and DD is an integer that represents the day of the month. In HHMMSS, HH is an integer representing the hour, MM is an integer representing the minute, and SS is an integer representing the second. All time is UTC.

NOTE: the new backup ID format is backward compatible with the old format.

The restore point format for Azure Postgres PITR is YYYY-MM-DDTHH:MM: SSZ eg. "2021-05-05T13:10:00Z". You can use this command to convert the backup ID to the format of the restore point.

restorePoint=${backupid:0:4}-${backupid:4:2}-${backupid:6:2}T${backupid:9:2}:${backupid:11:2}:${backupid:13:2}Z:${backupid:12:2}Z
echo Restore Point is: $restorePoint

Output:

Restore Point is: 2024-01-04T15:07:10Z:71Z

An important feature of PITR is that the process does not restore to the same PostgreSQL server instance. To restore you need to provide a new instance name, which should be unique. It's a good practice to have a naming convention for your instances as you may end up with multiple instances in a cluster. For our purposes, we will just prepend the backup ID to the existing Flexible PostgreSQL server name eg. ${backupid}-blitz-0331-rg-default-flexpsql. In this step, we create an environment variable with the new name.

RESTORE_DB_NAME=d${backupid,,}-${GELENABLE_PREFIX}-default-flexpsql
echo Restored DB instance: ${RESTORE_DB_NAME}

Output:

Restored DB instance:  d20240104-150710f-boxer-p02095-default-flexpsql

Perform Point-in-Time Restore

The restore can be performed in the Azure Portal or using the Azure CLI. In the Azure Portal select the Resource Group of your Viya Deployment. Under the listed resources, select the Azure PostGresSQL Flexible server. Within the flexible server under Settings on the right, select Backup and Restore. Follow these steps to restore:

Step 1. Select the Fast Restore option for the Automated Backup with the Completion timestamp before, but closest to the restore point.

Step 2. Select the radio button Select a custom restore point and provide.

The date and time of the restore point that matches the backup ID.
Give the new database instance a unique name.

Select Review and Create and then Create.

The deployment of the new instance will start with the message "Deployment in Progress". When it is completed the UI will output the message "Your deployment is complete". This will complete the restoration of the Viya Infrastructure data server. It is restored to the new instance name and in Azure you now have two instances, the Viya environment still points to the original instance. In the following steps of the restore, we will switch Viya to point to the restored instance.

Restore the SAS Viya Backup

When using PITR for the PostGresSQL database you must select a backup package that does not include the SAS Viya Infrastructure Data Server data.

The restore process happens in two steps. These steps are summarized here and covered in more detail in my previous post.

Step 1 The Restore Job

Restores the SAS Configuration Server
Stop the CAS Server(s)

Step 2 Clear the CAS PV's and restart CAS in RESTORE mode

Delete existing data from the CAS PV's
Restores CAS data and content

Run the Restore Job

Firstly, identify the sas-restore-job-parameters configMap that needs to be modified. This step returns the config map for the restore job.

restore_config_map=$(kubectl describe cronjob sas-restore-job | grep -i sas-restore-job-parameters | awk '{print $1}'|head -n 1)
echo The current restore Config Map is: $restore_config_map

Output:

The current restore Config Map is: sas-restore-job-parameters-hgd4ftbmmm

Edit the configMap to set the restore parameters.

SAS_BACKUP_ID to the backup ID of the package to restore.
SAS_DEPLOYMENT_START_MODE to RESTORE.
AUTO_SWITCH_POSTGRES_HOST to True.
DATASERVER_HOST_MAP to map the name of the restored PostGresSQL Server.

We have stored these values in environment variables in prior steps. This code updates the restore configMap with the relevent values and checks the values have been updated.

RESTORE_DB_HOST=${RESTORE_DB_NAME}.postgres.database.azure.com
kubectl patch cm $restore_config_map --type json -p '[ {"op": "replace", "path": "/data/SAS_BACKUP_ID", "value":"'${backupid}'"},{"op": "replace", "path": "/data/SAS_LOG_LEVEL", "value":"DEBUG"},{"op": "replace", "path": "/data/SAS_DEPLOYMENT_START_MODE", "value":"RESTORE"},{"op": "replace", "path": "/data/AUTO_SWITCH_POSTGRES_HOST", "value":"True" },{"op": "replace", "path": "/data/DATASERVER_HOST_MAP", "value":"sas-platform-postgres:'${RESTORE_DB_HOST}'"}]'
kubectl describe cm $restore_config_map

Start the Restore Job from the Restore cronjob. This process will restore the SAS Configuration Server. In addition, it will stop the CAS server to prepare for restore of the CAS server.

kubectl create job --from=cronjob/sas-restore-job sas-restore-job

After giving the job some time to run you can check for specific messages in the log to check the status.

kubectl logs -l "job-name=sas-restore-job" -c sas-restore-job --tail 1000 | klog | grep "restore job completed successfully" -B 3 -A 10

IMAGE

Restore the CAS Server

With the restore job completed the second step will restore the CAS server. This process is covered at a high level here but is covered in more detail in this post.

The restore job has stopped all CAS Servers in the environment. To restore CAS, the CAS Server will be started in RESTORE mode and data and configuration will be restored during server startup. Firstly, let’s check that CAS is not running.

kubectl get pods --selector="app.kubernetes.io/managed-by==sas-cas-operator"

Output:

No Resources in yournamespace namespace.

The process uses two provided CAS scripts.

sas-backup-pv-copy-cleanup.sh deletes the existing data from the CAS PV's.
scale-up-cas.sh starts the CAS server(s) in RESTORE mode.

Make the provided CAS scripts executable.

Restoring the data to the two CAS file PVCs requires a clean volume. Run the sas-backup-pv-copy-cleanup.sh script to clean up the CAS PVCs. This step deletes the existing data on the CAS permstore(cas-default-permstore) and CAS data(cas-default-data) PVCs.

~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/sas-backup-pv-copy-cleanup.sh gelcorp remove "default"

Startup the CAS Server

~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/scale-up-cas.sh gelcorp "default"

When the CAS server starts it checks the Restore Job configMap attribute SAS_DEPLOYMENT_START_MODE. If it is set to RESTORE, the CAS server will start and restore the data from the directory that matches the BACKUP_ID in the sas-cas-backup-data PVC.

Check the log to see if the CAS server performed the restore. The logs should show the start of the restore process that restores the backup content to the target CAS persistent volumes.

kubectl logs sas-cas-server-default-controller -c sas-cas-server  | gel_log | grep -A 10 "RESTORE"

Finalize the Restore

To complete the switch over to the newly restored Postgres instance, restart all the services with the sas-stop-all and sas-start-all cronjobs. This code runs the stop job and waits for all the PODS on the namespace to stop before running the start and job and then waiting for Viya to start.

kubectl create job --from cronjobs/sas-stop-all restart-stop
RUNNING_PODS=$(kubectl get pods  --field-selector=status.phase=Running  -o go-template='{{len .items}}')
until [ $RUNNING_PODS -lt 2 ]; do   echo Running Pods: $RUNNING_PODS;   RUNNING_PODS=$(kubectl get pods  --field-selector=status.phase=Running  -o go-template='{{len .items}}'); done
kubectl create job --from cronjobs/sas-start-all restart-start

To ensure that SAS Viya 4 is referencing the restored PostgreSQL instance, execute the following command for each PostgresSQL instance in the cluster. The command should show the address and services as the new name that you gave to the PostgresSQL database.

HTTP_PROTOCOL=https 
CONSUL_SVC_NAME=sas-postgres
kubectl exec -it sas-consul-server-0 -c sas-consul-server -- bash -c "export CONSUL_HTTP_ADDR=$HTTP_PROTOCOL://localhost:8500; /opt/sas/viya/home/bin/sas-bootstrap-config catalog service $CONSUL_SVC_NAME"

{
    "items": [
        {
            "address": "d20240115-134651f-blitz-p02095-default-flexpsql.postgres.database.azure.com",
            "createIndex": 692506,
            "datacenter": "viya",
            "modifyIndex": 692506,
            "node": "sas-postgres",
            "nodeMeta": null,
            "serviceAddress": "d0240115-134651f-blitz-p02095-default-flexpsql.postgres.database.azure.com",
            "serviceEnableTagOverride": false,
........

Finally, we will reset the SAS restore job configMap parameters.

kubectl patch cm $restore_config_map --type json -p '[{ "op": "remove", "path": "/data/SAS_BACKUP_ID" },{"op": "remove", "path": "/data/SAS_DEPLOYMENT_START_MODE"} ,{"op": "remove", "path": "/data/AUTO_SWITCH_POSTGRES_HOST" },{"op": "remove", "path": "/data/DATASERVER_HOST_MAP"} ]'

The Viya manifests point to the PostgreSQL server used before the restore. An important final step ensures that when any future changes are made Viya continues to point to the restored server. To ensure that Viya continues to point to the restored server update the manifests to point to the new host for the PostgreSQL server. The file to be updated is usually at $deploy/site-config/postgres /{ POSTGRES-SERVER-NAME }-dataserver-transformer.yaml. Here is an example of an updated file.

apiVersion: builtin
kind: PatchTransformer
metadata:
  name: sas-platform-postgres-dataserver-transformer
patch: |-
  - op: replace
    path: /spec/users/0/credentials/input
    value:
      secretRef:
        name: platform-db-user
      usernameKey: username # For internal use, do not modify
      passwordKey: password # For internal use, do not modify

  - op: replace
    path: /spec/registrations/0/host
    value: d20240115-134651f-boxer-p02095-default-flexpsql

  - op: replace
    path: /spec/registrations/0/port
    value: 5432

  - op: replace
    path: /spec/ssl
    value: true
target:
  group: webinfdsvr.sas.com
  kind: DataServer
  version: v1beta1
  name: sas-platform-postgres

Considerations

The Azure Flexible server for Postgres has a default retention period of 7 days. The retention period for the Postgres should be changed so that it is in synch with the SAS Viya backups. SAS Viya backups have a retention period of 30 days by default. Matching retention periods will ensure that if we need to restore a backup the required PostgreSQL and SAS data is available.

The restore process created a new PostGresSQL instance. You may need to keep the old and new instances around for a while to provide the ability to do a PITR during the 30-day retention period.

Wrap-Up

The addition of support for using third-party functionality to backup and restore the SAS Viya Infrastructure Data Server portion of the Viya deployment is a welcome update. This support depends on the database supporting Point-in-time restore. This addition adds flexibility to the SAS Viya Backup and Restore functionality and will be particularly useful when the volume of data stored in the Infrastructure Data Server is large. The SAS Viya Administration Backup and Restore documentation has been updated to cover the new workflow.

Find more articles from SAS Global Enablement and Learning here.