SAS Viya stable 2023.10 introduces a simplified approach to restoring a Viya backup. In prior releases, an administrator was required to build and apply temporary manifests to initiate a restore. In 2023.10 (and forward) this is no longer the case. In this post, I will review the new and improved functionality.
In a previous post, I provided an overview of SAS Viya 4 Backup and Restore. To summarize, a Viya backup includes the SAS Infrastructure Data Server, the SAS Configuration Server, and CAS server data and configuration.
Viya backup and restore are implemented using native Kubernetes functionality: backups and restores run as Kubernetes Jobs and CronJobs. The restore process has two high-level steps: running the restore job, which restores the SAS Infrastructure Data Server and the SAS Configuration Server, and restarting the CAS server in RESTORE mode.
So what has changed? In 2023.10 the restore process performs the same two steps; however, the method of initiating a restore has improved. Before 2023.10, the restore process involved building and applying temporary Kubernetes manifests to perform the two steps: running the restore job and restarting CAS in RESTORE mode. After the restore was complete, the manifests that manage the deployment had to be reset to their original state and the temporary manifests discarded. The process was not particularly easy for the Viya administrator. The new process does not require the creation of temporary manifests. Let's see how it works now.
Each individual backup is identified using a timestamp value called the backup ID. In the command below, we retrieve the backup ID of the backup to restore from the ad-hoc backup job that created the backup.
backupid=$(yq4 eval '(.metadata.labels."sas.com/sas-backup-id")' <(kubectl get job sas-scheduled-backup-job-adhoc-001 -o yaml))
echo ${backupid}
Output:
2023-11-09T15_32_28_628_0700
When selecting a backup to restore, you should always check the status.json file in the sas-common-backup-data PVC directory to be sure that the backup you are restoring completed successfully. The status.json file contains detailed information about the backup. For a successful backup, the file should contain the value sas.com/sas-backup-job-status: Completed.
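As a quick sanity check, a pattern like the following can be used once status.json has been copied out of the PVC. The file path and the exact JSON layout here are assumptions for illustration only; verify them against the actual status.json for your backup ID.

```shell
# Create a sample status.json for illustration only; in practice, copy the
# real file for your backup ID out of the sas-common-backup-data PVC.
cat > /tmp/status.json <<'EOF'
{"sas.com/sas-backup-job-status": "Completed"}
EOF

# Check for the success marker (key name from the documentation; the
# surrounding JSON structure is an assumption).
if grep -q '"sas.com/sas-backup-job-status": "Completed"' /tmp/status.json; then
  echo "Backup completed - safe to restore"
else
  echo "Backup did not complete - choose another backup ID"
fi
```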
The sas-restore-job-parameters configMap is how we pass settings to the restore process. The configMap is referenced by the restore job and by the CAS server when it starts. The two parameters to set for a restore are SAS_BACKUP_ID, which identifies the backup to restore, and SAS_DEPLOYMENT_START_MODE, which must be set to RESTORE.
The patch command patches the configMap updating the values of SAS_BACKUP_ID and SAS_DEPLOYMENT_START_MODE:
restore_config_map=$(kubectl describe cronjob sas-restore-job | grep -i sas-restore-job-parameters | awk '{print $1}'|head -n 1)
echo The current restore Config Map is: $restore_config_map
kubectl patch cm $restore_config_map --type json -p '[ {"op": "replace", "path": "/data/SAS_BACKUP_ID", "value":"'${backupid}'"}, {"op": "replace", "path": "/data/SAS_DEPLOYMENT_START_MODE", "value":"RESTORE" }]'
Output:
The current restore Config Map is: sas-restore-job-parameters-bm48bd82bg
configmap/sas-restore-job-parameters-bm48bd82bg patched
Using the following command we can view the updated configMap and make sure that the SAS_BACKUP_ID and SAS_DEPLOYMENT_START_MODE parameters are correctly set.
kubectl describe cm $restore_config_map
With the parameters set correctly, we can now start the restore job from the restore CronJob. This process restores the SAS Infrastructure Data Server and the SAS Configuration Server. In addition, it stops the CAS server, which is a prerequisite for the second step, restoring the CAS server.
kubectl create job --from=cronjob/sas-restore-job sas-restore-job
Output:
job.batch/sas-restore-job created
You can view the job log as it runs with this command, using the -f parameter to stream the log to the screen.
kubectl logs -l "job-name=sas-restore-job" -f -c sas-restore-job | gel_log
Note: in the previous command we pipe to a custom function (gel_log) to reformat the log from JSON to a more human-readable format. The function is shown below for your information.
gel_log () {
  jq -R -r '. as $line | try (fromjson | "\(.level | ascii_upcase) \(.timeStamp) [\(.source)]- \(.message)") catch $line'
}
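To see what gel_log does, you can feed it a fabricated JSON log line locally. The function is redefined here so the snippet is self-contained, and the log field values are sample data, not real restore-job output:

```shell
# gel_log reformats each JSON log line into "LEVEL TIMESTAMP [SOURCE]- MESSAGE";
# non-JSON lines pass through unchanged thanks to jq's try/catch.
gel_log () {
  jq -R -r '. as $line | try (fromjson | "\(.level | ascii_upcase) \(.timeStamp) [\(.source)]- \(.message)") catch $line'
}

# Fabricated sample log line (field values are made up for illustration)
echo '{"level":"info","timeStamp":"2023-11-16T19:48:28.000Z","source":"sas-restore-job","message":"restore job completed successfully"}' | gel_log
# prints: INFO 2023-11-16T19:48:28.000Z [sas-restore-job]- restore job completed successfully
```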
You can check the status of a restore job with this command. The status should eventually change from "Running" to "Completed".
kubectl get jobs -l "sas.com/backup-job-type=restore" -L "sas.com/sas-backup-id,sas.com/backup-job-type,sas.com/sas-restore-status"
Output:
NAME              COMPLETIONS   DURATION   AGE   SAS-BACKUP-ID                  BACKUP-JOB-TYPE   SAS-RESTORE-STATUS
sas-restore-job   0/1           42s        42s   2023-11-16T19_48_28_628_0700   restore           Running
To make sure the job has run successfully check the log for the message "restore job completed successfully."
kubectl logs -l "job-name=sas-restore-job" -c sas-restore-job --tail 1000 | gel_log | grep "restore job completed successfully" -B 3 -A 1
The restore job performs a rolling restart of many of the SAS Viya pods. Before moving on to the next step, we should check that two of the key pods (SAS Logon and SAS Configuration) are up and running.
kubectl get pods -l app=sas-logon-app
kubectl get pods -l app=sas-configuration
Output:
[cloud-user@pdcesx11133 from35]$ kubectl get pods -l app=sas-logon-app
NAME                             READY   STATUS    RESTARTS   AGE
sas-logon-app-5dcb9df44d-hb8gd   1/1     Running   0          2m26s
[cloud-user@pdcesx11133 from35]$ kubectl get pods -l app=sas-configuration
NAME                                 READY   STATUS    RESTARTS   AGE
sas-configuration-68b558b8b9-g7dg4   1/1     Running   0          2m46s
With the restore job completed, the second step restores the CAS server. The restore job has stopped all CAS servers in the environment. To restore CAS, the CAS server is started in RESTORE mode, and data and configuration are restored during server startup. First, let's check that CAS is not running.
kubectl get pods --selector="casoperator.sas.com/server==default" -n gelcorp
Expected Output:
No resources found in gelcorp namespace.
To replace the old manifest approach, two new scripts are now used to initiate the CAS restore: sas-backup-pv-copy-cleanup.sh and scale-up-cas.sh. The scripts are delivered with the deployment assets in the directory sas-bases/examples/restore/scripts. To run the scripts, we need to make them executable.
chmod +x ~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/*.sh
The restoration of the data to the two CAS file PVCs requires a clean volume. Run the sas-backup-pv-copy-cleanup.sh script to clean up the CAS PVCs. This step deletes the existing data on the CAS permstore (cas-default-permstore) and CAS data (cas-default-data) PVCs. The script takes three parameters: the namespace, the operation to perform (here remove), and the list of CAS server instances (here "default").
cd ~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/
./sas-backup-pv-copy-cleanup.sh gelcorp remove "default"
Output:
The cleanup pods are created, and they are in a running state.
Ensure that all pods are completed. To check the status of the cleanup pods, run the following command.
kubectl -n gelcorp get pods -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup | grep 21bef2c
The script creates a Kubernetes job that clears the key data from the CAS PVCs so that the data can be restored from the backup package. For more detail, we can view the job log.
kubectl -n gelcorp logs -l sas.com/backup-job-type=sas-backup-pv-copy-cleanup
With the CAS PVCs successfully cleaned, we can start up the CAS server(s) using scale-up-cas.sh. The script takes two parameters: the namespace and the list of CAS server instances to start (here "default").
cd ~/project/deploy/${current_namespace}/sas-bases/examples/restore/scripts/
./scale-up-cas.sh gelcorp "default"
casdeployment.viya.sas.com/default patched
When the CAS server starts, it checks the restore job configMap attribute SAS_DEPLOYMENT_START_MODE. If it is set to RESTORE, the CAS server starts and restores the data from the directory in the sas-cas-backup-data PVC that matches the SAS_BACKUP_ID.
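Conceptually, the startup decision looks like the sketch below. This is an illustration of the logic described above, not the actual CAS entrypoint script, and the variable values are sample data:

```shell
# Values as they would be read from the sas-restore-job-parameters configMap
# (sample data; the real values come from the patched configMap).
SAS_DEPLOYMENT_START_MODE="RESTORE"
SAS_BACKUP_ID="2023-11-09T15_32_28_628_0700"

# Illustrative decision only; the real logic lives inside CAS server startup.
if [ "$SAS_DEPLOYMENT_START_MODE" = "RESTORE" ]; then
  echo "Starting CAS in RESTORE mode; restoring backup ${SAS_BACKUP_ID} from the sas-cas-backup-data PVC"
else
  echo "Starting CAS normally"
fi
```

This is also why the final cleanup step below matters: as long as SAS_DEPLOYMENT_START_MODE remains RESTORE in the configMap, every CAS restart would take the restore branch.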
You can check the CAS Server log to see if the restore was performed. The logs will show the start of the restore process and details of the restore of the backup content from the backup package to the target CAS persistent volumes.
kubectl logs sas-cas-server-default-controller -c sas-cas-server | grep -A 10 "RESTORE"
An important final step is to reset all SAS restore job configMap parameters. If we don't perform this step the CAS server will attempt to restore the backup from the package on every restart.
kubectl patch cm $restore_config_map --type json -p '[{ "op": "remove", "path": "/data/SAS_BACKUP_ID" },{"op": "remove", "path": "/data/SAS_DEPLOYMENT_START_MODE"}]'
Restoring a SAS Viya backup is now initiated using kubectl commands and scripts. The new and improved restore process is currently supported for backup and restore and for Viya 4 to Viya 4 migration. The old method of building and applying manifests is also still supported, and there are plans to add support for the new method for Viya 3.x to Viya 4 migration. I hope you found this useful. Look for more exciting updates and related blog posts in the backup and restore area in the coming months.
The new restore process is documented in the official SAS Viya platform documentation.
Find more articles from SAS Global Enablement and Learning here.