Taking backups and making sure those backups are stored securely is an important part of taking good care of your applications. That doesn't sound like rocket science, but having a proper backup strategy in place is not as easy as it seems, and plenty of things can go wrong.
As a SAS administrator you are familiar with how to create and restore backups of the SAS Viya platform. And of course, you have a proper backup strategy in place to keep the SAS Viya platform running smoothly. But with the addition of Singlestore to the SAS Viya platform, there is another component that requires your attention when it comes to backups.
Are you already making backups of Singlestore? In this blog I'd like to take a closer look at the options that are available for backing up Singlestore databases and discuss some of the pros and cons that come with each option. I will end by providing an example of how to configure SAS SpeedyStore to take a backup to Azure Blob Storage.
Singlestore makes a distinction between two types of databases: unlimited storage databases and local databases. It's important to understand this distinction, because from a backup and restore perspective the two types are treated differently.
Unlimited storage databases
An unlimited storage database allows you to use point-in-time recovery to restore the database to a specific moment in time. It doesn't use any backups for that; instead it relies on the unlimited storage feature in Singlestore and leverages the blobs stored in object storage. More information on this topic can be found here.
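To give you an idea of what point-in-time recovery looks like in practice, below is a minimal sketch based on the DETACH / ATTACH ... AT TIME syntax from the Singlestore documentation; the database name and timestamp are placeholders.

-- detach the unlimited storage database first
DETACH DATABASE my_database;
-- re-attach it at a specific point in time
ATTACH DATABASE my_database AT TIME '2024-01-15 10:30:00';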
Local databases
A local database can be backed up and restored using the backup and restore procedures available within Singlestore. Creating a backup of a database is something that needs to be configured and is not set up out of the box; the same goes for recovering a database.
The above highlights the differences between these two types of databases and how they differ in their backup and recovery strategies.
The remainder of this blog will focus on making and restoring backups for local databases.
There are currently two different methods to create a backup of a Singlestore local database: using the Singlestore Operator or using the BACKUP DATABASE SQL command.
Before taking a closer look at each of these methods, please be aware that the sdb-admin tool mentioned in the backup and restore section of the Singlestore documentation does not work against a Kubernetes deployment. That's why it is not discussed in this blog. Having said that, let's have a closer look at each of the two methods.
The Singlestore Operator is responsible for managing the Singlestore cluster. The operator can be used to apply changes like resizing the Singlestore cluster, replacing failing nodes, or upgrading to a newer version of Singlestore. Another thing the operator can be used for is creating backups of your local databases.
To configure the operator to take backups you need to add the backupSpec to the MemSQL Kubernetes object. This object is used by the operator to spawn your Singlestore cluster and apply specific configurations to your cluster, like taking backups.
When a backupSpec is added to the MemSQL Kubernetes object, the Singlestore Operator creates a Kubernetes CronJob object. This cronjob runs at a configurable interval and takes backups of your local databases, which are saved to object storage. The backup method through the operator currently supports object storage on AWS, Azure, and GCP; NFS or a local file system as a target for your backups is not supported.
You can use the Operator to take full and incremental backups of your databases. Currently the operator only supports making backups and cannot be used to restore a database; for that you will need to use an SQL command. More information about what specifically can be configured in the backupSpec can be found here.
An example configuration taken from the Singlestore documentation is shown below.
backupSpec:
  CloudServiceProvider: AWS
  backupImage: gcr.io/singlestore-public/cellscripts:20230424161715-64bdc8ff
  bucketName: 29fce172-3e50-41d9-9440-bf378475cc1b
  s3Endpoint: http://10.43.241.252:8005
  s3Region: us-west-1
  schedule: 13 20 * * *
  secretName: backup-credentials
Seems easy, right? A more detailed example of how to create backups of your local databases using Azure Blob Storage is provided further on in this blog.
Another way to create backups of the local databases in your Singlestore cluster is to use SQL commands. In this case the BACKUP DATABASE command creates a backup of your database. The command can be executed through the Singlestore CLI, which is available in the master aggregator pod. The basic form of the command is shown here:
BACKUP DATABASE <database-name> TO <location>;
When using the BACKUP DATABASE command, you have the option of storing the backup either on object storage (S3, Azure Blob Storage, or GCP buckets) or on a file system (the local file system or NFS). Similarly, you can use the RESTORE DATABASE command to restore a local database on your Singlestore cluster from object storage, NFS, or the local file system.
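As an illustration, here is a sketch of what a backup to and a restore from Azure Blob Storage could look like, based on the BACKUP DATABASE / RESTORE DATABASE syntax in the Singlestore documentation; the database name, container, path, and credentials are placeholders.

-- back up a local database to a path inside an Azure blob container
BACKUP DATABASE my_database TO AZURE "backup/my_database"
CREDENTIALS '{"account_name": "<storage account name>", "account_key": "<account key>"}';
-- restore the same database from that location
RESTORE DATABASE my_database FROM AZURE "backup/my_database"
CREDENTIALS '{"account_name": "<storage account name>", "account_key": "<account key>"}';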
Be aware that writing or restoring a backup to or from NFS or a local file system requires additional configuration on a Kubernetes deployment of Singlestore, and that this only works with SQL commands.
Going through that additional configuration is outside the scope of this blog, but if you are interested in hearing how it works, let me know in the comments and I'll see if I can write an additional blog about it!
| Compare | Operator | SQL commands |
| --- | --- | --- |
| Selection of database to backup | No way to select a database | You can specify the database |
| Supported storage repositories | AWS, GCP, Azure | AWS, GCP, Azure, NFS*, Local* |
| Backup and restore | Only backup is supported by the Operator | Both backup and restore are supported |
| Scheduling | Scheduling of backups is done by the Operator | Requires manual configuration to set up regular execution |
| Full and incremental backups | Both are supported | Both are supported |
The above table compares both methods. I believe both have their advantages and disadvantages; which one is best suited for you depends on your skills and requirements. Using the above comparison as input, you can hopefully decide which fits your situation.
One thing I would like to call out based on the above table is that the Singlestore Operator doesn't support restoring a database. That is not a big problem, but instead of using the Operator for both backup and restore, you will have to use the RESTORE DATABASE SQL command to restore a database. So why not skip the Operator entirely and use SQL commands for both backups and restores?
Well, in the end that decision is up to you. I can see advantages in using the Operator for backups and the RESTORE DATABASE command for restores. One of them is that you don't have to schedule the backups yourself, as this is taken care of by the operator, whereas you have to set up the scheduling manually when using the BACKUP DATABASE command. But if you want to write backups to an NFS server*, for example, the decision to use SQL commands is made for you, as the Operator doesn't support this scenario.
*) Please see my earlier remark about NFS and Local file system on a Kubernetes deployment of Singlestore
Earlier on in the blog the backup functionality of the Singlestore Operator was discussed, and an example was shared of how the backup can be configured. In this part of the blog, we are going to look more closely at how you can configure the backup process through the Singlestore Operator to store backups in Azure Blob Storage.
Taking a step back, there are two main parts that need to be configured: the Azure side and the Singlestore side.
On the Azure side there are two things that need to be done. First, we need to create a storage account and a container within that storage account. Second, a private endpoint needs to be created. This allows us to access the blob storage through Azure's private network backbone instead of going over the public internet.
The storage account and container can be created by executing the commands below.
az storage account create --name <name> --resource-group <rg> --location <location> --sku Standard_LRS --min-tls-version TLS1_2 --allow-blob-public-access true
az storage container create --name backup --account-name <storage account name>
Please note that an ADLS storage account is not supported for backups by the Singlestore Operator; the above commands create a regular storage account. Setting public access to true is not required; it is done here only so that the contents of the storage account can be inspected through the Azure portal.
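If you want to verify that an existing storage account is not an ADLS (hierarchical namespace) account, one way is to check the isHnsEnabled property; for a regular storage account this should return false or nothing at all.

az storage account show --name <name> --resource-group <rg> --query isHnsEnabled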
A private endpoint in this case is used to make sure that the traffic flowing between the Singlestore Operator and the Azure blob storage does not leave the Azure backend. To create a private endpoint the following commands should be executed.
# retrieve storage account ID
id=$(az storage account show --name <name> --resource-group <rg> --query 'id' --output tsv)
# create private endpoint
az network private-endpoint create \
--connection-name <name>-backup-private \
--name <name>-backup-private \
--private-connection-resource-id $id \
--resource-group <rg> \
--subnet <subnet AKS cluster> \
--vnet-name <vnet AKS cluster> \
--group-id blob
# create private dns zone
az network private-dns zone create --resource-group <rg> --name "privatelink.blob.core.windows.net"
# link dns zone to vnet
az network private-dns link vnet create --resource-group <rg> --zone-name "privatelink.blob.core.windows.net" --name <name of link> --virtual-network <vnet name> --registration-enabled false
# get id of network interface assigned to private endpoint
NETWORK_INTERFACE_ID=$(az network private-endpoint show \
--name <private endpoint name> \
--resource-group <rg> \
--query 'networkInterfaces[0].id' \
--output tsv)
# get ip from network interface
DNS_IP=$(az network nic show \
--ids $NETWORK_INTERFACE_ID \
--query 'ipConfigurations[0].privateIPAddress' \
--output tsv)
# create dns zone record
az network private-dns record-set a add-record -g <rg> -z privatelink.blob.core.windows.net -n <storage account name> -a $DNS_IP
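To verify that the DNS record resolves correctly from inside the cluster, you can spin up a throwaway pod and resolve the privatelink name; the address returned should be the private IP of the endpoint. This is just a quick sanity check, and the busybox image used here is my own choice.

kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -n <namespace> -- nslookup <storage account name>.privatelink.blob.core.windows.net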
Once the private endpoint has been set up, the next step is to configure everything for the backup to the Azure blob storage on the Singlestore side. First, we need to provide credentials to the backup process so that it can write to the Azure Blob Storage. This is done through a Kubernetes secret that contains the account key used by the backup process to authenticate against the blob storage.
The below command shows how to create the secret.
# get the account key
key=$(az storage account keys list --account-name <storage account name> --resource-group <rg> | jq -r '.[0].value')
kubectl create secret generic blob-credentials -n <namespace> --from-literal=AZURE_ACCOUNT_NAME=<storage account name> --from-literal=AZURE_ACCOUNT_KEY=$key
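To double-check that the secret contains what you expect, you can decode the account name from it (the account key can be inspected the same way):

kubectl get secret blob-credentials -n <namespace> -o jsonpath='{.data.AZURE_ACCOUNT_NAME}' | base64 -d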
After creating the secret, the next step is to create the Kubernetes service account that is used by the operator to perform the backup. This service account doesn't exist by default.
Use the below commands to create the necessary service account and assign the proper permissions to that account on the Kubernetes cluster.
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sas-singlestore-backup-cluster
  namespace: <namespace>
  labels:
    sas.com/admin: cluster-wide
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sas-singlestore-backup-cluster
  namespace: <namespace>
  labels:
    sas.com/admin: cluster-wide
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list"]
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
- apiGroups:
  - memsql.com
  resources:
  - memsqlclusters
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sas-singlestore-backup-cluster
  namespace: <namespace>
  labels:
    sas.com/admin: cluster-wide
subjects:
- kind: ServiceAccount
  name: sas-singlestore-backup-cluster
roleRef:
  kind: Role
  name: sas-singlestore-backup-cluster
  apiGroup: rbac.authorization.k8s.io
EOF
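A quick way to confirm that the service account, role, and role binding were created:

kubectl get serviceaccount,role,rolebinding -n <namespace> | grep sas-singlestore-backup-cluster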
Once the Kubernetes service account is in place, we can start configuring the Operator to schedule the backup process. This is done through the backupSpec. Below you will find a patch that adds the backupSpec to the MemSQL cluster object on the Kubernetes cluster.
cat << EOF > /tmp/patch-s2-operator-backup.yaml
spec:
  backupSpec:
    CloudServiceProvider: Azure
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: In
              values:
              - stateful
              - stateless
          weight: 100
        - preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: NotIn
              values:
              - compute
              - cas
              - connect
          weight: 50
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.azure.com/mode
              operator: NotIn
              values:
              - system
    backupImage: cr.sas.com/viya-4-x64_oci_linux_2-docker/sas-singlestore-operator:<singlestore operator image tag>
    bucketName: backup # container name in Azure blob storage
    imagePullSecrets:
    - name: <sas image pull secret>
    objectMetaOverrides: {}
    s3Endpoint: https://<storage account name>.privatelink.blob.core.windows.net
    schedule: 30 15 * * *
    secretName: blob-credentials
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    serviceAccountName: sas-singlestore-backup-cluster
    testBackupSpec: {}
    tolerations:
    - effect: NoSchedule
      key: workload.sas.com/class
      operator: Equal
      value: stateful
    - effect: NoSchedule
      key: workload.sas.com/class
      operator: Equal
      value: stateless
EOF
The above configures the backupSpec. A couple of important things to call out here:
- Even though the target is Azure Blob Storage, the endpoint is configured through the s3Endpoint field. It points to the privatelink FQDN of the storage account, so the backup traffic flows over the private endpoint created earlier.
- bucketName refers to the name of the container in the storage account, in this case the container called backup that was created earlier.
- backupImage points to the SAS-provided Singlestore operator image rather than the public Singlestore image used in the earlier example.
- secretName and serviceAccountName must match the secret and service account created in the previous steps.
- The affinity and tolerations use the SAS workload class labels, so the backup pod is scheduled on stateful or stateless nodes and kept away from compute, CAS, and connect nodes.
To apply the backupSpec to your running Singlestore cluster, run the below command.
kubectl patch memsql sas-singlestore-cluster --patch-file /tmp/patch-s2-operator-backup.yaml --type merge -n <namespace>
This will cause the operator to generate a cronjob that runs a backup at the specified interval.
After applying the patch to the MemSQL cluster object, you can check the logs of the operator to validate whether the cronjob was created. Look for the messages shown in the screenshot to verify that the backup has been configured.
Another way to check this is to list the existing cronjobs in the namespace where SAS SpeedyStore is deployed. Look for a cronjob called backup-sas-singlestore-cluster; this is the cronjob that is used to back up your local Singlestore databases, as shown below.
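Listing the cronjobs is a one-liner:

kubectl get cronjobs -n <namespace>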
Once you have confirmed that the cronjob is there, you can wait until it is started automatically, or you can manually trigger the backup process by executing the following command.
kubectl create job backup-sas-singlestore-cluster-`date +%s` --from=cronjob/backup-sas-singlestore-cluster -n <namespace>
The above command creates a job object on the Kubernetes cluster, and this job creates a pod that runs the backup process. A successful run of the backup process will look similar to what is shown in the screenshot.
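To follow the progress of the backup, you can tail the logs of the job you just created; the job name below is the one generated by the command above, so substitute the actual timestamp suffix.

kubectl logs -f job/backup-sas-singlestore-cluster-<timestamp> -n <namespace>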
As you can see, the backup process has created a backup of two databases and has written the results to the Azure Blob Storage, as shown in the screenshot.
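You can also verify the result from the command line by listing the blobs in the backup container, reusing the account key retrieved earlier:

az storage blob list --account-name <storage account name> --container-name backup --account-key $key --output table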
By reading this blog I hope you now have a better understanding of the options that are available to create a backup of your databases on Singlestore. Both the operator and the SQL commands have their advantages and disadvantages. Which one you prefer comes down to your preferences and requirements, and hopefully the comparison between the two in this blog can help you decide.
What’s interesting about the SQL commands is that it allows you to backup to a local filesystem. And as you are aware a local file system on Kubernetes doesn’t necessarily have to be a local file system perse. Through the usage of CSI drivers for instance, you can expose filesystems like NFS or Blob storage as a local filesystem to Singlestore pods running on Kubernetes.
This opens up new possibilities to explore for creating and restoring backups!
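As a sketch of that idea: assuming the Azure Blob CSI driver is enabled on your AKS cluster, a PersistentVolumeClaim like the one below could expose Blob storage as a file system. The claim name, size, and storage class are assumptions on my part, and the claim would still have to be mounted into the Singlestore pods before a BACKUP DATABASE command could target the mount path.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s2-backup-target
  namespace: <namespace>
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azureblob-fuse-premium # built-in storage class of the AKS Blob CSI driver
  resources:
    requests:
      storage: 100Gi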