Taking backups and making sure those backups are stored securely is an important part of taking good care of your applications. That doesn't sound like rocket science, but having a proper backup strategy in place is not as easy as it seems, and plenty of things can go wrong.
As a SAS administrator you are familiar with how to create and restore backups of the SAS Viya platform. And of course, you have a proper backup strategy in place to keep the SAS Viya platform running smoothly. But with the addition of Singlestore to the SAS Viya platform, there is another component that requires your attention when it comes to backups.
Are you already making backups of Singlestore? In this blog I'd like to take a closer look at the options that are available for backing up Singlestore databases and discuss some of the pros and cons that come with each option. I will end by providing an example of how to configure SAS SpeedyStore to take a backup to Azure Blob Storage.
Singlestore makes a distinction between two types of databases: unlimited storage databases and local databases. It's important to understand this distinction, because from a backup and restore perspective the two types are treated differently.
Unlimited storage databases
An unlimited storage database allows you to use point-in-time recovery to restore the database to a specific moment in time. It doesn't use any backups for that; instead it relies on the unlimited storage feature in Singlestore and leverages the blobs stored in object storage. More information on this topic can be found here.
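To give you an idea of what point-in-time recovery looks like in practice, below is a minimal sketch based on the DETACH / ATTACH ... AT TIME syntax from the Singlestore documentation; the database name and timestamp are placeholders.

-- detach the unlimited storage database first
DETACH DATABASE my_database;
-- re-attach it at a specific point in time
ATTACH DATABASE my_database AT TIME '2024-01-15 10:30:00';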
Local databases
A local database can be backed up and restored using the backup and restore procedures available within Singlestore. Creating a backup of a database is something that needs to be configured and is not set up out of the box; the same goes for recovering a database.
The above highlights the differences between these two types of databases and how they differ in their backup and recovery strategies.
The remainder of this blog will focus on making and restoring backups for local databases.
There are currently two different methods to create a backup of a Singlestore local database: using the Singlestore Operator or using the BACKUP DATABASE SQL command.
Before taking a closer look at each of these methods, please be aware that the sdb-admin tool mentioned in the backup and restore section of the Singlestore documentation does not work against a Kubernetes deployment. That's why it is not discussed in this blog. Having said that, let's have a closer look at each of the two methods.
The Singlestore Operator is responsible for managing the Singlestore cluster. The operator can be used to apply changes like resizing the Singlestore cluster, replacing failing nodes, or upgrading to a newer version of Singlestore. Another thing the operator can be used for is creating backups of your local databases.
To configure the operator to take backups you need to add the backupSpec to the MemSQL Kubernetes object. This object is used by the operator to spawn your Singlestore cluster and apply specific configurations to your cluster, like taking backups.
When a backupSpec is added to the MemSQL Kubernetes object, the Singlestore Operator creates a Kubernetes CronJob object. This cronjob runs at a configurable interval and takes backups of your local databases, which are saved to object storage. The backup method through the operator currently supports object storage on AWS, Azure, and GCP; NFS or a local file system as a target for your backups is not supported.
You can use the Operator to take full and incremental backups of your databases. Currently the operator only supports making backups and cannot be used to restore a database; for that you will need to use an SQL command. More information about what specifically can be configured in the backupSpec can be found here.
An example configuration taken from the Singlestore documentation is shown below.
backupSpec:
  CloudServiceProvider: AWS
  backupImage: gcr.io/singlestore-public/cellscripts:20230424161715-64bdc8ff
  bucketName: 29fce172-3e50-41d9-9440-bf378475cc1b
  s3Endpoint: http://10.43.241.252:8005
  s3Region: us-west-1
  schedule: 13 20 * * *
  secretName: backup-credentials
Seems easy, right? A more detailed example of how to create backups of your local databases using Azure Blob Storage is provided further on in this blog.
Another way to create backups of the local databases in your Singlestore cluster is to use SQL commands. In this case the BACKUP DATABASE command creates a backup of your database. The command can be executed through the Singlestore CLI, which is available in the master aggregator pod. The basic form of the command is shown here:
BACKUP DATABASE <database-name> TO <location>;
When using the BACKUP DATABASE command, you have the option of storing the backup either on object storage (S3, Azure Blob Storage, or GCP buckets) or on a file system (the local file system or NFS). Similarly, you can use the RESTORE DATABASE command to restore a local database on your Singlestore cluster from object storage, NFS, or the local file system.
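As an illustration, here is a sketch of what a backup to and a restore from Azure Blob Storage could look like, based on the BACKUP DATABASE / RESTORE DATABASE syntax in the Singlestore documentation; the database name, container, path, and credentials are placeholders.

-- back up a local database to a path inside an Azure blob container
BACKUP DATABASE my_database TO AZURE "backup/my_database"
CREDENTIALS '{"account_name": "<storage account name>", "account_key": "<account key>"}';
-- restore the same database from that location
RESTORE DATABASE my_database FROM AZURE "backup/my_database"
CREDENTIALS '{"account_name": "<storage account name>", "account_key": "<account key>"}';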
Be aware that writing or restoring a backup to or from NFS or a local file system requires additional configuration on a Kubernetes deployment of Singlestore, and that this only works with SQL commands.
Going through that additional configuration is outside the scope of this blog, but if you are interested in hearing how it works, let me know in the comments and I'll see if I can write an additional blog about it!
| Compare | Operator | SQL commands |
| --- | --- | --- |
| Selection of database to backup | No way to select a database | You can specify the database |
| Supported storage repositories | AWS, GCP, Azure | AWS, GCP, Azure, NFS*, Local* |
| Backup and restore | Only backup is supported by the Operator | Both backup and restore are supported |
| Scheduling | Scheduling of backups is done by the Operator | Requires manual configuration to set up regular execution |
| Full and incremental backups | Both are supported | Both are supported |
The above table compares both methods. I believe both have their advantages and disadvantages; which one is best suited for you depends on your skills and requirements. Using the above comparison as input, you can hopefully decide which fits your situation.
One thing I would like to call out based on the above table is that the Singlestore Operator doesn't support restoring a database. That is not a big problem, but instead of using the Operator for both backup and restore, you will have to use the RESTORE DATABASE SQL command to restore a database. So why not skip the Operator entirely and use SQL commands for both backups and restores?
Well, in the end that decision is up to you. I can see advantages in using the Operator for backups and the RESTORE DATABASE command for restores. One of them is that you don't have to schedule the backups yourself, as this is taken care of by the operator, whereas you have to set up the scheduling manually when using the BACKUP DATABASE command. But if you want to write backups to an NFS server*, for example, the decision to use SQL commands is made for you, as the Operator doesn't support this scenario.
*) Please see my earlier remark about NFS and Local file system on a Kubernetes deployment of Singlestore
Earlier on in the blog the backup functionality of the Singlestore Operator was discussed, and an example was shared of how the backup can be configured. In this part of the blog, we are going to look more closely at how you can configure the backup process through the Singlestore Operator to store backups in Azure Blob Storage.
Taking a step back, there are two main parts that need to be configured: the Azure side and the Singlestore side.
On the Azure side there are two things that need to be done. First, we need to create a storage account and a container within that storage account. Second, a private endpoint needs to be created. This allows us to access the blob storage through Azure's private network backbone instead of going over the public internet.
The storage account and container can be created by executing the commands below.
az storage account create --name <name> --resource-group <rg> --location <location> --sku Standard_LRS --min-tls-version TLS1_2 --allow-blob-public-access true
az storage container create --name backup --account-name <storage account name>
Please note that an ADLS storage account is not supported for backups by the Singlestore Operator; the above commands create a regular storage account. Setting public access to true is not required; it is done here only so that the contents of the storage account can be inspected through the Azure portal.
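If you want to verify that an existing storage account is not an ADLS (hierarchical namespace) account, one way is to check the isHnsEnabled property; for a regular storage account this should return false or nothing at all.

az storage account show --name <name> --resource-group <rg> --query isHnsEnabled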
A private endpoint in this case is used to make sure that the traffic flowing between the Singlestore Operator and the Azure blob storage does not leave the Azure backend. To create a private endpoint the following commands should be executed.
# retrieve storage account ID
id=$(az storage account show --name <name> --resource-group <rg> --query 'id' --output tsv)
# create private endpoint
az network private-endpoint create \
--connection-name <name>-backup-private \
--name <name>-backup-private \
--private-connection-resource-id $id \
--resource-group <rg> \
--subnet <subnet AKS cluster> \
--vnet-name <vnet AKS cluster> \
--group-id blob
# create private dns zone
az network private-dns zone create --resource-group <rg> --name "privatelink.blob.core.windows.net"
# link dns zone to vnet
az network private-dns link vnet create --resource-group <rg> --zone-name "privatelink.blob.core.windows.net" --name <name of link> --virtual-network <vnet name> --registration-enabled false
# get id of network interface assigned to private endpoint
NETWORK_INTERFACE_ID=$(az network private-endpoint show \
--name <private endpoint name> \
--resource-group <rg> \
--query 'networkInterfaces[0].id' \
--output tsv)
# get ip from network interface
DNS_IP=$(az network nic show \
--ids $NETWORK_INTERFACE_ID \
--query 'ipConfigurations[0].privateIPAddress' \
--output tsv)
# create dns zone record
az network private-dns record-set a add-record -g <rg> -z privatelink.blob.core.windows.net -n <storage account name> -a $DNS_IP
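To verify that the DNS record resolves correctly from inside the cluster, you can spin up a throwaway pod and resolve the privatelink name; the address returned should be the private IP of the endpoint. This is just a quick sanity check, and the busybox image used here is my own choice.

kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -n <namespace> -- nslookup <storage account name>.privatelink.blob.core.windows.net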
Once the private endpoint has been set up, the next step is to configure everything for the backup to the Azure blob storage on the Singlestore side. First, we need to provide credentials to the backup process so that it can write to the Azure Blob Storage. This is done through a Kubernetes secret that contains the account key used by the backup process to authenticate against the blob storage.
The below command shows how to create the secret.
# get the account key
key=$(az storage account keys list --account-name <storage account name> --resource-group <rg> | jq -r '.[0].value')
kubectl create secret generic blob-credentials -n <namespace> --from-literal=AZURE_ACCOUNT_NAME=<storage account name> --from-literal=AZURE_ACCOUNT_KEY=$key
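To double-check that the secret contains what you expect, you can decode the account name from it (the account key can be inspected the same way):

kubectl get secret blob-credentials -n <namespace> -o jsonpath='{.data.AZURE_ACCOUNT_NAME}' | base64 -d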
After creating the secret, the next step is to create the Kubernetes service account that is used by the operator to perform the backup. This service account doesn't exist by default.
Use the below commands to create the necessary service account and assign the proper permissions to that account on the Kubernetes cluster.
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sas-singlestore-backup-cluster
  namespace: <namespace>
  labels:
    sas.com/admin: cluster-wide
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: sas-singlestore-backup-cluster
  namespace: <namespace>
  labels:
    sas.com/admin: cluster-wide
rules:
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list"]
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
  - list
- apiGroups:
  - memsql.com
  resources:
  - memsqlclusters
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sas-singlestore-backup-cluster
  namespace: <namespace>
  labels:
    sas.com/admin: cluster-wide
subjects:
- kind: ServiceAccount
  name: sas-singlestore-backup-cluster
roleRef:
  kind: Role
  name: sas-singlestore-backup-cluster
  apiGroup: rbac.authorization.k8s.io
EOF
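A quick way to confirm that the service account, role, and role binding were created:

kubectl get serviceaccount,role,rolebinding -n <namespace> | grep sas-singlestore-backup-cluster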
Once the Kubernetes service account is in place, we can start configuring the Operator to schedule the backup process. This is done through the backupSpec. Below you will find a patch that adds the backupSpec to the MemSQL cluster object on the Kubernetes cluster.
cat << EOF > /tmp/patch-s2-operator-backup.yaml
spec:
  backupSpec:
    CloudServiceProvider: Azure
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: In
              values:
              - stateful
              - stateless
          weight: 100
        - preference:
            matchExpressions:
            - key: workload.sas.com/class
              operator: NotIn
              values:
              - compute
              - cas
              - connect
          weight: 50
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.azure.com/mode
              operator: NotIn
              values:
              - system
    backupImage: cr.sas.com/viya-4-x64_oci_linux_2-docker/sas-singlestore-operator:<singlestore operator image tag>
    bucketName: backup # container name in Azure blob storage
    imagePullSecrets:
    - name: <sas image pull secret>
    objectMetaOverrides: {}
    s3Endpoint: https://<storage account name>.privatelink.blob.core.windows.net
    schedule: 30 15 * * *
    secretName: blob-credentials
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop:
        - ALL
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      seccompProfile:
        type: RuntimeDefault
    serviceAccountName: sas-singlestore-backup-cluster
    testBackupSpec: {}
    tolerations:
    - effect: NoSchedule
      key: workload.sas.com/class
      operator: Equal
      value: stateful
    - effect: NoSchedule
      key: workload.sas.com/class
      operator: Equal
      value: stateless
EOF
The above configures the backupSpec. A couple of important things to call out here:
- Even though the target is Azure Blob Storage, the endpoint is configured through the s3Endpoint field. It points to the privatelink FQDN of the storage account, so the backup traffic flows over the private endpoint created earlier.
- bucketName refers to the name of the container in the storage account, in this case the container called backup that was created earlier.
- backupImage points to the SAS-provided Singlestore operator image rather than the public Singlestore image used in the earlier example.
- secretName and serviceAccountName must match the secret and service account created in the previous steps.
- The affinity and tolerations use the SAS workload class labels, so the backup pod is scheduled on stateful or stateless nodes and kept away from compute, CAS, and connect nodes.
To apply the backupSpec to your running Singlestore cluster, run the below command.
kubectl patch memsql sas-singlestore-cluster --patch-file /tmp/patch-s2-operator-backup.yaml --type merge -n <namespace>
This will cause the operator to generate a cronjob that runs a backup at the specified interval.
After applying the patch to the MemSQL cluster object, you can check the logs of the operator to validate whether the cronjob was created. Look for the messages shown in the screenshot to verify that the backup has been configured.
Another way to check this is to list the existing cronjobs in the namespace where SAS SpeedyStore is deployed. Look for a cronjob called backup-sas-singlestore-cluster; this is the cronjob that is used to back up your local Singlestore databases, as shown below.
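Listing the cronjobs is a one-liner:

kubectl get cronjobs -n <namespace>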
Once you have confirmed that the cronjob is there, you can wait until it is started automatically, or you can manually trigger the backup process by executing the following command.
kubectl create job backup-sas-singlestore-cluster-`date +%s` --from=cronjob/backup-sas-singlestore-cluster -n <namespace>
The above command creates a job object on the Kubernetes cluster, and this job creates a pod that runs the backup process. A successful run of the backup process will look similar to what is shown in the screenshot.
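To follow the progress of the backup, you can tail the logs of the job you just created; the job name below is the one generated by the command above, so substitute the actual timestamp suffix.

kubectl logs -f job/backup-sas-singlestore-cluster-<timestamp> -n <namespace>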
As you can see, the backup process has created a backup of two databases and has written the results to the Azure Blob Storage, as shown in the screenshot.
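You can also verify the result from the command line by listing the blobs in the backup container, reusing the account key retrieved earlier:

az storage blob list --account-name <storage account name> --container-name backup --account-key $key --output table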
By reading this blog I hope you now have a better understanding of the options that are available to create a backup of your databases on Singlestore. Both the operator and the SQL commands have their advantages and disadvantages. Which one you prefer comes down to your preferences and requirements, and hopefully the comparison between the two in this blog can help you decide.
What’s interesting about the SQL commands is that it allows you to backup to a local filesystem. And as you are aware a local file system on Kubernetes doesn’t necessarily have to be a local file system perse. Through the usage of CSI drivers for instance, you can expose filesystems like NFS or Blob storage as a local filesystem to Singlestore pods running on Kubernetes.
This opens up new possibilities to explore for creating and restoring backups!
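As a sketch of that idea: assuming the Azure Blob CSI driver is enabled on your AKS cluster, a PersistentVolumeClaim like the one below could expose Blob storage as a file system. The claim name, size, and storage class are assumptions on my part, and the claim would still have to be mounted into the Singlestore pods before a BACKUP DATABASE command could target the mount path.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: s2-backup-target
  namespace: <namespace>
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: azureblob-fuse-premium # built-in storage class of the AKS Blob CSI driver
  resources:
    requests:
      storage: 100Gi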