Short Introduction
Let’s start with the first question… What happens if you add SingleStore to SAS Viya?
SAS Viya + SingleStore = SAS SpeedyStore
SAS SpeedyStore, formerly known as SAS with SingleStore, offers a unified solution that combines Viya's advanced analytics and AI capabilities with SingleStore’s high-performance, cloud-native database. This integration enables users to analyze real-time data directly through SingleStore, delivering faster insights and better decision-making while minimizing the need for data movement.
To better understand SpeedyStore, let’s take a closer look at SingleStore itself... What is SingleStore? The textbook definition goes like this:
SingleStore is a distributed, relational SQL database management system that features ANSI SQL support and is known for its speed in data ingestion, transaction processing, and query processing.
When it comes to its architecture, a SingleStore cluster is made up of nodes that fulfill different functions: aggregator nodes and leaf nodes. Aggregators receive SQL statements (queries) and pass them on to the leaves, since that is where the data is stored; the leaves then return their results to the aggregators. Besides these, the deployment also includes the SingleStore operator and the SingleStore configuration pod.
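You can see these roles for yourself by connecting to the cluster and listing the node types. SingleStore speaks the MySQL wire protocol, so any MySQL-compatible client works; the endpoint and credentials below are placeholders for your own deployment:

```shell
# Connect to the DDL endpoint (the master aggregator) with a MySQL client.
# <DDL_ENDPOINT> and the admin user are placeholders for your deployment.
mysql -h <DDL_ENDPOINT> -P 3306 -u admin -p \
  -e "SHOW AGGREGATORS; SHOW LEAVES;"
# SHOW AGGREGATORS lists the master and any child aggregators;
# SHOW LEAVES lists the leaf nodes that hold the data partitions.
```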
SpeedyStore Workload Placement
Workload placement is a nuanced and often complex aspect of deploying SAS Viya, but in a nutshell, a common architecture on one of the main cloud providers involves creating a node pool for each workload class:
The difference between SAS Viya and SAS SpeedyStore is that an additional workload class is needed, and with it a new node pool. You guessed it - this node pool is reserved for the SingleStore cluster.
Note: Do not confuse the Azure Kubernetes Service (AKS) cluster with the SingleStore cluster. The AKS cluster is where SAS SpeedyStore is deployed; the SingleStore cluster is part of SAS SpeedyStore and therefore lives in the same namespace as the SAS platform.
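On AKS, adding such a dedicated node pool can be done with the Azure CLI. This is a sketch under stated assumptions: the resource group, cluster name, pool name (s2pool), node count, and VM size are all placeholders to adapt to your sizing:

```shell
# Create a node pool reserved for the SingleStore cluster.
# Names, counts, and VM size below are illustrative placeholders.
az aks nodepool add \
  --resource-group <RESOURCE_GROUP> \
  --cluster-name <AKS_CLUSTER> \
  --name s2pool \
  --node-count 3 \
  --node-vm-size Standard_E8s_v5 \
  --labels workload.sas.com/class=singlestore \
  --node-taints workload.sas.com/class=singlestore:NoSchedule
```

The label attracts the SingleStore pods via node affinity, while the taint keeps other SAS Viya workloads off the pool.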
SpeedyStore Architecture
We can take a look at the SAS SpeedyStore architecture below:
Figure 1: SpeedyStore Architecture with one nodepool dedicated to SingleStore
Place the SingleStore cluster workload, but differently
If we recall what we mentioned about the function of the leaf nodes, then we can rethink the SingleStore cluster architecture such that we make better use of the resources. We can start by having two Kubernetes node pools dedicated to the SingleStore cluster - one reserved for the aggregator nodes and one reserved for the leaf nodes:
Figure 2: SpeedyStore Architecture with two nodepools dedicated to leaf and aggregator nodes
We still need to make some changes to our deployment, since the initial configuration schedules all the SingleStore pods - those for the aggregator nodes as well as the leaf nodes - onto the same node pool. Because we have decided to separate the SingleStore cluster, we need to label and taint our nodes accordingly:
kubectl label nodes <AGGREGATOR_NODE> workload.sas.com/class=s2aggregator --overwrite
kubectl taint nodes <AGGREGATOR_NODE> workload.sas.com/class=s2aggregator:NoSchedule --overwrite
kubectl label nodes <AGGREGATOR_NODE> node.used.for=singlestore --overwrite
kubectl label nodes <LEAF_NODE> workload.sas.com/class=singlestore --overwrite
kubectl taint nodes <LEAF_NODE> workload.sas.com/class=singlestore:NoSchedule --overwrite
kubectl label nodes <LEAF_NODE> node.used.for=singlestore --overwrite
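Before moving on, it is worth verifying that the labels and taints landed where we expect. A quick check (the node names are placeholders, as above):

```shell
# Show the workload class and node.used.for labels as extra columns.
kubectl get nodes -L workload.sas.com/class -L node.used.for

# Confirm the NoSchedule taints on each pool.
kubectl describe node <AGGREGATOR_NODE> | grep -i taints
kubectl describe node <LEAF_NODE> | grep -i taints
```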
Now we need to create a patch to direct these pods onto their respective nodes:
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: s2clustertopology
patch: |-
  - op: replace
    path: /spec/aggregatorSpec/tolerations
    value:
      - effect: NoSchedule
        key: workload.sas.com/class
        operator: Equal
        value: s2aggregator
  - op: replace
    path: /spec/schedulingDetails/aggregator/affinity/nodeAffinity/preferredDuringSchedulingIgnoredDuringExecution/0/preference/matchExpressions
    value:
      - key: workload.sas.com/class
        operator: In
        values:
          - s2aggregator
  - op: replace
    path: /spec/schedulingDetails/aggregator/tolerations
    value:
      - effect: NoSchedule
        key: workload.sas.com/class
        operator: Equal
        value: s2aggregator
  - op: replace
    path: /spec/schedulingDetails/master/affinity/nodeAffinity/preferredDuringSchedulingIgnoredDuringExecution/0/preference/matchExpressions
    value:
      - key: workload.sas.com/class
        operator: In
        values:
          - s2aggregator
  - op: replace
    path: /spec/schedulingDetails/master/tolerations
    value:
      - effect: NoSchedule
        key: workload.sas.com/class
        operator: Equal
        value: s2aggregator
target:
  kind: MemsqlCluster
  name: sas-singlestore-cluster
---
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: s2daemonsettolerations
patch: |-
  - op: replace
    path: /spec/template/spec/tolerations
    value:
      - effect: NoSchedule
        key: workload.sas.com/class
        operator: Equal
        value: singlestore
      - effect: NoSchedule
        key: workload.sas.com/class
        operator: Equal
        value: s2aggregator
  - op: replace
    path: /spec/template/spec/nodeSelector
    value:
      node.used.for: singlestore
target:
  kind: DaemonSet
  name: sas-singlestore-osconfig
Note: An important thing to keep in mind here is that the SingleStore operator needs to be able to run on both node pools.
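For the patch to take effect, the transformer file has to be registered in the deployment's kustomization.yaml. A minimal fragment is shown below; the file path site-config/s2-cluster-topology.yaml is an assumption - use whatever path you saved the PatchTransformer under, alongside the other transformers already listed in your deployment:

```yaml
# kustomization.yaml (fragment)
# The path below is a placeholder for wherever you saved the PatchTransformer.
transformers:
  - site-config/s2-cluster-topology.yaml
```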
Benefits of distributing the workload
So now you might ask yourself: what are the benefits?
One of the main benefits is, of course, better workload placement, since the leaf nodes do most of the heavy lifting data-wise, while the aggregator nodes handle queries and client interactions, which require fewer resources.
Another important point is the option of scaling the aggregators and leaves separately, which makes resource allocation more efficient and reduces unnecessary overhead. Fault isolation needs to be considered as well: issues affecting one node pool can be addressed without impacting the other.
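As a sketch of what independent scaling could look like: the upstream SingleStore Kubernetes operator exposes node counts on the MemsqlCluster resource (verify the exact field names against your deployment's spec), and the matching node pool can be resized with the Azure CLI. Pool name, resource group, and counts below are placeholders:

```shell
# Scale the leaves independently of the aggregators.
# Field names follow the upstream SingleStore operator; check your
# MemsqlCluster spec before applying.
kubectl patch memsqlcluster sas-singlestore-cluster --type merge \
  -p '{"spec":{"leafSpec":{"count":4}}}'

# Resize the corresponding AKS node pool to match (names are placeholders).
az aks nodepool scale \
  --resource-group <RESOURCE_GROUP> \
  --cluster-name <AKS_CLUSTER> \
  --name s2leaf \
  --node-count 4
```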
Depending on the environment, a reduction in infrastructure cost can also be achieved. Since leaf nodes typically demand more powerful machines while aggregator nodes can operate efficiently on smaller instances, we can avoid over-provisioning and reduce cloud spend.
And last but not least, since SingleStore is licensed based on the number of DBUs (database units), the separation and optimization can help with license consumption.
Conclusion
SAS SpeedyStore brings the analytical power of SAS Viya and the speed of SingleStore to deliver real-time analytics and faster insights. Distributing the workload by configuring dedicated node pools for the leaves and aggregators can improve performance, enhance scalability and potentially reduce infrastructure and licensing costs, while still maintaining a cloud-native architecture.