Mapping SPD Server Features to SingleStore

3 Likes

SAS' SingleStore partnership has led some of our customers to consider replacing their end-of-life SPD Server (SPDS) implementations with SingleStore. After all, both are databases, more or less, and both are integrated with SAS in one way or another.

Here, we'll list the key functionality of SPDS and map it to the corresponding functionality in SingleStore (S2) to aid in migration.

1. SPDS Cluster Tables ≈ S2 Sharding

SPDS Dynamic Cluster tables are, in SQL terms, essentially views of unioned tables. For example you might have 24 months worth of order data stored in 24 separate SPDS tables. A cluster table then could be created that combines all 24 into a single logical table.

Cluster tables offer enhanced query processing capabilities like member elimination and BY-group processing using information about the data in each member table as well as enhanced management capabilities that make rolling members in and out of the cluster easy.

Query Optimization

SPDS cluster tables are very similar to Sharding in SingleStore. Sharding is essentially a form of range partitioning. Like cluster tables, sharding offers partition elimination as well as Group BY processing enhancement and other capabilities.

So, from a query optimization perspective, Singlestore should offer the same or better functionality. Therefore if a customer is currently using SPDS cluster tables for query optimization, they should choose a SingleStore Shard Key that achieves the same data distribution as the SPDS cluster table scheme.

Partition/Member Management

Functionality was built in to SPDS to make rolling member tables in and out of cluster tables easy. That being a quick way to update the cluster table with the latest data while removing older, less relevant data.

SingleStore does not allow for manual manipulation of table partitions but claims that it doesn't need to since it can delete and load data so quickly.

Given the impressive numbers from the linked resources, it's clear that SingleStore could handle most customers' data volumes with simple SQL Delete and Insert operations instead of partition manipulation. However, if a customer is using cluster tables for member management, then its table maintenance processes would need to change when moving to SingleStore.

2. SQL Optimization

SPDS has several advertised query optimization capabilities. However its star join capability is the true standout. It is truly world class for optimizing multi-table, star schema joins.

SingleStore also boasts world class query performance. This along with SingleStore's multi-machine processing capabilities ensure that S2 should perform as well or better than SPDS in almost all situations, assuming adequate hardware. However I would still check the S2 performance on any queries that triggered star joins in SPDS.

3. Indexes

As a row-oriented data server, indexing is an important component of SPDS for optimizing queries. As a columnar data server, indexing in SingleStore fulfills a different role. As you can see in the linked material, the entire table structure in S2 is called the "Clustered columnstore Index." This is because, "in this layout, any one of the columns more closely matches the structure of an index in a row-orien..." Thus we can think of the entire column store as an index for each field.

So, in general, there is no need to recreate all of the SPDS indexes in SingleStore. In fact, you couldn't even do it if you wanted to. You can of course create a primary key for each table as well as a shard key which determines how the data gets partitioned (see above). You can also create hash indexes but these are really to support transaction processing as opposed to the typical low cardinality uses of indexes in SPDS.

4. Sorting

Similar to base SAS, SPDS relies heavily on pre-sorting for performance.

SingleStore also allows for pre-sorting by setting a sort key for each table.

Since sort keys can be any number of columns, make sure to create your SingleStore tables with a sort key that ensures the same sort order used for query optimization in SPDS.

5. SPDS Partition Files / Data Distribution

Before multi-machine architectures were common, SPDS was designed to process big data efficiently by spreading table blocks over multiple I/O controllers and/or disks. This allowed SPDS to access multiple blocks in parallel with separate threads while avoiding I/O log-jams. This is a different concept than that of sharding / partitioning / cluster tables discussed above. It is at a lower level and occurs within each cluster table member.

Being a massively parallel processing engine, SingleStore utilizes multiple machines for both processing and storage. So, by its nature, it will also spread data across multiple disk arrays (assuming each machine is using local disk). Additionally disk striping could be used to ensure a similar affect to that of SPDS partition files.

As far as the topic of I/O optimization, SingleStore does not recommend one approach or another but simply requires certain I/O performance from its block storage. There are also I/O related parameters in SingleStore that can be configured to maximize performance.

6. SingleStore Table Structure

All of the above information assumes that you will be converting to a SingleStore Columnstore table type. This is the optimal table structure for analytic and decision support style processing and is now the default table type in SingleStore's new Universal Storage offering which enhanced its columnstore format to handle transaction processing.

Given that columnstore is optimal for the same types of processing that SPDS is also designed for and now can handle transactional processing as well, it should be the table format chosen to replace SPDS.

LinusH · ‎09-01-2023

Good read.

When it comes to transactional processing, I guess there should be no doubt that SingelStore could replace SPDS record level locking proxy.

But it would be interesting to learn about migrating SPDS dynamic locking scenarios.

AhmedAl_Attar · ‎09-01-2023

Nice informative post, but I wonder if the clients would be interested/willing to pay extra money to replace an already licensed SAS product/module when there are Open-Source alternatives they could experiment with

- Apache Doris: Open-Source Real-Time Data Warehouse - Apache Doris

(Apache Druid, TiDB, ClickHouse, or Apache Doris? A Comparison of OLAP Tools - DEV Community )

- Apache Pinote

Just a thought

LinusH · ‎09-04-2023

@AhmedAl_Attar S2 is to replace SPDS, which in turn carries a license fee, so it's not only adding cost by migrating. Whether it's going to be more expensive depends on each use case, since the fee calculation is vastly different.

And there are of course may other vendor and products out there, but if you re interested in continue investing in SAS data management and analytics, SAS offers a tighter integration with S2 than most other alternatives.

JBailey · ‎09-04-2023

Hi @AhmedAl_Attar, It has been a while. I hope you are doing well.

I am curious, how did you hear about TiDB?

AhmedAl_Attar · ‎09-04-2023

Hi Guys,

@LinusH, I'm all familiar with S2 and have been following them and SAS's integration for a while now. But in the recent years, I've seen more and more customers of SAS shifting to Open-Source Software (OSS) and trying to reduce their Commercial off-the-shelf (COTS) software license fees. Really now a days, successful software vendors are meeting the customers where they are, rather than trying to influence them to come over to the commercial side!

This is just my observation.

@JBailey, Your LinkedIn posts and technical YouTube videos lead me to TiDB 😉

StephenFoerster · ‎09-05-2023

@AhmedAl_Attar , @JBailey, It's great to hear from you guys. Thanks for the information and perspective. Hope you are well

JBailey · ‎11-29-2023

Hi @AhmedAl_Attar, That is good to know. Thank you for reading my posts and watching the videos. Is there any content you would like to see on our YouTube Channel? Hopefully, you have subscribed. We are going to add a lot of content in 2024 (Included more certification helpers - uh, cheat codes) 🤣

JBailey · ‎11-29-2023

Hi @StephenFoerster, it is great to cross paths with you again. I am doing great and hope you are. I miss working with you.

AhmedAl_Attar · ‎11-29-2023

@JBailey Glad to hear and best of luck with all you are doing. I don't have any specific requirements/requests! I'm mainly trying to keep my finger on the pulse where possible. 😊