Using SAS with SingleStore – Enhancing Performance with Aggregate Pushdown
- Article History
- RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
SAS with SingleStore is a combined solution where SAS's advanced analytics and AI capabilities seamlessly integrate with SingleStore's high-performance, cloud-native database, allowing users to perform complex data analysis directly on real-time data stored within SingleStore, resulting in faster insights and improved decision-making without the need for extensive data movement.
A new capability was recently introduced (Stable 2024.11) to enhance CAS integration with SingleStore: aggregate pushdown for the simple.summary action.
What is it?
The simple.summary CAS action generates descriptive statistics for numeric variables, grouped by category variables. It is widely used across various SAS applications, including SAS Visual Analytics.
With aggregate pushdown for the simple.summary action, more computations—specifically aggregation—are offloaded to SingleStore, resulting in:
- Computations being performed closer to the data, leveraging SingleStore’s infrastructure.
- A significant reduction in the amount of data streamed from SingleStore to CAS.
How does simple.summary work WITHOUT aggregate pushdown?
Below is the data processing flow when a simple.summary CAS action runs on a SingleStore table, without aggregate pushdown enabled.
A user triggers a CAS action, from SAS code, a SAS application, or another CAS client.
The data management part of the CAS action is pushed to SingleStore. Row filtering, variable selection and new calculated columns are computed in SingleStore.
This results in a subset of detailed data on SingleStore. Depending on the case, it can still be millions of observations.
This subset of data is streamed from SingleStore to CAS and the CAS action makes additional computations as needed.
When done, results are consolidated and sent back to the calling client.
Finally, the temporary subset in CAS is cleaned up.
How does simple.summary work WITH aggregate pushdown?
Here is the data processing flow when a simple.summary CAS action runs on a SingleStore table, with aggregate pushdown enabled.
A user triggers a CAS action, from SAS code, a SAS application, or another CAS client.
The data management part of the CAS action is pushed to SingleStore. Row filtering, variable selection, new calculated columns and now aggregation are computed in SingleStore.
This results in a subset of aggregated data on SingleStore containing a few records.
This subset of data is streamed from SingleStore to CAS and the CAS action makes final adjustments if needed.
When done, results are consolidated and sent back to the calling client.
Finally, the temporary subset in CAS is cleaned up.
How to trigger aggregate pushdown?
You can add the sql=true option to your simple.analytics CAS action code to activate aggregate pushdown. Here is an example:
/* Pushdown aggregation to SingleStore */
proc cas ;
simple.summary /
inputs={"actual","predict","diff"},
subSet={"SUM"},
sql=true,
table={
caslib="s2",
name="bigprdsale_from_s2",
groupBy={"country","product","division_alias"},
where="country=""CANADA"""
computedVars={{name="division_alias"},{name="diff"}},
computedVarsProgram="division_alias=substr(division,1,3) ; diff=actual-predict ;"
} ;
quit ;
When you run this code, you should observe the following message in your SAS log:
NOTE: The aggregation is being calculated in the database.
Of course, if you want to have a more global behavior or if you can’t customize the CAS code (for example, you’re using SAS Visual Analytics which is a big consumer of simple.analytics), you can enable aggregate pushdown at the CASLIB definition with the aggregatePushdown=true option, as shown below:
/* SingleStore caslib */
caslib s2 desc="My S2 Caslib"
dataSource=(srctype='singlestore',
host="svc-sas-singlestore-cluster-dml.gelenv.svc.cluster.local",
port=3306,
database="geldm",
epdatabase="gelwork",
user="myuser",
pass="mypass",
aggregatePushdown=true
) libref=cass2 global ;
All simple.summary aggregations will be pushed down for all SingleStore tables in this CASLIB, unless sql=false is set at the CAS action level.
Additional statistics
In order to handle kurtosis or skewness statistics, additional functions are needed on the SingleStore side. Indeed, you must download the open-aggregates collection of user-defined functions (UDFs) and store them in a SingleStore database.
The UDFs, and the instructions for storing them, are available in GitHub at https://github.com/sassoftware/open-aggregates.
If you request aggregate pushdown with kurtosis or skewness and don’t have the UDFs deployed, you will get this message in the log:
NOTE: The aggregation could not be calculated in the database.
A SingleStore user-defined function required for the aggregation does not exist in the database.
See SAS documentation about aggregate pushdown with SingleStore.
After deploying the functions in SingleStore, update your CASLIB definition to include the udfDatabase option, specifying the database where the UDFs are located:
caslib s2 desc="My SingleStore Caslib"
dataSource=(srctype='singlestore',
host="svc-sas-singlestore-cluster-dml.gelenv.svc.cluster.local",
port=3306
database="geldm",
epDatabase="gelwork",
udfDatabase="geludf",
user="myuser",
pass="mypass",
aggregatePushdown=true
) libref=cass2 global ;
Then running a simple.summary with kurtosis and skewness should send the aggregation to SingleStore properly:
143 proc cas ;
144 simple.summary /
145 inputs={"actual","predict"},
146 subSet={"SUM","KURTOSIS","SKEWNESS"},
147 table={
148 caslib="s2",
149 name="bigprdsale_from_s2",
150 groupBy={"country","product"},
151 where="country=""CANADA"""
152 } ;
153 quit ;
NOTE: Active Session now MYSESSION.
NOTE: The aggregation is being calculated in the database.
NOTE: The SAS Embedded Process is being used for this operation.
NOTE: The PROCEDURE CAS printed pages 3-7.
NOTE: PROCEDURE CAS used (Total process time):
real time 2.93 seconds
cpu time 0.05 seconds
Performance examples
Let's explore some performance examples. First, we'll outline the environment used for these tests.
CAS Infrastructure | 1 Controller Node 3 Worker Nodes |
SingleStore Infrastructure | 1 Master Aggregator Node 1 Child Aggregator Node 2 Leaf Nodes |
SingleStore Table | 75.8 M Rows 19 Columns |
The table below presents key performance metrics, highlighting the importance of aggregate pushdown for accelerating insights from SingleStore data while eliminating the need for data replication and movement. The green cells show the best times per test.
Test / Run Time in seconds |
Table loaded in CAS |
SingleStore Table without Aggregate Pushdown |
SingleStore Table with Aggregate Pushdown |
Simple.summary with all statistics* (16) |
31.49 | 87.18 | 37.33 |
Simple summary with SUM, MEAN, MIN, MAX |
30.92 | 82.88 | 2.54 |
SAS Visual Analytics Report** with 4 report objects |
31.57 | 87.95 | 9.70 |
* Statistics: Minimum, Maximum, N, Sum, Mean, Std Dev, Std Error, Variance, Coeff of Variation, Corrected SS, USS, t Value, Pr > |t|, N Miss, Skewness, Kurtosis
** The test report is composed of 4 report objects computing different measures on different category variables. It has not been further optimized.
Next steps
In the near future, even more computations will be natively pushed down to SingleStore, fully harnessing its high-performance infrastructure and in-memory processing capabilities. By offloading aggregation and analytics directly to SingleStore, SAS Viya can significantly accelerate query execution, reduce data movement, and deliver near-instant insights on massive datasets.
Thanks for reading!
Find more articles from SAS Global Enablement and Learning here.