Using SAS with SingleStore – Enhancing Performance with Aggregate Pushdown

1 Like

SAS with SingleStore is a combined solution where SAS's advanced analytics and AI capabilities seamlessly integrate with SingleStore's high-performance, cloud-native database, allowing users to perform complex data analysis directly on real-time data stored within SingleStore, resulting in faster insights and improved decision-making without the need for extensive data movement.

A new capability was recently introduced (Stable 2024.11) to enhance CAS integration with SingleStore: aggregate pushdown for the simple.summary action.

What is it?

The simple.summary CAS action generates descriptive statistics for numeric variables, grouped by category variables. It is widely used across various SAS applications, including SAS Visual Analytics.

With aggregate pushdown for the simple.summary action, more computations—specifically aggregation—are offloaded to SingleStore, resulting in:

Computations being performed closer to the data, leveraging SingleStore’s infrastructure.
A significant reduction in the amount of data streamed from SingleStore to CAS.

How does simple.summary work WITHOUT aggregate pushdown?

Below is the data processing flow when a simple.summary CAS action runs on a SingleStore table, without aggregate pushdown enabled.

A user triggers a CAS action, from SAS code, a SAS application, or another CAS client.

The data management part of the CAS action is pushed to SingleStore. Row filtering, variable selection and new calculated columns are computed in SingleStore.

This results in a subset of detailed data on SingleStore. Depending on the case, it can still be millions of observations.

This subset of data is streamed from SingleStore to CAS and the CAS action makes additional computations as needed.

When done, results are consolidated and sent back to the calling client.

Finally, the temporary subset in CAS is cleaned up.

How does simple.summary work WITH aggregate pushdown?

Here is the data processing flow when a simple.summary CAS action runs on a SingleStore table, with aggregate pushdown enabled.

A user triggers a CAS action, from SAS code, a SAS application, or another CAS client.

The data management part of the CAS action is pushed to SingleStore. Row filtering, variable selection, new calculated columns and now aggregation are computed in SingleStore.

This results in a subset of aggregated data on SingleStore containing a few records.

This subset of data is streamed from SingleStore to CAS and the CAS action makes final adjustments if needed.

When done, results are consolidated and sent back to the calling client.

Finally, the temporary subset in CAS is cleaned up.

How to trigger aggregate pushdown?

You can add the sql=true option to your simple.analytics CAS action code to activate aggregate pushdown. Here is an example:

/* Pushdown aggregation to SingleStore */
proc cas ;
   simple.summary /
      inputs={"actual","predict","diff"},
      subSet={"SUM"},
      sql=true,
      table={
         caslib="s2",
         name="bigprdsale_from_s2",
         groupBy={"country","product","division_alias"},
         where="country=""CANADA"""
         computedVars={{name="division_alias"},{name="diff"}},
         computedVarsProgram="division_alias=substr(division,1,3) ; diff=actual-predict ;"
      } ;
quit ;

When you run this code, you should observe the following message in your SAS log:

NOTE: The aggregation is being calculated in the database.

Of course, if you want to have a more global behavior or if you can’t customize the CAS code (for example, you’re using SAS Visual Analytics which is a big consumer of simple.analytics), you can enable aggregate pushdown at the CASLIB definition with the aggregatePushdown=true option, as shown below:

/* SingleStore caslib */
caslib s2 desc="My S2 Caslib"
   dataSource=(srctype='singlestore',
   host="svc-sas-singlestore-cluster-dml.gelenv.svc.cluster.local",
   port=3306,
   database="geldm",
   epdatabase="gelwork",
   user="myuser",
   pass="mypass",
   aggregatePushdown=true
) libref=cass2 global ;

All simple.summary aggregations will be pushed down for all SingleStore tables in this CASLIB, unless sql=false is set at the CAS action level.

Additional statistics

In order to handle kurtosis or skewness statistics, additional functions are needed on the SingleStore side. Indeed, you must download the open-aggregates collection of user-defined functions (UDFs) and store them in a SingleStore database.

The UDFs, and the instructions for storing them, are available in GitHub at https://github.com/sassoftware/open-aggregates.

If you request aggregate pushdown with kurtosis or skewness and don’t have the UDFs deployed, you will get this message in the log:

NOTE: The aggregation could not be calculated in the database.
      A SingleStore user-defined function required for the aggregation does not exist in the database.
      See SAS documentation about aggregate pushdown with SingleStore.

After deploying the functions in SingleStore, update your CASLIB definition to include the udfDatabase option, specifying the database where the UDFs are located:

caslib s2 desc="My SingleStore Caslib"
   dataSource=(srctype='singlestore',
      host="svc-sas-singlestore-cluster-dml.gelenv.svc.cluster.local",
      port=3306
      database="geldm",
      epDatabase="gelwork",
      udfDatabase="geludf",
      user="myuser",
      pass="mypass",
      aggregatePushdown=true
) libref=cass2 global ;

Then running a simple.summary with kurtosis and skewness should send the aggregation to SingleStore properly:

143  proc cas ;
144     simple.summary /
145        inputs={"actual","predict"},
146        subSet={"SUM","KURTOSIS","SKEWNESS"},
147        table={
148           caslib="s2",
149           name="bigprdsale_from_s2",
150           groupBy={"country","product"},
151           where="country=""CANADA"""
152        } ;
153  quit ;
NOTE: Active Session now MYSESSION.
NOTE: The aggregation is being calculated in the database.
NOTE: The SAS Embedded Process is being used for this operation.
NOTE: The PROCEDURE CAS printed pages 3-7.
NOTE: PROCEDURE CAS used (Total process time):
      real time           2.93 seconds
      cpu time            0.05 seconds

Performance examples

Let's explore some performance examples. First, we'll outline the environment used for these tests.

CAS Infrastructure	1 Controller Node 3 Worker Nodes
SingleStore Infrastructure	1 Master Aggregator Node 1 Child Aggregator Node 2 Leaf Nodes
SingleStore Table	75.8 M Rows 19 Columns

The table below presents key performance metrics, highlighting the importance of aggregate pushdown for accelerating insights from SingleStore data while eliminating the need for data replication and movement. The green cells show the best times per test.

Test / Run Time in seconds	Table loaded in CAS	SingleStore Table without Aggregate Pushdown	SingleStore Table with Aggregate Pushdown
Simple.summary with all *statistics (16)**	31.49	87.18	37.33
Simple summary with SUM, MEAN, MIN, MAX	30.92	82.88	2.54
SAS Visual Analytics Report with 4** report objects	31.57	87.95	9.70

* Statistics: Minimum, Maximum, N, Sum, Mean, Std Dev, Std Error, Variance, Coeff of Variation, Corrected SS, USS, t Value, Pr > |t|, N Miss, Skewness, Kurtosis

** The test report is composed of 4 report objects computing different measures on different category variables. It has not been further optimized.

Next steps

In the near future, even more computations will be natively pushed down to SingleStore, fully harnessing its high-performance infrastructure and in-memory processing capabilities. By offloading aggregation and analytics directly to SingleStore, SAS Viya can significantly accelerate query execution, reduce data movement, and deliver near-instant insights on massive datasets.

Thanks for reading!

Find more articles from SAS Global Enablement and Learning here.

SAS Communities Library