SAS Communities Library

We’re smarter together. Learn from this collection of community knowledge and add your expertise.
BookmarkSubscribeRSS Feed

Using SAS with SingleStore – Enhancing Performance with Aggregate Pushdown

Started ‎03-10-2025 by
Modified ‎03-10-2025 by
Views 327

SAS with SingleStore is a combined solution where SAS's advanced analytics and AI capabilities seamlessly integrate with SingleStore's high-performance, cloud-native database, allowing users to perform complex data analysis directly on real-time data stored within SingleStore, resulting in faster insights and improved decision-making without the need for extensive data movement.

 

A new capability was recently introduced (Stable 2024.11) to enhance CAS integration with SingleStore: aggregate pushdown for the simple.summary action.

 

What is it?

 

The simple.summary CAS action generates descriptive statistics for numeric variables, grouped by category variables. It is widely used across various SAS applications, including SAS Visual Analytics.

 

With aggregate pushdown for the simple.summary action, more computations—specifically aggregation—are offloaded to SingleStore, resulting in:

 

  1. Computations being performed closer to the data, leveraging SingleStore’s infrastructure.
  2. A significant reduction in the amount of data streamed from SingleStore to CAS.

 

How does simple.summary work WITHOUT aggregate pushdown?

 

Below is the data processing flow when a simple.summary CAS action runs on a SingleStore table, without aggregate pushdown enabled.

 

01_nir_post_101_01_flow_no_agg_pushown.gif

 

nir_post_90_02_step_1.pngA user triggers a CAS action, from SAS code, a SAS application, or another CAS client.

 

nir_post_90_03_step_2.pngThe data management part of the CAS action is pushed to SingleStore. Row filtering, variable selection and new calculated columns are computed in SingleStore.

 

nir_post_90_04_step_3.pngThis results in a subset of detailed data on SingleStore. Depending on the case, it can still be millions of observations.

 

nir_post_90_05_step_4.pngThis subset of data is streamed from SingleStore to CAS and the CAS action makes additional computations as needed.

 

nir_post_90_06_step_5.pngWhen done, results are consolidated and sent back to the calling client.

 

nir_post_90_07_step_6.pngFinally, the temporary subset in CAS is cleaned up.

 

 

How does simple.summary work WITH aggregate pushdown?

 

Here is the data processing flow when a simple.summary CAS action runs on a SingleStore table, with aggregate pushdown enabled.

 

02_nir_post_101_02_flow_with_agg_pushown.gif

 

nir_post_90_02_step_1.pngA user triggers a CAS action, from SAS code, a SAS application, or another CAS client.

 

nir_post_90_03_step_2.pngThe data management part of the CAS action is pushed to SingleStore. Row filtering, variable selection, new calculated columns and now aggregation are computed in SingleStore.

 

nir_post_90_04_step_3.pngThis results in a subset of aggregated data on SingleStore containing a few records.

 

nir_post_90_05_step_4.pngThis subset of data is streamed from SingleStore to CAS and the CAS action makes final adjustments if needed.

 

nir_post_90_06_step_5.pngWhen done, results are consolidated and sent back to the calling client.

 

nir_post_90_07_step_6.pngFinally, the temporary subset in CAS is cleaned up.

 

 

How to trigger aggregate pushdown?

 

You can add the sql=true option to your simple.analytics CAS action code to activate aggregate pushdown. Here is an example:

 

/* Pushdown aggregation to SingleStore */
proc cas ;
   simple.summary /
      inputs={"actual","predict","diff"},
      subSet={"SUM"},
      sql=true,
      table={
         caslib="s2",
         name="bigprdsale_from_s2",
         groupBy={"country","product","division_alias"},
         where="country=""CANADA"""
         computedVars={{name="division_alias"},{name="diff"}},
         computedVarsProgram="division_alias=substr(division,1,3) ; diff=actual-predict ;"
      } ;
quit ;

 

When you run this code, you should observe the following message in your SAS log:

 

NOTE: The aggregation is being calculated in the database.

 

Of course, if you want to have a more global behavior or if you can’t customize the CAS code (for example, you’re using SAS Visual Analytics which is a big consumer of simple.analytics), you can enable aggregate pushdown at the CASLIB definition with the aggregatePushdown=true option, as shown below:

 

/* SingleStore caslib */
caslib s2 desc="My S2 Caslib"
   dataSource=(srctype='singlestore',
   host="svc-sas-singlestore-cluster-dml.gelenv.svc.cluster.local",
   port=3306,
   database="geldm",
   epdatabase="gelwork",
   user="myuser",
   pass="mypass",
   aggregatePushdown=true
) libref=cass2 global ;

 

All simple.summary aggregations will be pushed down for all SingleStore tables in this CASLIB, unless sql=false is set at the CAS action level.

 

Additional statistics

 

In order to handle kurtosis or skewness statistics, additional functions are needed on the SingleStore side. Indeed, you must download the open-aggregates collection of user-defined functions (UDFs) and store them in a SingleStore database.

 

The UDFs, and the instructions for storing them, are available in GitHub at https://github.com/sassoftware/open-aggregates.

 

If you request aggregate pushdown with kurtosis or skewness and don’t have the UDFs deployed, you will get this message in the log:

 

NOTE: The aggregation could not be calculated in the database.
      A SingleStore user-defined function required for the aggregation does not exist in the database.
      See SAS documentation about aggregate pushdown with SingleStore.

 

After deploying the functions in SingleStore, update your CASLIB definition to include the udfDatabase option, specifying the database where the UDFs are located:

 

caslib s2 desc="My SingleStore Caslib"
   dataSource=(srctype='singlestore',
      host="svc-sas-singlestore-cluster-dml.gelenv.svc.cluster.local",
      port=3306
      database="geldm",
      epDatabase="gelwork",
      udfDatabase="geludf",
      user="myuser",
      pass="mypass",
      aggregatePushdown=true
) libref=cass2 global ;

 

Then running a simple.summary with kurtosis and skewness should send the aggregation to SingleStore properly:

 

143  proc cas ;
144     simple.summary /
145        inputs={"actual","predict"},
146        subSet={"SUM","KURTOSIS","SKEWNESS"},
147        table={
148           caslib="s2",
149           name="bigprdsale_from_s2",
150           groupBy={"country","product"},
151           where="country=""CANADA"""
152        } ;
153  quit ;
NOTE: Active Session now MYSESSION.
NOTE: The aggregation is being calculated in the database.
NOTE: The SAS Embedded Process is being used for this operation.
NOTE: The PROCEDURE CAS printed pages 3-7.
NOTE: PROCEDURE CAS used (Total process time):
      real time           2.93 seconds
      cpu time            0.05 seconds

 

Performance examples

 

Let's explore some performance examples. First, we'll outline the environment used for these tests.

 

CAS Infrastructure 1 Controller Node
3 Worker Nodes
SingleStore Infrastructure 1 Master Aggregator Node
1 Child Aggregator Node
2 Leaf Nodes
SingleStore Table 75.8 M Rows
19 Columns

 

The table below presents key performance metrics, highlighting the importance of aggregate pushdown for accelerating insights from SingleStore data while eliminating the need for data replication and movement. The green cells show the best times per test.

 

Test / Run Time in seconds

Table loaded

in CAS

SingleStore Table without 

Aggregate Pushdown

SingleStore Table with

Aggregate Pushdown

Simple.summary with all

statistics* (16)

31.49 87.18 37.33

Simple summary with

SUM, MEAN, MIN, MAX

30.92 82.88 2.54

SAS Visual Analytics Report** with 4 report objects

31.57 87.95 9.70

 

* Statistics: Minimum, Maximum, N, Sum, Mean, Std Dev, Std Error, Variance, Coeff of Variation, Corrected SS, USS, t Value, Pr > |t|, N Miss, Skewness, Kurtosis

 

** The test report is composed of 4 report objects computing different measures on different category variables. It has not been further optimized.

 

Next steps

 

In the near future, even more computations will be natively pushed down to SingleStore, fully harnessing its high-performance infrastructure and in-memory processing capabilities. By offloading aggregation and analytics directly to SingleStore, SAS Viya can significantly accelerate query execution, reduce data movement, and deliver near-instant insights on massive datasets.

 

Thanks for reading!

 

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎03-10-2025 01:35 PM
Updated by:

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started