BookmarkSubscribeRSS Feed

SAS Viya with SingleStore: Near Real-Time Dashboarding Using SAS Visual Analytics

Started ‎02-10-2023 by
Modified ‎02-10-2023 by
Views 1,318

SAS Viya with SingleStore has been introduced 6 months ago. While a lot of work is still going on to make the integration between SAS Viya and SingleStore wider and richer, we have already observed key areas where this duo can bring a lot of value.

fb38abf1-764d-4f1e-8906-0b19dcea6989.gif

 

I wanted to highlight one use case about near real-time reporting/dashboarding in SAS Visual Analytics that leverages SingleStore ability to perform updates ultra fast.  

 

What do we want to achieve?

 

In this example, we want to continuously ingest data files that are dropped in real-time in a cloud object storage location (here ADLS), load/update a SingleStore table accordingly and see if SAS Visual Analytics can capture those updates seamlessly.

 

In the global picture, we will be using multiple nice features from both tools:

 

  • SingleStore ability to run pipelines which continuously load data as it arrives from external sources
  • SAS Viya / SingleStore ability to push parts of CAS actions down to SingleStore (filters, column selection, computed columns) and to stream results dynamically in CAS
  • SAS Visual Analytics ability to refresh its contents periodically

 

nir_post_82_02_adls2va.gif

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

 

 

The SingleStore Part

 

The SingleStore Table

 

CAS requires a specific SingleStore table structure so that CAS actions can operate directly on them without any data duplication. Basically, the SingleStore table needs an AUTO_INCREMENT column whose goal is to automatically generate a unique value for new rows.

 

Here is an example of how to define such a table:

 

CREATE TABLE IF NOT EXISTS prdsale_from_pl (
    actual double,
    predict double,
    country varchar(20),
    region varchar(20),
    division varchar(20),
    prodtype varchar(20),
    product varchar(20),
    quarter double,
    year double,
    month date,
    /* auto_increment column that will be useful for CAS later */ 
    rowId bigint(20) NOT NULL AUTO_INCREMENT,
    KEY rowId (rowId) USING CLUSTERED COLUMNSTORE,
    SHARD KEY ()
) AUTOSTATS_CARDINALITY_MODE=INCREMENTAL AUTOSTATS_HISTOGRAM_MODE=CREATE AUTOSTATS_SAMPLING=ON SQL_MODE='STRICT_ALL_TABLES' ;

 

Notice the definition of the rowId field, AUTO_INCREMENT column and part of the CLUSTERED COLUMNSTORE KEY.

 

The SingleStore Pipeline

 

A SingleStore pipeline is capable of reading external data sources continuously and loading them into a SingleStore table. Here we simply want to read CSV files that are created in real-time in an Azure Data Lake Storage (Gen2) container.

 

Here is a sample syntax:

 

/* Define the pipeline */
/* AGGREGATOR pipeline to support AUTO_INCREMENT */
CREATE AGGREGATOR PIPELINE IF NOT EXISTS prdsale_pl
AS LOAD DATA AZURE 'blobdata/data/prdsale/'
/* Replace the ADLS account name and key */
CREDENTIALS '{"account_name": "adls_storage_account_name", "account_key": "adls_storage_account_key"}'
INTO TABLE prdsale_from_pl
FIELDS TERMINATED BY ','
/* Specific notation to parse the date from the CSV file */
(actual,predict,country,region,division,prodtype,product,quarter,year,@month)
SET month = STR_TO_DATE(@month, '%d%b%Y') ;

 

Pipelines are extremely powerful, and you can take advantage of them to also transform the data before loading (like in the previous example where we are parsing the dates from the CSV files into real database dates). Creating a pipeline does not start it. Here is the command to start it:

 

/* Start the pipeline */
START PIPELINE IF NOT RUNNING prdsale_pl ;

 

Once it is started, existing files in the ADLS location start to be ingested by the pipeline and loaded into the SingleStore table in real-time.  

 

The SAS Viya Part

 

Load the SingleStore Table in CAS

 

Explicitly loading tables in CAS is necessary for SAS applications such as SAS Visual Analytics that work on global (“promoted”) CAS tables. While loading a SingleStore table in CAS is not different from any other file/table from a pure code standpoint, it is really different from a data perspective. When you “load” a SingleStore table in CAS using the SingleStore CASLIB and the default options, you actually don’t load the data into CAS, you just “declare” the table in CAS and create a reference. No data is moved between SingleStore and CAS. Data will stream from SingleStore to CAS as soon as a CAS action requests it.

 

In our case, we also want to mention to CAS that the SingleStore table has already an AUTO_INCREMENT column and that we trust it (createMultipassColumn="onLoad", multipassColumn="trust"). Thus, any CAS action will operate on this source table. No copy of this table will be done to satisfy CAS requirements (uniqueness of one column). This also means that any update made to the SingleStore source table will be caught by CAS which is exactly what we want.

 

/* Load S2 table in CAS - Trust existing AI - createMultipassColumn="onLoad" - Do Not Normalize */
/* Load the prdsale_from_pl table which has an auto_increment column */
proc casutil incaslib="s2" outcaslib="s2" ;
   droptable casdata="prdsale_from_pl" quiet ;
   load casdata="prdsale_from_pl" casout="prdsale_from_pl" dataSourceOptions=(
      createMultipassColumn="onLoad"
   ) promote ;
quit ;

 

The SAS Visual Analytics Part

 

The SAS Visual Analytics report built on the SingleStore table “loaded” in CAS needs 2 settings to take advantage of the CAS / SingleStore integration.

 

First, “auto-refresh” needs to be set. This can be done at the report object level, at the page level or at the report level. Here is an example of how to refresh the report object every 10 seconds:

 

nir_post_82_03_va_auto_refresh-775x1024.png

 

 

 

 

Don’t set the period too short to not overwhelm VA.

 

By default, VA triggers the refresh smartly, only if the underlying CAS table has been modified meanwhile. Otherwise, it relies on the cache. We need a trick to disable caching for a specific data source and to force VA to (re)-run the CAS actions against the SingleStore table.

 

This trick is to “Apply a data filter” at the data source level, with a time sensitivity expression of less than a minute. The following generic filter expression, which is always true, can be used:

 

'1JAN2000_0:0:0'dt < Now()

 

Here is where to set it:

 

nir_post_82_04_va_data_filter-1024x748.png

 

 

 

And how to write it:

 

 nir_post_82_05_va_data_filter_expression-1024x726.png

 

Now that we have all the pieces in place, we can open the report in view mode (“View Report”). The report object will get updated based on the SingleStore table. Indeed, CAS actions that compose that report run every 10 seconds, push filters, column selection and computed columns down to SingleStore and stream the data back in CAS. If the SingleStore table gets updated by the pipeline (because new CSV files are coming in), this will be reflected in CAS and in the report.

 

nir_post_82_06_dashboard.gif

 

We showed an example of SingleStore pipelines to illustrate the update of SingleStore data captured by CAS. But any process that updates a SingleStore table would also be captured by CAS assuming the requirements are satisfied.  

 

Thanks to my colleagues Kim Botha, James Weiss and David Zanter for their help on this topic. 

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎02-10-2023 04:21 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags