
SAS Viya with SingleStore: Data Flow Concept

Started ‎02-28-2023 by
Modified ‎02-28-2023 by

I’ve written about SAS Viya with SingleStore and shown how we can leverage data updates from SingleStore in near real time in SAS Visual Analytics. In the coming posts, we will dive a little deeper and look at some of the concepts behind the scenes. First, let’s see how data flows from SingleStore to CAS.

 

1 - Data is NOT loaded in CAS initially

 

To process data in SAS Viya, we usually have to load it into CAS first. This is true for most SAS visual applications (like SAS Visual Analytics), which work on global tables. It is also true with code (if we put the “transient scope” aside for simplicity). In other words, we create a copy of the data in CAS.

 

With SAS Viya with SingleStore, the SingleStore table is actually not loaded into CAS at load time (whether we load it through the UI or with a loadTable CAS action). Only some metadata is retrieved from SingleStore (table structure, columns, number of rows, etc.) and the table is then “declared” in CAS. No copy of the data is made in CAS and no data movement occurs.
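
To make the idea of "declaring" a table concrete, here is a minimal toy model in Python. It is not SAS or SingleStore code: `SOURCE_TABLE`, `DeclaredTable` and `declare` are hypothetical names invented for this illustration. The sketch only shows that the declaring side can record structure and row count without ever copying the rows.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a table living in the database.
# This is an illustration only, not a SAS or SingleStore API.
SOURCE_TABLE = {
    "name": "orders",
    "columns": ["order_id", "region", "amount"],
    "rows": [
        (1, "EMEA", 120.0),
        (2, "AMER", 75.5),
        (3, "EMEA", 42.0),
    ],
}


@dataclass(frozen=True)
class DeclaredTable:
    """What the analytics side keeps after the 'load': metadata only."""
    name: str
    columns: tuple
    row_count: int


def declare(source: dict) -> DeclaredTable:
    # Read the structure and the row count, but never copy the rows.
    return DeclaredTable(
        name=source["name"],
        columns=tuple(source["columns"]),
        row_count=len(source["rows"]),
    )


declared = declare(SOURCE_TABLE)
print(declared)
```

The declared object knows the table has three rows and three columns, yet it holds no data of its own, which mirrors the "no copy, no data movement" behavior described above.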

 

This is depicted in the following figure:

 

nir_post_83_01_data_motion.gif

 

The above mechanism relies on the default value of the backingStore option, which is INPLACEPREFERRED. If you set this option to CASDISKCACHE, you get the behavior of traditional CASLIBs instead: the data is duplicated in CAS at load time.

 

2 - A CAS action triggers some pre-processing in SingleStore

 

In “traditional” CAS, CAS actions run on CAS data (the copy of the original table loaded in CAS) in the CAS server.

 

nir_post_83_02_trad_cas_action.gif

 

With SAS Viya with SingleStore, the CAS action processing is distributed between SingleStore and CAS. The goal is to reduce data movement between SingleStore and CAS by preparing a subset of the data in SingleStore. As of today, SingleStore handles:

 

  • Row filtering
  • Variable selection
  • New calculated columns

nir_post_83_03_s2_cas_action.gif

 

Technically, this is handled by the SAS Embedded Process deployed in the SingleStore cluster.  
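
The three pre-processing operations listed above can be sketched with a small Python generator. This is a toy model under stated assumptions, not the SAS Embedded Process: `prepare_in_database`, its parameters and the sample `ROWS` are hypothetical names made up for the example. It shows why pushing the work to the source reduces what has to travel: only matching rows, and only the requested and computed columns, leave the function.

```python
# Hypothetical sample rows standing in for a database table.
ROWS = [
    {"order_id": 1, "region": "EMEA", "amount": 120.0, "qty": 2},
    {"order_id": 2, "region": "AMER", "amount": 75.5, "qty": 1},
    {"order_id": 3, "region": "EMEA", "amount": 42.0, "qty": 3},
]


def prepare_in_database(rows, where, keep, computed):
    """Toy model of the database-side step.

    where    -- predicate used for row filtering
    keep     -- names of the columns to select
    computed -- mapping of new column name to a function of the source row
    """
    for row in rows:
        if not where(row):
            continue  # row filtering
        out = {name: row[name] for name in keep}  # variable selection
        for name, fn in computed.items():
            out[name] = fn(row)  # new calculated columns
        yield out


subset = list(
    prepare_in_database(
        ROWS,
        where=lambda r: r["region"] == "EMEA",
        keep=["order_id", "amount"],
        computed={"unit_price": lambda r: r["amount"] / r["qty"]},
    )
)
print(subset)
```

Of the three source rows and four columns, only two rows with three columns each are produced, so the downstream consumer receives a smaller, already prepared data set.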

 

3 - Data is streamed to CAS on the fly

 

The result set produced by the previous step is then streamed dynamically from SingleStore to CAS for the final computations.

 

nir_post_83_04_s2_stream.gif

 

4 - The final computations happen in CAS

 

Finally, the CAS action processing is much more than just a selection of fields and records and computation of additional columns. It is about simple or complex analytical processing that is performed in CAS on the data that has just been streamed in. The final results are sent to the calling client.
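
As a rough illustration of streaming followed by a final computation, the Python sketch below consumes prepared values batch by batch and computes a mean without ever holding the full data set. It is a simplified model, not how CAS is implemented: `stream_batches`, `mean_of_stream` and the sample values are hypothetical, and a real CAS action would typically run far richer analytics than a mean.

```python
from typing import Iterable, Iterator, List


def stream_batches(values: Iterable[float], batch_size: int) -> Iterator[List[float]]:
    """Yield the prepared values in small batches, as a stream would."""
    batch: List[float] = []
    for value in values:
        batch.append(value)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # last, possibly shorter, batch


def mean_of_stream(batches: Iterable[List[float]]) -> float:
    """Final computation on the receiving side, done as batches arrive."""
    count = 0
    total = 0.0
    for batch in batches:
        count += len(batch)
        total += sum(batch)
    return total / count if count else float("nan")


# Hypothetical output of the database-side preparation step.
prepared_amounts = [120.0, 42.0, 18.0, 60.0, 10.0]
result = mean_of_stream(stream_batches(prepared_amounts, batch_size=2))
print(result)
```

Only running totals are kept between batches, which is the general reason a streamed result set can be analyzed without first being stored as a full copy.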

 

nir_post_83_05_s2_post.gif

 

  

The entire flow description is also available in this video:

 

 

Thanks for reading.

 
