
SAS Viya Cloud Analytic Services treats SingleStore differently

Started ‎09-30-2022 by
Modified ‎11-23-2022 by

SAS' new partnership with SingleStore brings a level of integration with a data vendor that, in my opinion, SAS hasn't attempted at this scale and depth before. CAS in particular will see most of the impact: it gains the ability to work with SingleStore tables with benefits almost as if they were in a native SAS format.

 

Beginning with the stable-2022.1.4 release, SAS Viya marks yet another significant change in how we define some key terminology and concepts in SAS software. In this case, we're focused on additional features that enable CAS to work directly with tables in SingleStore. In particular, parameters that used to be reserved for SAS tables in SASHDAT format in a PATH or DNFS caslib can now be applied to SingleStore sources as well. But for SingleStore, they mean something slightly different.

 

Getting Caught Up

 

There's a fair amount of ground to cover to have this conversation properly. Some key points I'd like to remind you of:

 

  • All data that CAS works with in-memory is in SASHDAT format - optimized for the kind of analytics activities that SAS performs - regardless of the original source
     
  • Data placed in CAS_DISK_CACHE is also in SASHDAT format - due to memory-mapping associated with that data in RAM - again, regardless of the original source

 

And for CAS Backing Store options before SAS Viya 2022.1.4:

 

  • For non-SASHDAT sources, CAS defaults to using its cache as the backing store for in-memory data (to improve resiliency and availability of the data). InPlacePreferred is not an option for these sources.
     
  • When working with source data that's already in unencrypted SASHDAT format via the PATH or DNFS caslibs, CAS defaults to InPlacePreferred (using the original source files as the backing store), but can be optionally configured to use its cache instead.
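
The two defaults above can be restated as a small decision function. This is only a toy sketch of the rules as described in the bullets, not SAS code: the function names and string constants are hypothetical, and sending encrypted SASHDAT to the cache is my assumption, since the bullets only mention the unencrypted case.

```python
CACHE = "CAS_DISK_CACHE"
IN_PLACE = "InPlacePreferred"


def default_backing_store(source_format, caslib_type, encrypted=False):
    """Toy restatement of the default backing store rules (hypothetical helper)."""
    in_place_eligible = (
        source_format == "SASHDAT"
        and caslib_type in ("PATH", "DNFS")
        and not encrypted  # assumption: encrypted SASHDAT falls back to the cache
    )
    return IN_PLACE if in_place_eligible else CACHE


def allowed_backing_stores(source_format, caslib_type, encrypted=False):
    """List the choices available for a source, default first."""
    if default_backing_store(source_format, caslib_type, encrypted) == IN_PLACE:
        # Unencrypted SASHDAT can optionally be switched to the cache.
        return [IN_PLACE, CACHE]
    # Everything else can only use the cache.
    return [CACHE]


sashdat_default = default_backing_store("SASHDAT", "PATH")
csv_default = default_backing_store("CSV", "PATH")
```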

 

And if those don't sound familiar, then take a few minutes to learn more by reading:

 

 

Now in SAS Viya 2022.1.4 and later

 

The concepts listed above still apply as written, but now there's additional functionality available because we can specify what CAS should use as its backing store for data in SingleStore as well. Of course, data in SingleStore is not in SASHDAT format - it has its own table storage formats. And because the data is represented differently in SingleStore, the way CAS works with it as a backing store is different as well.

 

When Backing Store is CAS_DISK_CACHE

 

This is the default behavior unless your system has been configured otherwise already.

 

rc_1-CAS-Any-BackingStore.png


 

There's no real change to discuss here, as the behavior is the same as it has been historically. Data is loaded from the source into CAS, where it's stored in RAM with a memory-map to local disk in the CAS_DISK_CACHE. In effect, this gives you a local-to-CAS copy of the data to work with, alter, and so on. If desired, changes can be saved back to the original source.

 

When Backing Store is InPlacePreferred

 

This is where the behavior of CAS working with a SingleStore table is markedly different from when CAS works with a SASHDAT table via a PATH or DNFS caslib. Because a table in SingleStore is not in SASHDAT format on locally accessible disk, CAS cannot memory-map to it. Therefore, CAS must transfer the data from SingleStore every time a CAS action is performed on that data.

 

One thing that's not different: when InPlacePreferred is the selected backing store, CAS performs a "lazy load" of the data from the source when the load action first runs.

 

What's Happening?

 

Let's look at what happens when InPlacePreferred is specified as the backing store for CAS. For SASHDAT, a "lazy load" really means that CAS only defines a memory-map to the source SASHDAT table referenced by the PATH or DNFS caslib when you request the initial load. And that usually requires only a second or two (at most) to perform, often less.

 

rc_2-CAS-HDAT-BackingStore.png

 

Once the memory-map is established, CAS relies on the OS to determine when to page data into RAM (or page it back out to disk) - that's what a memory-map is for. So after a "lazy load", the first CAS action to run on the data references the memory-map, at which point the OS notices the data isn't actually in RAM yet and transfers the requested data from disk to RAM. Afterwards, the data stays in RAM for repeated use until the OS determines that the RAM is needed for something newer. In essence, CAS relies on memory-mapping as a virtual memory scheme so that we can conveniently work with more data than fits in physical RAM.
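
To make the idea concrete, here is a minimal sketch of a memory-mapped "lazy load" using Python's standard mmap module. It is not how CAS is implemented: the fixed-width record file is a made-up stand-in for a SASHDAT table, and lazy_map and read_record are hypothetical helper names. The point is only that creating the map is cheap, and the OS brings data in from disk when it is first touched.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 8  # hypothetical fixed-width record, one integer per record


def lazy_map(path):
    """Map a file into the address space without reading its contents."""
    f = open(path, "rb")
    # Creating the map is cheap: no file data is copied into process memory here.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, mm


def read_record(mm, n):
    # Slicing the map touches only the pages that hold this record;
    # the OS pages them in from disk on first access.
    start = n * RECORD_SIZE
    return int.from_bytes(mm[start:start + RECORD_SIZE], "little")


# Build a small stand-in "table" file of 1000 records holding 0..999.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for i in range(1000):
        tmp.write(i.to_bytes(RECORD_SIZE, "little"))
    table_path = tmp.name

f, table = lazy_map(table_path)           # the "lazy load"
first = read_record(table, 0)             # first access pages data in
last = read_record(table, 999)
total = sum(read_record(table, n) for n in range(1000))

table.close()
f.close()
os.remove(table_path)
```

Repeated reads of the same records are served from RAM for as long as the OS keeps those pages resident, which is the behavior the paragraph above describes.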

 

For SingleStore as a data source, we still have the idea of a "lazy load". But this time, there's no memory map. Instead, CAS merely notes how to query the data from SingleStore. Again, this only takes a second or two (at most) to perform, often much less.

 

rc_3-CAS-S2-BackingStore.png

 

So after the "lazy load" of a SingleStore table (really just saving a query to use later), every CAS action that runs must first transfer the data from SingleStore. There's no memory-map, and the OS itself isn't involved. This isn't as effective as memory-mapping as a virtual memory scheme, but it does yield some similar benefits (tables aren't kept in RAM when not in use) with tradeoffs (repeated queries).
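
The "save a query now, run it for every action" pattern can be sketched in a few lines. This is an illustration only: sqlite3 stands in for SingleStore so the example is self-contained, and the LazyQueryTable class, its run_action method, and the sales table are all hypothetical names, not part of any SAS or SingleStore API.

```python
import sqlite3


class LazyQueryTable:
    """Hypothetical stand-in for a table 'loaded' with no local copy."""

    def __init__(self, conn, query):
        # The "lazy load": remember how to fetch the data; transfer nothing yet.
        self.conn = conn
        self.query = query
        self.transfers = 0

    def run_action(self, action):
        # Every action pulls the rows from the source again.
        rows = self.conn.execute(self.query).fetchall()
        self.transfers += 1
        return action(rows)


# sqlite3 plays the role of the database source in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?)", [(10,), (20,), (30,)])

table = LazyQueryTable(conn, "SELECT amount FROM sales")

n_rows = table.run_action(lambda rows: len(rows))
total = table.run_action(lambda rows: sum(r[0] for r in rows))
transfers = table.transfers  # one transfer per action

conn.close()
```

Two actions cost two transfers, and nothing is held locally between them.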

 

Why Choose the Backing Store?

 

There are several reasons why you might want to specify the backing store for CAS to rely on.

 

For SASHDAT from PATH and DNFS caslibs, early versions of SAS Viya didn't offer an option to specify the backing store, and the effective result was similar to InPlacePreferred. But there were times when multiple CAS servers were referencing the same data on disk and updating it. This caused problems: when one CAS server updated the file, the other CAS server's memory-maps would be out-of-date. Offering the option to use the CAS_DISK_CACHE as the backing store eliminated that challenge.

 

Tables in SingleStore, on the other hand, might be updated frequently. And so forcing CAS to always query the latest data records for each action might be exactly what's needed… and worth the price of repeated transfers. For relatively static tables, direct CAS to use its cache as the backing store and then rely on its memory-mapping methodology to reduce voluminous data transfers as well as keep data resident in RAM longer for repeated actions.
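
The freshness-versus-transfer tradeoff can be seen in a small sketch. Again, sqlite3 is only a stand-in for SingleStore, and the readings table and the two helper functions are hypothetical. A copy taken at load time behaves like the cache-backed option, while re-querying behaves like the in-place option for a database source.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE readings (value INTEGER)")
source.executemany("INSERT INTO readings VALUES (?)", [(1,), (2,), (3,)])

QUERY = "SELECT value FROM readings"

# Cache-style backing store: one transfer up front, then work on the local copy.
cached_rows = source.execute(QUERY).fetchall()


def total_from_cache():
    return sum(v for (v,) in cached_rows)


# In-place-style backing store for a database source: re-query on every action.
def total_from_source():
    return sum(v for (v,) in source.execute(QUERY))


# The source table changes after the initial load.
source.execute("INSERT INTO readings VALUES (?)", (4,))

stale_total = total_from_cache()    # reflects the data as first loaded
fresh_total = total_from_source()   # reflects the newly inserted row

source.close()
```

The cached copy never sees the fourth row unless the table is reloaded, but it also never pays for another transfer.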

 

In other words, the choice of backing store will depend on the topology of your data sources, the available hardware, business requirements, and user satisfaction. It's up to you to suss out the appropriate mix. SAS gives you options to make the best play for your unique needs.  

 

Just Getting Started

 

SAS and SingleStore have big plans. This isn't the place to lay out the roadmap of future features, but there are further integrations and improvements coming down the pike soon. SAS will offer increased functionality to access built-in SingleStore functions and will introduce the SAS In-Database Embedded Process for SingleStore, which together ensure that the analytics run as close to the data as possible.

 

We're expecting to see a lot more big news in the coming months as this partnership grows.  

 

For More Information

 

Refer to the following reference for more helpful information:

 

Find more articles from SAS Global Enablement and Learning here.

