SAS' new partnership with SingleStore brings a level of integration with a data vendor that, in my opinion, SAS hasn't attempted at this scale and depth before. CAS in particular will see most of the impact, gaining the ability to work with SingleStore tables almost as if they were in a native SAS format.
And so beginning with the release of stable-2022.1.4, SAS Viya marks yet another significant change in how we define some key terminology and concepts with SAS software. In this case, we're focused on additional features which enable CAS to work directly with tables in SingleStore. In particular, parameters that used to be reserved for SAS tables in SASHDAT format in a PATH or DNFS caslib can now be applied to SingleStore sources as well. But there, they mean something slightly different.
There's a fair amount of ground to cover to have this conversation properly. Some key points I'd like to remind you of:
And for CAS Backing Store options up to SAS Viya 2022.1.4:
And if those don't sound familiar, then take a few minutes to learn more by reading:
The concepts listed above still apply as written, but now there's additional functionality available because we can specify what CAS should use as its backing store for data in SingleStore as well. Of course, data in SingleStore is not in SASHDAT format - it has its own table storage offerings. And because the data is represented differently in SingleStore, the way CAS works with it as a backing store is different as well.
This is the default behavior unless your system has been configured otherwise already.
There's really no change to discuss here, as the behavior is the same as it has been historically. Data is loaded from the source into CAS, where it's stored in RAM with a memory-map to local disk in the CAS_DISK_CACHE. Basically, this gives you a local-to-CAS copy of the table to work with, make alterations to, etc. If desired, changes can be saved back to the original source as directed.
This is where the behavior of CAS working with a SingleStore table will be markedly different than when CAS works with a SASHDAT table via PATH or DNFS caslib. Because a table in SingleStore is not in SASHDAT format on locally-accessible disk, CAS cannot memory-map to it. Therefore, CAS must transfer the data from SingleStore every time a CAS action on that data is performed.
One thing that's not different: when InPlacePreferred is the selected backing store for CAS, then CAS will perform a "lazy load" of the data from the source when first performing the load action.
Let's look at what happens when InPlacePreferred is specified as the backing store for CAS. For SASHDAT, a "lazy load" really means that CAS only defines a memory-map to the source SASHDAT table referenced by the PATH or DNFS caslib when you request the initial load. And that usually requires only a second or two (at most) to perform, often less.
Once the memory-map is established, CAS relies on the OS to determine when to page data into RAM (or page it back out to disk) - that's what a memory-map is for. So after a "lazy load", the first CAS action to run on the data references the memory-map, at which point the OS notices the data isn't actually in RAM yet and transfers the requested data from disk to RAM. Afterwards, the data stays in RAM for repeated use until the OS determines that the RAM space is needed for something newer. In essence, CAS relies on memory-mapping as a virtual memory scheme so that we can conveniently work with more data than there is physical RAM available.
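To make the memory-mapping idea concrete, here is a minimal sketch in Python using the standard `mmap` module. It is not CAS code and says nothing about how CAS implements its own mapping; the file layout, record size, and helper names are assumptions made up for illustration. It only shows the general pattern: establishing the map is cheap, and data is paged in by the OS when a region is first touched.

```python
import mmap
import os
import tempfile

RECORD_SIZE = 8  # hypothetical fixed-width record, chosen for easy offsets


def write_sample_file(path, n_records):
    """Write n_records fixed-width integer records to a file on disk."""
    with open(path, "wb") as f:
        for i in range(n_records):
            f.write(i.to_bytes(RECORD_SIZE, "little"))


def read_record(mapped, index):
    """Read one record; touching this slice is what makes the OS page it in."""
    start = index * RECORD_SIZE
    return int.from_bytes(mapped[start:start + RECORD_SIZE], "little")


# Stand-in for a table file on locally accessible disk.
fd, path = tempfile.mkstemp()
os.close(fd)
write_sample_file(path, 1000)

with open(path, "rb") as f:
    # The "lazy load": creating the map does not read the records themselves.
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The first access to a region triggers the OS to bring that page into RAM.
    first = read_record(mapped, 0)
    last = read_record(mapped, 999)
    mapped.close()

os.remove(path)
```

Note that the program never issues an explicit read for the whole file. Which pages are resident in RAM at any moment is left entirely to the OS.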
For SingleStore as a data source, we still have the idea of a "lazy load". But this time, there's no memory map. Instead, CAS merely notes how to query the data from SingleStore. Again, this only takes a second or two (at most) to perform, often much less.
And so after the "lazy load" of a SingleStore table (really just saving a query to use later), every CAS action that runs must wait for the data to transfer from SingleStore. There's no memory-map and the OS itself isn't involved. Of course, this isn't as effective a virtual memory scheme as memory-mapping, but it does yield some similar benefits (tables aren't kept in RAM when not being used) with tradeoffs (repeated queries).
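The "save a query now, transfer on every action" pattern can be sketched in a few lines of Python. This is a conceptual illustration only: `LazyTable` and its methods are hypothetical names, not part of any SAS or SingleStore API, and an in-memory SQLite database stands in for SingleStore so the example runs anywhere.

```python
import sqlite3


class LazyTable:
    """Hypothetical stand-in for a lazily loaded table: it stores only a query."""

    def __init__(self, conn, query):
        self.conn = conn
        self.query = query   # the "lazy load" saves this and nothing else
        self.transfers = 0   # counts how often data is pulled from the source

    def _fetch(self):
        # Every action pays for a fresh transfer from the source.
        self.transfers += 1
        return self.conn.execute(self.query).fetchall()

    def row_count(self):
        return len(self._fetch())

    def total(self, column_index):
        return sum(row[column_index] for row in self._fetch())


# SQLite stands in for the external database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20)])

table = LazyTable(conn, "SELECT id, amount FROM sales")
transfers_after_load = table.transfers   # nothing has been transferred yet

count_before = table.row_count()         # first action: one transfer

# The source table changes between actions.
conn.execute("INSERT INTO sales VALUES (3, 30)")

total_after = table.total(1)             # second action sees the new row
```

Two actions cost two transfers, but the second one reflects the row inserted in between. That is the tradeoff described above: no data is held when idle and results are current, at the price of repeated queries.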
There are several reasons why you might want to specify the backing store for CAS to rely on.
For SASHDAT from PATH and DNFS caslibs, early versions of SAS Viya didn't offer an option to specify the backing store, and the effective result was similar to InPlacePreferred. But there were times when multiple CAS servers referenced the same data on disk and updated it. This caused problems: when one CAS server updated the file, the other CAS servers' memory-maps went out of date. Offering the option to use the CAS_DISK_CACHE as the backing store eliminated that challenge.
Tables in SingleStore, on the other hand, might be updated frequently. And so forcing CAS to always query the latest data records for each action might be exactly what's needed… and worth the price of repeated transfers. For relatively static tables, you can instead direct CAS to use its cache as the backing store and rely on its memory-mapping methodology to avoid voluminous repeated data transfers and to keep data resident in RAM longer for repeated actions.
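For contrast, here is a rough sketch of the other side of that tradeoff: a table copied once at load time and then served locally. As before, this is illustrative Python rather than CAS code. `CachedTable` is a hypothetical name, a plain in-process list stands in for the CAS cache, and SQLite stands in for SingleStore.

```python
import sqlite3


class CachedTable:
    """Hypothetical stand-in for a table copied into a local cache at load time."""

    def __init__(self, conn, query):
        # One transfer up front; later actions never go back to the source.
        self.rows = conn.execute(query).fetchall()

    def total(self, column_index):
        return sum(row[column_index] for row in self.rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10), (2, 20)])

cached = CachedTable(conn, "SELECT id, amount FROM sales")

# The source table is updated after the load.
conn.execute("INSERT INTO sales VALUES (3, 30)")

cached_total = cached.total(1)  # computed from the snapshot taken at load time
live_total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

The cached copy answers repeated actions without another transfer, but it no longer matches the source once the source changes. For frequently updated tables that staleness may be unacceptable; for static ones it costs nothing.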
In other words, the choice of backing store will depend on the topology of your data sources, the available hardware, business requirements, and user satisfaction. It's up to you to suss out the appropriate mix. SAS gives you options to make the best play for your unique needs.
SAS and SingleStore have big plans. This isn't the place to lay out the roadmap of future features, but there are further integrations and improvements coming down the pike soon. SAS will offer increased functionality to access built-in SingleStore functions and will introduce the SAS In-Database Embedded Process for SingleStore. Together, these aim to run the analytics as close to the data as possible.
We're expecting to see a lot more big news in the coming months as this partnership grows.
Refer to the following reference for more helpful information: