We all know that SAS Cloud Analytic Services (CAS) is an in-memory analytics engine. Ideally, all of the data will reside in RAM on the CAS host(s), which gives it the fastest possible route for processing in the CPU. To improve its enterprise flexibility, however, CAS can employ disk-based storage for caching data as well. That leads to benefits such as improved data availability and a more robust memory utilization scheme.
The space set aside for this purpose is CAS_DISK_CACHE. But the CAS_DISK_CACHE isn't always used when CAS is working with in-memory data. There are circumstances that determine when it's used and when it's not. Most of the time, CAS will use its cache in a way that benefits your process objectives. But occasionally, there's a need to direct CAS to behave differently.
Let's take a look at the situations where you might want to override CAS' default caching behavior and how to accomplish it.
The way that the SAS Viya Cloud Analytic Services (CAS) Server manages data in memory is a complicated topic. There are many twists and turns to cover all of the nuances and possibilities. However, most of that behavior can be distilled down to a few simple rules which adequately describe the majority of situations. See 4 Rules to Understand CAS Management of In-Memory Data for a fuller explanation, but for now, let's look at:
Rule № 2:
All CAS in-memory data is memory mapped to a locally-accessible backing store.
Very briefly, this means that the data loaded into CAS is typically backed with a memory-map to local disk storage - and usually, that's CAS_DISK_CACHE. But there's one notable distinction: if you're loading data into CAS which is already in unencrypted SASHDAT format on disk that appears local to the CAS host, then it memory-maps to the source (and not to cache).
To ensure robust data availability in spite of a hardware failure, CAS normally defines a backing store for all in-memory data. For PATH and DNFS types of caslib, CAS makes efficient use of the existing SASHDAT file as the backing store instead of relying on its own cache location.
There's one use-case in particular where overriding CAS' default caching behavior can be helpful: when you have multiple, concurrent users of a PATH or DNFS sourced, unencrypted SASHDAT file - and one (or more) of those users needs to make changes to the data. This kind of situation often becomes a classic computer science challenge referred to as a race condition. If you've ever tried to edit a file alongside other people on a shared disk, you've experienced a similar race condition yourself.
Let's say you and I both want to work with an unencrypted SASHDAT file in a DNFS caslib. And then we both start our own CAS sessions, load the table, and get to work.
By default, when the first user (you) runs the loadTable action, their CAS session will memory-map to the SASHDAT file's source (not CAS_DISK_CACHE). Then when the second user (me) runs the loadTable action, the second CAS session will use the same memory-map handles. This makes CAS very efficient - it only effectively loads the table into RAM once.
But now imagine that you make changes to the in-memory table, save them back to source, and quit your session. And then I make some different set of changes and write those back to the same source. I'll be the jerk who overwrote your changes and you won't know until later (that's the race condition).
The reality is slightly more gritty than this, though. It turns out that in real life I wouldn't get a chance to save my changes. When you push your changes to the source, that effectively adds (and/or deletes) some of the memory maps CAS was using. Your session knows about those changes (of course), but my CAS session doesn't. And we've seen that when this happens, it can cause my CAS session, and others, to hang or otherwise become unresponsive. That's not the kind of experience we want to provide to our users.
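To make the idea of a shared backing store concrete, here's a minimal sketch at the operating-system level. It is only an analogy, assuming plain Python and its standard mmap module rather than anything CAS-specific: two memory maps of the same file stand in for our two sessions, and the function name and file contents are made up for illustration.

```python
import mmap
import os
import tempfile


def shared_mapping_demo() -> bytes:
    """Map one file twice; a write through one map is visible through the other."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    try:
        os.write(fd, b"original-rows---")  # 16 bytes of stand-in "table" data
        os.close(fd)
        with open(path, "r+b") as f1, open(path, "r+b") as f2:
            yours = mmap.mmap(f1.fileno(), 0)  # first "session"
            mine = mmap.mmap(f2.fileno(), 0)   # second "session", same backing file
            try:
                yours[0:8] = b"CHANGED!"       # you edit through your mapping
                return bytes(mine[0:8])        # my mapping reflects your edit
            finally:
                yours.close()
                mine.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(shared_mapping_demo())
```

Because both maps point at the same file, neither side is insulated from what the other does to it. That is the general property behind the trouble described above, even though the details inside CAS are more involved.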
Normally if you have a table in CAS for multiple people to see, it should be promoted to the global caslib. Updates can be made and the other consumers will see those as well. But in the scenario I'm describing here, we've got multiple users attempting to update the table at its source.
You should protect your PATH and DNFS sources of unencrypted SASHDAT so that they have only a single writer and, for best results, keep them read-only for general use. But if you cannot for some reason, then direct CAS to always rely on its cache as the backing store for unencrypted SASHDAT from PATH or DNFS caslibs.
Let's return to our scenario where you and I both have CAS sessions which loaded the same table from source into memory and this time, we've directed CAS to use its cache (not memory mapping to source).
This now means that we both have our own instance of the table in memory. Your instance is backed to the cache with its own memory maps. And so is mine. When you make changes to your in-memory table, it has no effect on mine. That's great! Better yet, it eliminates that nasty hanging of CAS sessions.
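The same OS-level analogy can illustrate the cache-backed case. In this sketch (again plain Python, not CAS internals, with hypothetical file names), each "session" first copies the source into its own cache file and then memory-maps that copy.

```python
import mmap
import os
import shutil
import tempfile


def private_cache_demo() -> tuple:
    """Give each 'session' its own cache copy; edits to one copy stay private."""
    workdir = tempfile.mkdtemp()
    try:
        source = os.path.join(workdir, "table.bin")
        with open(source, "wb") as f:
            f.write(b"original-rows---")

        # Each session copies the source into its own (hypothetical) cache file.
        your_cache = os.path.join(workdir, "cache_you.bin")
        my_cache = os.path.join(workdir, "cache_me.bin")
        shutil.copyfile(source, your_cache)
        shutil.copyfile(source, my_cache)

        with open(your_cache, "r+b") as fy, open(my_cache, "r+b") as fm:
            yours = mmap.mmap(fy.fileno(), 0)
            mine = mmap.mmap(fm.fileno(), 0)
            try:
                yours[0:8] = b"CHANGED!"       # you edit your own instance
                yours.flush()
                your_view = bytes(yours[0:8])
                my_view = bytes(mine[0:8])     # my instance is untouched
            finally:
                yours.close()
                mine.close()

        with open(source, "rb") as f:
            source_view = f.read(8)            # the source is untouched as well
        return your_view, my_view, source_view
    finally:
        shutil.rmtree(workdir)


if __name__ == "__main__":
    print(private_cache_demo())
```

The trade-off is visible even in this toy version: isolation costs an extra copy of the data per session.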
But to be clear, we haven't really eliminated the race condition if we both intend to save our final changes back to the same source. To do that, we must coordinate with each other and employ other common data management strategies to be good stewards of the data and system.
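As one illustration of such a strategy, here's a hedged sketch of an optimistic check before saving: remember a fingerprint of the source at load time and refuse to overwrite if the source has changed since. This is a generic pattern in plain Python, not a CAS feature, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile


def fingerprint(path: str) -> str:
    """Return a SHA-256 digest of the file contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def save_if_unchanged(path: str, loaded_fingerprint: str, new_data: bytes) -> bool:
    """Write new_data only if the source still matches what this session loaded.

    Note: check-then-write is not atomic. A real process would also need a
    lock or a single designated writer.
    """
    if fingerprint(path) != loaded_fingerprint:
        return False  # someone else saved first; reload and reconcile instead
    with open(path, "wb") as f:
        f.write(new_data)
    return True


def demo() -> tuple:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"v0")
        yours = fingerprint(path)  # you load the table
        mine = fingerprint(path)   # I load the same table
        first = save_if_unchanged(path, yours, b"your changes")
        second = save_if_unchanged(path, mine, b"my changes")
        with open(path, "rb") as f:
            final = f.read()
        return first, second, final
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Here my save is rejected rather than silently overwriting yours, which turns a hidden lost update into something I find out about right away.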
Specific to loading SASHDAT data using either PATH or DNFS type of caslibs, we are able to override CAS' default behavior with its cache through the use of three parameters.
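Those parameters rely on two values in particular: INPLACEPREFERRED, which memory-maps to the SASHDAT file at its source when possible, and CASDISKCACHE, which directs CAS to use its cache as the backing store instead.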
The three parameters where these values are used employ a hierarchical relationship which you can override depending on the level of specificity desired: entire CAS server > per CAS library > each CAS table.
To globally set the CAS server's behavior, modify the casconfig_usermods.lua file on the CAS Controller host in directory /opt/sas/viya/config/etc/cas/default to set env.CAS_LOADBACKINGSTORE, for example: env.CAS_LOADBACKINGSTORE = 'CASDISKCACHE'
If not specified, then the default is INPLACEPREFERRED.
For reference, see Controlling Use of CAS_DISK_CACHE in the SAS® Viya® 3.5: System Programming Guide.
When adding a caslib with a srcType of either PATH or DNFS, provide this source-specific option: loadTableBackingStore, set to CASDISKCACHE, INPLACEPREFERRED, or INHERIT.
If not specified, then the default is INHERIT which means this caslib will follow the global setting indicated by env.CAS_LOADBACKINGSTORE above.
Here's an example:
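caslib mydata sessref=mysess datasource=(srctype="DNFS", loadTableBackingStore="CASDISKCACHE") path="/path/to/data"; /* path is a placeholder; substitute your own directory */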
For reference, see addCaslib Action in the SAS® Viya® 3.5: System Programming Guide.
When loading an unencrypted SASHDAT table in a PATH or DNFS type of caslib, provide this importOption: backingStore, set to CASDISKCACHE, INPLACEPREFERRED, or INHERIT.
If not specified, then the default is INHERIT which means this table load will follow the caslib setting indicated by the loadTableBackingStore caslib option above.
Here's an example:
proc casutil;
   load casdata="ImportantData.sashdat"
        importOptions=(filetype="HDAT", backingStore="CASDISKCACHE")
        casout="ImportantData";
run;
For reference, see Common Parameter: importOptions in the SAS® Viya® 3.5: System Programming Guide.
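To see how the three levels fit together, here's a small model of the precedence described above, written in plain Python. It is a sketch of the documented behavior (table overrides caslib, caslib overrides server, INHERIT defers to the next level up), not code that CAS itself runs, and the function name is hypothetical.

```python
from typing import Optional

SERVER_DEFAULT = "INPLACEPREFERRED"            # default when the server setting is absent
CONCRETE_VALUES = {"INPLACEPREFERRED", "CASDISKCACHE"}


def resolve_backing_store(table: str = "INHERIT",
                          caslib: str = "INHERIT",
                          server: Optional[str] = None) -> str:
    """Return the effective setting: table beats caslib, caslib beats server."""
    chosen = (server or SERVER_DEFAULT).upper()
    # Walk from least specific to most specific; INHERIT leaves the value alone.
    for level_value in (caslib, table):
        if level_value.upper() != "INHERIT":
            chosen = level_value.upper()
    if chosen not in CONCRETE_VALUES:
        raise ValueError(f"unsupported backing store value: {chosen}")
    return chosen


if __name__ == "__main__":
    print(resolve_backing_store())                                   # server default
    print(resolve_backing_store(caslib="CASDISKCACHE"))              # caslib override
    print(resolve_backing_store(table="INPLACEPREFERRED",
                                caslib="CASDISKCACHE"))              # table override
```

In practice this means you can leave the server at its default and opt in to the cache only for the caslibs or tables that have multiple writers.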
There are a few other considerations to deal with.
There are a couple of exceptions where the value of the backingStore parameter has no effect when a SASHDAT file is loaded:
When the Apache Hadoop Distributed File System is installed symmetrically alongside CAS, then unencrypted SASHDAT is locally accessible and so CAS will memory-map to the source blocks directly. It will not use its cache.
As a matter of fact, the backingStore option is silently ignored when specified for any caslib type other than PATH or DNFS (i.e. no syntax errors in the log).
When CAS memory-maps to SASHDAT directly at the source, then COPIES=0 (i.e. no replicated blocks for failover are placed in the CAS cache). However, when you specify the CAS cache as the backing store, then the usual default for external data sources, COPIES=1, goes into effect. This means CAS maps in-memory data to the cache *and* also places duplicate copies of the blocks in cache to protect against the unexpected loss of a worker.
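For capacity planning, the effect on cache space can be approximated with some simple arithmetic. The sketch below is a rough estimate under stated assumptions: it treats the cache footprint as the table size times one primary copy plus the number of replicas, and it ignores any block or metadata overhead. The function name is hypothetical.

```python
def estimate_cache_bytes(table_bytes: int, use_cache: bool, copies: int = 1) -> int:
    """Rough total CAS_DISK_CACHE space used for one table across the cluster."""
    if not use_cache:
        # Memory-mapped in place at the source: COPIES=0, nothing lands in cache.
        return 0
    # Cache as backing store: the mapped blocks plus `copies` replicas of each.
    return table_bytes * (1 + copies)


if __name__ == "__main__":
    ten_gb = 10 * 1024**3
    print(estimate_cache_bytes(ten_gb, use_cache=False))
    print(estimate_cache_bytes(ten_gb, use_cache=True))
```

So switching a 10 GB table from in-place mapping to the cache takes it from no cache space at all to roughly twice the table's size, which is worth checking against the space you've set aside for CAS_DISK_CACHE.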
Designing your system and processes for multi-user concurrent access to tables which change requires planning, effort, and communication. CAS offers several options for dealing with this in how it operates, the procedures for working with tables, and configuration parameters.
Thanks to Andy Bouts, Principal Pre-Sales Solutions Architect for sharing his hard-won experience and insights with these use-cases in the field.