There are quite a few notable data management related enhancements in CAS for Viya 3.3. One such improvement is the new CAS table index functionality. Let's look at it here so we can better understand how and when to utilize it.
.
Currently CAS indexes can only be created via CAS Actions. Any action that can create a CAS table (includes a CASOUT option) can also index that table. As an example, the loadTable action is shown below.
Additionally there is a dedicated Index action within the Table action set.
NB:
|
In this release, neither CASUTIL nor SPRE (DATA Step, Procedures, etc.) support CAS index creation.
So, client-side sources will have to be loaded and then indexed in a two step process. |
.
In the 3.3 release of Viya, CAS indexes are only used by subsetting WHERE filters. Below shows the same WHERE filter applied to the same 9 million row CAS table with and without indexes. Note that the indexed version is an order of magnitude faster than the un-indexed version.
While not classic B-Tree indexes, CAS indexes are similarly best used on high to medium cardinality fields (fields with lots of unique values). There should always be a measureable reduction in CPU time when using a WHERE filter to subset on such indexed CAS Table column(s). Such CPU time reductions when using WHERE on very large CAS Tables are very important when multiple CAS Sessions are each running resource-intensive CAS Actions. Measures of Elapsed Time reduction depend on one or more the following:
.
NB:.... | In Viya 3.3, CAS indexes are not used in join optimization, for BY-Group processing, or by other process algorithms (e.g Transpose, Machine Learning). They are only used in sub-setting operations. So, they should only be applied when sub-setting is expected. For BY-Group optimization, utilize CAS partitioning and ordering instead of CAS indexes. |
.
Unlike other implementations, CAS indexes are stored inside their respective tables. There are no separate index files (e.g. no .sashbndx files). While it's not easy to see how CAS tables physically manifest, if you did go looking for the indexes, you would just see more data blocks. Similarly, when saved to SASHDAT, indexes manifest within the SASHDAT table. So again, there will be no separate index files.
.
Since we expect to see more data blocks, it should be no surprise that CAS indexes increase the overall size of CAS tables. As with any index implementation, CAS indexes improve data retrieval speed at the expense of storage space.
The size of the indexes depends mainly on field size and cardinality (the number of unique values) as well as the number of rows in the table.
For reference, here are the tableDetails output from two different indexing strategies. The first shows a single index on a unique, numeric field on a 9 million row table. The second one shows a non-unique index on a character field on the same table. Indexing the unique field consumes about 117MB while indexing the non-unique field (48 unique values) only takes 49MB. While every situation is different, this should give you an idea of the order of magnitude expected.
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.