There are quite a few notable data management related enhancements in CAS for Viya 3.3. One such improvement is the new CAS table index functionality. Let's look at it here so we can better understand how and when to utilize it.
.
Currently CAS indexes can only be created via CAS Actions. Any action that can create a CAS table (includes a CASOUT option) can also index that table. As an example, the loadTable action is shown below.
Additionally there is a dedicated Index action within the Table action set.
NB:
|
In this release, neither CASUTIL nor SPRE (DATA Step, Procedures, etc.) support CAS index creation.
So, client-side sources will have to be loaded and then indexed in a two step process. |
.
In the 3.3 release of Viya, CAS indexes are only used by subsetting WHERE filters. Below shows the same WHERE filter applied to the same 9 million row CAS table with and without indexes. Note that the indexed version is an order of magnitude faster than the un-indexed version.
While not classic B-Tree indexes, CAS indexes are similarly best used on high to medium cardinality fields (fields with lots of unique values). There should always be a measureable reduction in CPU time when using a WHERE filter to subset on such indexed CAS Table column(s). Such CPU time reductions when using WHERE on very large CAS Tables are very important when multiple CAS Sessions are each running resource-intensive CAS Actions. Measures of Elapsed Time reduction depend on one or more the following:
.
NB:.... | In Viya 3.3, CAS indexes are not used in join optimization, for BY-Group processing, or by other process algorithms (e.g Transpose, Machine Learning). They are only used in sub-setting operations. So, they should only be applied when sub-setting is expected. For BY-Group optimization, utilize CAS partitioning and ordering instead of CAS indexes. |
.
Unlike other implementations, CAS indexes are stored inside their respective tables. There are no separate index files (e.g. no .sashbndx files). While it's not easy to see how CAS tables physically manifest, if you did go looking for the indexes, you would just see more data blocks. Similarly, when saved to SASHDAT, indexes manifest within the SASHDAT table. So again, there will be no separate index files.
.
Since we expect to see more data blocks, it should be no surprise that CAS indexes increase the overall size of CAS tables. As with any index implementation, CAS indexes improve data retrieval speed at the expense of storage space.
The size of the indexes depends mainly on field size and cardinality (the number of unique values) as well as the number of rows in the table.
For reference, here are the tableDetails output from two different indexing strategies. The first shows a single index on a unique, numeric field on a 9 million row table. The second one shows a non-unique index on a character field on the same table. Indexing the unique field consumes about 117MB while indexing the non-unique field (48 unique values) only takes 49MB. While every situation is different, this should give you an idea of the order of magnitude expected.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.