In a multinode CAS environment, you may have unbalanced CAS tables, that is, CAS tables whose data is not evenly distributed across all the CAS worker nodes. When CAS tables are not properly balanced, the SAS Viya environment underperforms.
In a multinode CAS environment, CAS tables become unbalanced for the following reasons.
When the multinode data load process splits the source data on a skewed column, or uses fewer CAS worker nodes than are available, the CAS table ends up with an uneven data distribution across the CAS worker nodes. In this scenario, when a user submits a CAS action (through reports or the UI) against such a table, CAS has to temporarily move some data blocks among the CAS nodes. It does this so that it can use all worker nodes and achieve parallel processing. This on-the-fly data movement degrades the overall performance of the SAS Viya environment.
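To see why a skewed split column or a reduced set of workers produces an unbalanced table, here is a toy model. It assumes rows are routed to workers by a simple modulo of an integer key. That routing rule, the function names, and the imbalance metric (heaviest worker divided by mean load) are illustrative assumptions, not how CAS data connectors actually assign blocks.

```python
def rows_per_worker(keys, n_workers, workers_used=None):
    """Count rows per worker when rows are routed by an integer key (toy model).

    Hypothetical routing rule: worker = key % workers_used. Only the first
    `workers_used` workers ever receive rows; the rest stay empty.
    """
    used = n_workers if workers_used is None else workers_used
    counts = [0] * n_workers
    for key in keys:
        # Every row with the same key value lands on the same worker.
        counts[key % used] += 1
    return counts


def imbalance(counts):
    """Heaviest worker's load divided by the mean load; 1.0 means perfectly even."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Skewed split column: 70% of the rows share the key value 0.
skewed = [0] * 700 + list(range(1, 301))
print(rows_per_worker(skewed, 4))                      # [775, 75, 75, 75]

# Evenly spread keys, but the load only uses 2 of the 4 workers.
even = list(range(1000))
print(rows_per_worker(even, 4, workers_used=2))        # [500, 500, 0, 0]
print(rows_per_worker(even, 4))                        # [250, 250, 250, 250]
```

In both unbalanced cases, one or more workers hold far more than their share. The model treats that gap as the data CAS would have to shuffle at action time to keep every worker busy.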
(Note: This applies to most data connectors, but not to CDE. The CDE data connector redistributes data among the CAS worker nodes.)
With the high-availability feature for CAS tables, CAS tables in a multinode CAS environment remain available to users when one or more CAS worker nodes go down. How highly available a CAS table is depends on the number of copies (the copies parameter) used during the CAS load. With the default value of 1, the CAS tables remain available when one CAS worker node goes down. To make a CAS table tolerant of more node failures, consider a higher value for the copies parameter. Keep in mind that the size of the CAS table grows in proportion to the number of copies, because each copy adds another replica of the table's data blocks.
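The trade-off between the copies value, storage, and fault tolerance can be sketched as simple arithmetic. This sketch makes two assumptions. First, each redundant copy replicates all of the table's blocks, so the footprint is the original plus `copies` replicas. Second, replicas live on workers other than the primary, so the table survives up to `copies` simultaneous worker failures. The function names are hypothetical.

```python
def table_footprint_gb(base_size_gb, copies=1):
    """Approximate storage for a CAS table loaded with `copies` redundant copies.

    Assumption: every redundant copy replicates all of the table's blocks,
    so the footprint is the original plus `copies` replicas.
    """
    if copies < 0:
        raise ValueError("copies must be >= 0")
    return base_size_gb * (1 + copies)


def stays_available(failed_workers, copies=1):
    """Assumption: replicas sit on distinct workers, so the table survives
    as long as no more than `copies` worker nodes are down at once."""
    return failed_workers <= copies


# A 10 GB table at different copies settings.
for copies in range(4):
    print(copies, table_footprint_gb(10, copies), stays_available(2, copies))
```

Each step up in copies buys one more tolerated node failure. The price is another full replica's worth of space.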
When a repaired CAS worker node rejoins the CAS environment, it holds no data blocks from the existing in-memory CAS tables. CAS nevertheless uses the reinstated worker node for multithreaded processing, but it does not automatically redistribute existing CAS tables to it. Instead, for each CAS action, CAS temporarily moves parts of the CAS table data to the reinstated worker node to run the multithreaded process. This on-the-fly data movement among the CAS nodes degrades the overall performance of the Viya environment.
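The cost of this per-action movement adds up because it is paid again on every action. In contrast, a one-time rewrite pays it once. The toy model below assumes that each action moves just enough blocks to give every worker an equal share, and that the total block count divides evenly across workers. Both assumptions and the function names are illustrative, not CAS internals.

```python
def blocks_moved_per_action(blocks_per_worker):
    """Blocks that must move so every worker holds an equal share (toy model).

    Assumes the total number of blocks divides evenly across the workers.
    """
    total = sum(blocks_per_worker)
    n = len(blocks_per_worker)
    if total % n != 0:
        raise ValueError("toy model assumes blocks divide evenly across workers")
    target = total // n
    # Every block above the even share on a worker has to be shipped elsewhere.
    return sum(max(0, b - target) for b in blocks_per_worker)


def cumulative_moves(blocks_per_worker, n_actions):
    """Total blocks shipped if the table is never rewritten (toy model)."""
    return blocks_moved_per_action(blocks_per_worker) * n_actions


# Three original workers hold 400 blocks each; the reinstated fourth holds none.
after_rejoin = [400, 400, 400, 0]
print(blocks_moved_per_action(after_rejoin))   # 300
print(cumulative_moves(after_rejoin, 50))      # 15000
print(blocks_moved_per_action([300] * 4))      # 0 once the table is rebalanced
```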
To improve Viya performance, you need to manually rewrite the CAS table so that it is balanced across all the CAS worker nodes. You can rewrite the CAS table from its in-memory data; there is no need to reload it from the source environment. This works provided you have enough memory space (that is, CAS_DISK_CACHE), about twice the size of the table, to hold the intermediate CAS table for a short period.
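The "twice the size of the table" rule can be turned into a quick pre-check. This sketch reads the rule as requiring room for the original and the staging copy at the same time, on top of whatever else is already in the cache. The function name and the `other_usage_gb` parameter are hypothetical. Real capacity would have to come from your own monitoring of CAS_DISK_CACHE.

```python
def has_room_for_staging(table_gb, cache_capacity_gb, other_usage_gb=0.0):
    """Rough pre-check before an in-memory rewrite (hypothetical helper).

    The original table and its staging copy coexist until the original is
    dropped, so about 2x the table size must fit alongside other usage.
    """
    needed_gb = 2 * table_gb
    return other_usage_gb + needed_gb <= cache_capacity_gb


print(has_room_for_staging(100, 250))       # True: 200 GB fits in 250 GB
print(has_room_for_staging(100, 250, 60))   # False: 260 GB does not fit
```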
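You can redistribute an existing in-memory CAS table in a three-step process: create a session-scope staging table from the global table, drop the original global table, and then promote the staging table to global scope under the original name.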
The three-step process is required because you cannot overwrite a global CAS table in place. You can create the staging CAS table from the global CAS table by using a DATA step or the table.partition CAS action. When called without any partitioning variables, the table.partition action simply rewrites the CAS table across all worker nodes.
When creating a staging table, do not use the original table name as the target table name. If you do, there will be two CAS tables with the same name, one in session scope and one in global scope. A subsequent drop table action for that name drops the session-scope staging table rather than the global table. The promote action then fails with an error stating that there is no session-scope table to promote. Your global table remains untouched and unbalanced. To avoid this confusion, give the staging table a new name (for example, with a _stg suffix) so that you never have tables with the same name in both session scope and global scope.
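The naming pitfall comes down to lookup order: a session-scope table hides a global table with the same name. The toy model below reproduces that rule as described in the text. The class and method names are hypothetical and it is not the CAS API. It runs the three steps twice, once reusing the original name and once with a `_stg` suffix.

```python
class ScopeModel:
    """Toy model of CAS name resolution: a session-scope table hides a
    global-scope table with the same name. Hypothetical; not the CAS API."""

    def __init__(self):
        self.session = {}  # session-scope tables: name -> contents
        self.global_ = {}  # global-scope tables: name -> contents

    def _visible(self, name):
        # Session scope is searched first, then global scope.
        return self.session[name] if name in self.session else self.global_[name]

    def partition(self, source, target):
        # Rewrite the visible source table into a balanced session-scope table.
        self._visible(source)  # raises KeyError if the source does not exist
        self.session[target] = "balanced"

    def drop(self, name):
        # Drops the session-scope table if one exists, otherwise the global one.
        if name in self.session:
            del self.session[name]
        else:
            del self.global_[name]

    def promote(self, name, target):
        if name not in self.session:
            raise LookupError(f"there is no session-scope table {name!r} to promote")
        if target in self.global_:
            raise ValueError(f"global table {target!r} already exists")
        self.global_[target] = self.session.pop(name)


# Wrong: the staging table reuses the original name.
wrong = ScopeModel()
wrong.global_["sales"] = "unbalanced"
wrong.partition("sales", "sales")
wrong.drop("sales")                    # removes the session copy, not the global table
try:
    wrong.promote("sales", "sales")
except LookupError as err:
    print(err)                         # there is no session-scope table 'sales' to promote
print(wrong.global_["sales"])          # unbalanced

# Right: the staging table gets a _stg suffix.
right = ScopeModel()
right.global_["sales"] = "unbalanced"
right.partition("sales", "sales_stg")
right.drop("sales")                    # no session table named 'sales', so the global one goes
right.promote("sales_stg", "sales")
print(right.global_["sales"])          # balanced
```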
The following code example shows how to redistribute an in-memory CAS table across all CAS worker nodes.
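Here is a minimal Python sketch of the three steps. It assumes `conn` is a connection object from the swat package (the SAS Python client), through which CAS actions are called as `conn.<actionset>.<action>(...)`. The function name and the `staging_suffix` parameter are hypothetical. Verify the parameter names used for table.partition, table.dropTable, and table.promote (notably `target` on promote) against the CAS action documentation for your Viya release. The block does not import swat, so it can be exercised with a stand-in object.

```python
def rebalance_cas_table(conn, caslib, name, staging_suffix="_stg"):
    """Redistribute a global CAS table across all workers (sketch).

    Assumes `conn` behaves like a swat.CAS connection. Returns the name of
    the staging table that was used.
    """
    staging = name + staging_suffix  # never reuse the original name

    # Step 1: rewrite the global table into a session-scope staging table.
    # No partitioning variables are given, so the rows are simply
    # rewritten across all worker nodes.
    conn.table.partition(
        table={"caslib": caslib, "name": name},
        casout={"caslib": caslib, "name": staging, "replace": True},
    )

    # Step 2: drop the original global table. Because no session-scope table
    # shares its name, this targets the global table.
    conn.table.dropTable(caslib=caslib, name=name)

    # Step 3: promote the balanced staging table to global scope under the
    # original name.
    conn.table.promote(caslib=caslib, name=staging, target=name)
    return staging
```

With a live server, the call would look like `rebalance_cas_table(swat.CAS("cas-host", 5570, "user", "password"), "public", "sales")`, where the host, port, credentials, caslib, and table name are placeholders. In practice, check each action's result before moving on, so that a failed partition never leads to dropping the original table. Also note that between steps 2 and 3 the data exists only in your session, so the session should not be ended until the promote succeeds.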