burkheart14

If a table is partitioned in CAS, then all rows belonging to a given partition are stored on one node (assuming an MPP CAS deployment co-located with Hadoop). According to the SAS Viya Administration documentation, CAS will try (where possible) to run operations on the node that holds the data:

For a distributed server, the partition is on a single machine.... If you use BY-group processing in a DATA step with the same variables, then it is a performance advantage to partition the table when you load the table into memory on the server.
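
A minimal sketch of loading both tables with the same partitioning, assuming an active CAS session, a libref MYCAS bound to the CASUSER caslib, and tiny hypothetical sample tables WORK.A and WORK.B (the sample rows are invented for illustration). It uses the CAS engine data set options PARTITION= and ORDERBY=. It does not show, and should not be read as guaranteeing, that matching partitions of A and B end up on the same worker; that is exactly what is being asked below.

```sas
cas mysess;                        /* start a CAS session                 */
libname mycas cas caslib=casuser;  /* libref MYCAS (assumed name)         */

/* Hypothetical sample data for table A (invented rows). */
data work.A;
  input transaction_key date_key account_key;
  datalines;
1 20240101 100
2 20240101 200
3 20240102 100
;
run;

/* Hypothetical sample data for table B (invented rows). */
data work.B;
  length transaction_description $20;
  input transaction_key date_key transaction_description $;
  datalines;
1 20240101 deposit
2 20240101 withdrawal
3 20240102 transfer
;
run;

/* Load A into CAS partitioned by DATE_KEY, with rows ordered by      */
/* TRANSACTION_KEY within each partition.                             */
data mycas.A(partition=(date_key) orderby=(transaction_key));
  set work.A;
run;

/* Load B with the same partitioning and ordering. */
data mycas.B(partition=(date_key) orderby=(transaction_key));
  set work.B;
run;
```

Ordering by TRANSACTION_KEY within each partition is an assumption aimed at the implicit sort mentioned below; whether CAS actually skips that sort, or moves partitions between workers during the MERGE, is not something this sketch demonstrates.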

My question is: if two tables A and B are partitioned by the same criteria, are the corresponding partitions of A and B co-located on the same node, so that a join between these tables (using the partition criteria) does not have to move partitions between nodes?

Example: Join two tables A and B partitioned by DATE_KEY

Table A columns: TRANSACTION_KEY (PK), DATE_KEY (PARTITION CRITERIA), ACCOUNT_KEY

Table B columns: TRANSACTION_KEY (PK), DATE_KEY (PARTITION CRITERIA), TRANSACTION_DESCRIPTION

data mycas.C;
  merge mycas.A mycas.B;
  by date_key transaction_key;
run;

Also, is there a SAS option that shows how CAS executes this DATA step under the hood, i.e., whether there are any data shuffles between nodes or implicit sort operations? In the example above, there should be an implicit sort on TRANSACTION_KEY before the corresponding partitions of A and B can be merged for a given DATE_KEY.

1 REPLY
joeFurbee
Community Manager

Moving this post to the About SAS Viya Community because the Coding on SAS Viya Community is being sunset.


Join us for SAS Community Trivia
SAS Bowl XLIII, The New SAS Developer Portal
Wednesday, August 14, 2024, at 10 a.m. ET | #SASBowl