You may have partitioned a CAS table to improve the overall performance of Viya application. Over time, if there is a need for incremental data append to partitioned CAS table, the task is not simple. In this post, I discuss the data append process for a partitioned CAS table.
A CAS table can be partitioned on a specific data column to improve the CAS action performance. A partitioned CAS table avoids the auto partitioning and data movement among CAS worker nodes at run time for the by-group operation.
The data append to a partitioned CAS table is not supported. The data can be appended only to a standard CAS table (non-partitioned). When data is appended to a partitioned CAS table, it generates the following error message:
75
76 data pubSPRE.DM_FACT_MEGA_CORP_GOLD( append=yes );
77 set pubSPRE.DM_FACT_MEGA_CORP_APND;
NOTE: The DATA step will run in multiple threads.
NOTE: Executing action 'dataStep.runBinary'.
ERROR: DATA step cannot append to table DM_FACT_MEGA_CORP_GOLD in caslib DM.
Append is not supported for partitioned tables.
ERROR: The action stopped due to errors.
The data can be appended to a partitioned CAS table using an intermediate staging table with the following steps. To append data to a partitioned CAS table you must have enough (thrice the size of the table) memory space (read CAS_DISK_CACHE) to accommodate two intermediate CAS table for a short period.
When creating a staging table, do not use the original name for the target table name. When the same original name is used for target staging table, there will be two CAS tables with the same name, one in sessions scope and another in the global scope. When drop table action is submitted for same name table, it will drop the session scope staged table rather than the global scope table. The promote action will fail as there is no session-scope table. The global table will be unaffected; no new data appended. So, avoid confusion and use a new name (e.g. _stg) for staging table and stay away from getting same name CAS table in session scope and global scope.
The following code example describes the process of data append to an in-memory partitioned global CAS table. The data step used for creating a staging table and data append runs in CAS as both source and target is a CAS table. Step 3 creates a new staging table with partitioned data since data partition CAS action does not support in-place data partition.
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=yes);
/* Note the number of rows in CAS table before data append */
proc cas ;
action table.tabledetails / name="DM_FACT_MEGA_CORP_GOLD" caslib="DM";
run ;
/* 1. Copy existing global CAS table into a session scope staging CAS table. */
libname pubSPRE cas caslib="DM";
data pubSPRE.DM_FACT_MEGA_CORP_STAGE;
set pubSPRE.DM_FACT_MEGA_CORP_GOLD;
run ;
/* 2. Incremental data append into staging table */
data pubSPRE.DM_FACT_MEGA_CORP_STAGE( append=yes );
set pubSPRE.DM_FACT_MEGA_CORP_APND;
run ;
/* 3. Create a new staging table with data partition from appended staging table */
proc cas ;
table.partition /
casout={name="DM_FACT_MEGA_CORP_STAGE_N", caslib="DM"},
table={name="DM_FACT_MEGA_CORP_STAGE", caslib="DM",
groupby={{name="FacilityCity"}},orderby={{name="date"}}};
quit ;
/* 4. Drop the existing global CAS table. */
/* 5. Promote the partitioned staging CAS table to global CAS table with same original name */
Proc casutil;
droptable casdata="DM_FACT_MEGA_CORP_GOLD" incaslib="DM" quiet;
promote incaslib="DM" outcaslib="DM" casdata="DM_FACT_MEGA_CORP_STAGE_N" casout="DM_FACT_MEGA_CORP_GOLD" ;
quit;
/* verify new CAS table have appended data with data partition. */
proc cas ;
action table.tabledetails / name="DM_FACT_MEGA_CORP_GOLD" caslib="DM";
run ;
proc cas;
table.tabledetails result=r / level='partition'
name="DM_FACT_MEGA_CORP_GOLD" caslib="DM" ;
saveresult r dataout=work.r ;
run ; quit ;
proc print data=work.r;
var node blocks rows DataSize 'Key(FacilityCity)'n;
run;
CAS mySession TERMINATE;
Log extract from code execution:
…………………..
………
75
76 /* 1. Copy existing global CAS table into a session scope staging CAS table. */
77 libname pubSPRE cas caslib="DM";
NOTE: Libref PUBSPRE was successfully assigned as follows:
Engine: CAS
Physical Name: 86b07d6f-2eab-7041-812d-fa73baed385a
78
79 data pubSPRE.DM_FACT_MEGA_CORP_STAGE;
80 set pubSPRE.DM_FACT_MEGA_CORP_GOLD;
81 run ;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: Executing action 'dataStep.runBinary'.
NOTE: There were 3722071 observations read from the table DM_FACT_MEGA_CORP_GOLD in caslib DM.
NOTE: The table DM_FACT_MEGA_CORP_STAGE in caslib DM has 3722071 observations and 39 variables.
NOTE: DATA statement used (Total process time):
real time 4.00 seconds
cpu time 0.25 seconds
82
83 /* 2. Incremental data append into staging table */
84 data pubSPRE.DM_FACT_MEGA_CORP_STAGE( append=yes );
85 set pubSPRE.DM_FACT_MEGA_CORP_APND;
86 run ;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: Executing action 'dataStep.runBinary'.
NOTE: There were 3722071 observations read from the table DM_FACT_MEGA_CORP_APND in caslib DM.
NOTE: The table DM_FACT_MEGA_CORP_STAGE in caslib DM has 7444142 observations and 39 variables.
NOTE: The APPEND operation for table DM_FACT_MEGA_CORP_STAGE in caslib DM added 3722071 observations.
NOTE: DATA statement used (Total process time):
real time 4.28 seconds
cpu time 0.21 seconds
87
88 /* 3. Create a new staging table with data partition from appended staging table */
89 proc cas ;
90 table.partition /
91 casout={name="DM_FACT_MEGA_CORP_STAGE_N", caslib="DM"},
92 table={name="DM_FACT_MEGA_CORP_STAGE", caslib="DM",
93 groupby={{name="FacilityCity"}},orderby={{name="date"}}};
94 quit ;
NOTE: Active Session now MYSESSION.
NOTE: Executing action 'table.partition'.
NOTE: Action 'table.partition' used (Total process time):
{caslib=DM,tableName=DM_FACT_MEGA_CORP_STAGE_N,rowsTransferred=2790769,shuffleWaitTime=8.6317923069,minShuffleWaitTime=0,
maxShuffleWaitTime=0.7270908356,averageShuffleWaitTime=0.000617748}
NOTE: PROCEDURE CAS used (Total process time):
real time 20.56 seconds
cpu time 0.77 seconds
……
………………….
Result extract from code execution:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.