I’ve been waiting for this option for years! It is now possible to specify the number of replications (or copies) at a (more) global level. Yay!
Until now, tables were loaded in CAS with a fixed default of 1 copy (enough for a table to survive the failure of 1 CAS node). You could change the number of copies, but only in code and only one table load at a time. There was no way to set a different value globally.
Who has never coded this?
proc casutil incaslib="dm" outcaslib="dm" ;
load casdata="bigprdsale.parquet" casout="bigprdsale" replace copies=3 ;
quit ;
libname casdm cas caslib="dm" ;
data casdm.class(copies=0) ;
set sashelp.class ;
run ;
proc cas ;
action table.loadTable /
path='film', caslib='dm_pgdvd',
casOut={name='film', caslib='dm_pgdvd', replace=true, replication=2};
quit ;
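PS: COPIES and REPLICATION mean the same thing, but the two keywords are not interchangeable. As in the code above, COPIES= is the spelling used in PROC CASUTIL and as a data set option, while replication is the spelling used in the casOut parameter of a CAS action.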
Generally, SAS users/administrators want to:
Setting COPIES to 0 is always a good optimization practice for complex SAS code that processes data in CAS. You may just want the final table (a datamart table for instance) to be highly available, but not all the tables involved in the process.
The more copies you set, the slower it is to create or load data in CAS and the more CAS_DISK_CACHE space you need.
In SAS code, when you start a new CAS session, you can set a default table replication factor that applies to every table you load or create in that session.
In the following code, I’m setting a CAS session option – defaulttablereplication – to 2 at the beginning of my CAS session. All new tables created in that session, session-scoped or global, will be created with that new default of 2 copies.
cas mysession sessopts=(defaulttablereplication=2) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;
proc casutil incaslib="dm" outcaslib="dm" ;
list files ;
load casdata="prdsale.sas7bdat" casout="prdsale" replace ;
load casdata="bigprdsale.parquet" casout="bigprdsale" replace ;
load incaslib="dm_pgdvd" outcaslib="dm_pgdvd" casdata="film" casout="film" replace ;
list tables ;
quit ;
You can still override the default session value with a specific value in your loading step if you want:
cas mysession sessopts=(defaulttablereplication=2) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;
proc casutil incaslib="dm" outcaslib="dm" ;
list files ;
load casdata="prdsale.sas7bdat" casout="prdsale" replace ;
load casdata="bigprdsale.parquet" casout="bigprdsale" replace copies=0 ;
load incaslib="dm_pgdvd" outcaslib="dm_pgdvd" casdata="film" casout="film" replace copies=1 ;
list tables ;
quit ;
libname casdm cas caslib="dm" ;
data casdm.class(copies=0) ;
set sashelp.class ;
run ;
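The session default can also be changed partway through a job, instead of overriding it table by table. The following is a minimal sketch, not a tested recipe: it assumes that re-issuing the CAS statement with SESSOPTS= on the running session updates the option for tables created afterwards, and the table names class_tmp and class_final are hypothetical.

```sas
/* Sketch only: switch the session default between two phases of one job.  */
/* Assumes the session "mysession" and the caslib "dm" used in this article. */
cas mysession sessopts=(defaulttablereplication=0) ;
libname casdm cas caslib="dm" ;

/* Intermediate table (hypothetical name): uses the session default of 0 copies */
data casdm.class_tmp ;
set sashelp.class ;
run ;

/* Raise the default on the same, already running session */
cas mysession sessopts=(defaulttablereplication=2) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;

/* Final table (hypothetical name): expected to use the new default of 2 copies */
data casdm.class_final ;
set casdm.class_tmp ;
run ;
```

The %put line writes the current value to the log, so you can confirm the change before creating the final table.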
If you have a long and complex SAS process based on the CAS engine, and you want it as fast as it can be, you can easily set the default copies to 0:
cas mysession sessopts=(defaulttablereplication=0) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;
/* Load source tables */
proc casutil incaslib="dm" outcaslib="dm" ;
list files ;
load casdata="prdsale.sas7bdat" casout="prdsale" replace ;
load casdata="bigprdsale.parquet" casout="bigprdsale" replace ;
load incaslib="dm_pgdvd" outcaslib="dm_pgdvd" casdata="film" casout="film" replace ;
list tables ;
quit ;
libname casdm cas caslib="dm" ;
/* Data manipulations */
data casdm.class_def ;
set sashelp.class ;
run ;
/* many ETL steps */
/* ... */
/* many ETL steps */
/* ... */
/* Make the final table highly available (?) */
data casdm.final_table(copies=3 promote=yes) ;
set casdm.class_def ;
run ;
cas mysession terminate ;
Tables that are loaded without code are still loaded with the fixed default of 1 copy (enough to survive the failure of 1 CAS node).
SAS is working on providing an equivalent option at the CAS server level. I'll update this article when it is available.
In the meantime, if you want to load tables with a different value for copies or replication, you will have to load them using code.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.
This is a great option. We are looking forward to the update on how to set it up for the whole CAS server. It will certainly save resources.
hello @NicolasRobert
I think this option for the whole cas server is available. Can you confirm it is the right one?
specifies the default replication factor for new tables for a session.
You can also edit the sas.cas.instance.config:config instance to set the variable.
Hello @touwen_k
Yes it is the right one, but I believe it is not working well yet. The value is not taken into account.
I'll let you know when this is fixed.
Thanks.