
A New Option to Manage CAS Table Replications (or Copies)

Started 11-06-2023, modified 11-06-2023

I’ve been waiting for this option for years! It is now possible to specify the number of replications (or copies) at a (more) global level. Yay!

 

Until now, tables were loaded into CAS with a fixed default of 1 copy, which lets a table survive the failure of one CAS node. You could override that number in code, but only one table load at a time. There was no way to set a different default globally.

 

Who has never coded this?

 

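/* PROC CASUTIL: COPIES= on the LOAD statement */
proc casutil incaslib="dm" outcaslib="dm" ;
   load casdata="bigprdsale.parquet" casout="bigprdsale" replace copies=3 ;
quit ;

/* DATA step: COPIES= data set option on a CAS libref */
libname casdm cas caslib="dm" ;

data casdm.class(copies=0) ;
   set sashelp.class ;
run ;

/* CAS action: replication in the casOut parameter */
proc cas ;
   action table.loadTable /
      path='film', caslib='dm_pgdvd',
      casOut={name='film', caslib='dm_pgdvd', replace=true, replication=2} ;
quit ;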

 

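PS: COPIES and REPLICATION mean the same thing, but they cannot be used interchangeably. As in the code above, COPIES= is the option used in PROC CASUTIL and as a data set option, whereas replication is the name used in the casOut parameter of a CAS action.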

 

Generally, SAS users/administrators want to:

 

  • Use a value greater than 1 for large CAS clusters. If you work with a 20-node CAS cluster, you may want to survive more than one CAS node failure.
  • Use a value of 0 copies to optimize SAS ETL processes when high availability does not matter. Very often, it does not matter for the many intermediate CAS tables that you create during a data flow.

 

Setting COPIES to 0 is always a good optimization practice for complex SAS code that processes data in CAS. You may just want the final table (a datamart table for instance) to be highly available, but not all the tables involved in the process.

 

The more copies you set, the slower it is to create or load data in CAS, and the more CAS_DISK_CACHE space you need.
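
To get a feel for the space side of that trade-off, here is a back-of-the-envelope sketch. It assumes that each redundant copy costs roughly as much space as the table itself, which is a simplification and not a documented sizing formula. The 50 GB table size is a made-up figure.

```sas
/* Rough estimate only. Assumption: each redundant copy takes about */
/* as much space as the table itself.                               */
data _null_ ;
   table_gb = 50 ;                        /* hypothetical table size in GB */
   do copies = 0 to 3 ;
      total_gb = table_gb * (1 + copies) ; /* table plus its redundant copies */
      put copies= total_gb= ;              /* written to the SAS log */
   end ;
run ;
```

Actual usage depends on the table, its source format and your CAS configuration, so treat the numbers as an order of magnitude at best.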

 

So, what is possible now in SAS Viya 2023.09?

 

In SAS code, when you start a new CAS session, you can set a default table replication factor that applies to every table you load or create in that session.

 

In the following code, I’m setting a CAS session option – defaulttablereplication – to 2 at the beginning of my CAS session. All new tables created in that session, session-scoped or global, will be created with that new default of 2 copies.

 

cas mysession sessopts=(defaulttablereplication=2) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;

proc casutil incaslib="dm" outcaslib="dm" ;
   list files ;
   load casdata="prdsale.sas7bdat" casout="prdsale" replace ;
   load casdata="bigprdsale.parquet" casout="bigprdsale" replace ;
   load incaslib="dm_pgdvd" outcaslib="dm_pgdvd" casdata="film" casout="film" replace ;
   list tables ;
quit ;

 

You can still override the default session value with a specific value in your loading step if you want:

 

cas mysession sessopts=(defaulttablereplication=2) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;

proc casutil incaslib="dm" outcaslib="dm" ;
   list files ;
   load casdata="prdsale.sas7bdat" casout="prdsale" replace ;
   load casdata="bigprdsale.parquet" casout="bigprdsale" replace copies=0 ;
   load incaslib="dm_pgdvd" outcaslib="dm_pgdvd" casdata="film" casout="film" replace copies=1 ;
   list tables ;
quit ;

libname casdm cas caslib="dm" ;

data casdm.class(copies=0) ;
   set sashelp.class ;
run ;

 

If you have a long and complex SAS process based on the CAS engine, and you want it as fast as it can be, you can easily set the default copies to 0:

 

cas mysession sessopts=(defaulttablereplication=0) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;

/* Load source tables */
proc casutil incaslib="dm" outcaslib="dm" ;
   list files ;
   load casdata="prdsale.sas7bdat" casout="prdsale" replace ;
   load casdata="bigprdsale.parquet" casout="bigprdsale" replace ;
   load incaslib="dm_pgdvd" outcaslib="dm_pgdvd" casdata="film" casout="film" replace ;
   list tables ;
quit ;

libname casdm cas caslib="dm" ;

/* Data manipulations */
data casdm.class_def ;
   set sashelp.class ;
run ;

/* many ETL steps */
/* ... */

/* many ETL steps */
/* ... */

/* Make the final table highly available (?) */
data casdm.final_table(copies=3 promote=yes) ;
   set casdm.class_def ;
run ;

cas mysession terminate ;
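
If several final tables need to be highly available, one possible variation is to raise the session default again before writing them, rather than repeating COPIES= on each table. The sketch below assumes that defaulttablereplication, like other CAS session options, can be changed on an existing session by reissuing the CAS statement with SESSOPTS=. The examples above only set it at session start, so check this in your environment. The table name final_table and the "dm" caslib are reused from the examples above.

```sas
/* Sketch only: assumes defaulttablereplication can be changed mid-session */
cas mysession sessopts=(defaulttablereplication=0) ;
libname casdm cas caslib="dm" ;

/* Intermediate table: inherits the session default of 0 copies */
data casdm.class_def ;
   set sashelp.class ;
run ;

/* Raise the default before writing the final tables */
cas mysession sessopts=(defaulttablereplication=2) ;
%put defaulttablereplication: %sysfunc(getsessopt(mysession, defaulttablereplication)) ;

/* Final table: no COPIES= here, so it should inherit the new default of 2 */
data casdm.final_table(promote=yes) ;
   set casdm.class_def ;
run ;

cas mysession terminate ;
```

The %put line is there so that the log shows whether the new value was accepted before the final tables are written.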

 

What about tables loaded in the UI (Manage Data)?

 

They are still loaded with the fixed default of 1 copy (enough to survive one CAS node failure).

SAS is working on providing an equivalent option at the CAS server level. I'll update this article when it is available.

In the meantime, if you want to load tables with a different value for copies or replication, you will have to load them using code.

 

Thanks for reading.

 

 


Comments

This is a great option. We are looking forward to the update on how to set it up for the whole CAS server. It will for sure save resources.

hello  @NicolasRobert 

I think this option for the whole cas server is available. Can you confirm it is the right one?

 

cas.DEFAULTTABLEREPLICATION=1

specifies the default replication factor for new tables for a session.

You can also edit the sas.cas.instance.config:config instance to set the variable.

Hello @touwen_k 

 

Yes it is the right one, but I believe it is not working well yet. The value is not taken into account.

I'll let you know when this is fixed.

 

Thanks.

Last update: 11-06-2023 07:31 AM
