BookmarkSubscribeRSS Feed

Database Multi-Node Load – New SLICEEXPRESSIONS Option in Viya

Started ‎01-08-2021 by
Modified ‎01-08-2021 by
Views 4,740

Multi-node load is the process of involving several CAS workers in an MPP environment to load data coming from a database. Each CAS worker sends a different SQL query to the database and receives the corresponding portion of the data. This allows the parallelization of database table loads. If the database can handle multiple concurrent queries appropriately and the bandwidth between the CAS nodes and the database is good enough, this is a very efficient way of loading external data in CAS.

 

nir_post_60_01_multi_node.png

Select the image to see a larger version.
Mobile users: To view the image, select the "Full" version at the bottom of the page.

 

You enable multi-node load by using the NUMREADNODES option (in a CASLIB statement or CASUTIL procedure). This defines how many workers you want to involve in the parallel load. Then, by default, if not user-specified, CAS locates the first suitable numeric column in the source table (Integer, Decimal, Numeric, Double) to be used as the slicing column. CAS will use that slicing column to prepare and send distinct SQL queries based on the database MOD(ULO) function (remainder of the division) and the NUMREADNODES value.

 

Example of multi-node load code:

 

proc casutil incaslib="caspg" outcaslib="caspg" ;
   load casdata="film" casout="PG_film" options=(numreadnodes=4) replace ;
quit ;

 

Example of generated SQL (last CAS worker):

 

select "SLICE_SQL"."film_id", ...,
       "SLICE_SQL"."fulltext" 
from  (select "public"."film"."film_id", ...,
              "public"."film"."fulltext" 
       from "public"."film")  "SLICE_SQL"
where  ( (MOD (ABS ("SLICE_SQL"."film_id") , 4) =1) )

 

Of course, one can choose the slicing column instead of letting CAS do it. A user probably knows better the data than CAS does. So, if he knows a good slicing column candidate (right cardinality, correctly balanced, indexed, etc.) he can specify it using the SLICECOLUMN option.

 

Example of multi-node code with SLICECOLUMN:

 

proc casutil incaslib="caspg" outcaslib="caspg" ;
   load casdata="film" casout="PG_film" options=(numreadnodes=4 sliceColumn="length") replace ;
quit ;

 

Example of generated SQL (last CAS worker):

 

select "SLICE_SQL"."film_id", ...,
       "SLICE_SQL"."fulltext" 
from  (select "public"."film"."film_id", ...,
              "public"."film"."fulltext" 
       from "public"."film")  "SLICE_SQL"
where  ( (MOD (ABS ("SLICE_SQL"."length") , 4) =1) )

 

With the new SAS Viya, a new option has appeared and is named SLICEEXPRESSIONS. It allows the user to be in total control of the SQL queries that will be sent to the database by explicitly specifying the filtering conditions that will partition your database data for multi-node loading into CAS. This offers many advantages like controlling the distribution of the data over the CAS workers or specifying efficient filtering conditions based on database or table settings.

 

Example of multi-node code with SLICEEXPRESSIONS:

 

proc casutil incaslib="caspg" outcaslib="caspg" ;
   load casdata="film" casout="PG_film" options=(numreadnodes=2 sliceExpressions=("length < 100","length >= 100")) replace ;
quit ;

 

Example of generated SQL (second CAS worker):

 

select "SLICE_SQL"."film_id", ...,
       "SLICE_SQL"."fulltext" 
from  (select "public"."film"."film_id", ...,
              "public"."film"."fulltext" 
       from "public"."film")  "SLICE_SQL"
where  ( ("SLICE_SQL"."length">=100) )

 

The number of expressions in the SLICEEXPRESSIONS option must match the NUMREADNODES value. SQL expressions are passed "as is" to the database. Keep also in mind that when the NUMREADNODES value is less than the number of available CAS workers, the database table will land only on that specified number of nodes.

 

View the documentation of the SLICEEXPRESSIONS option (PostgreSQL example, available for most Data Connectors).

 

Thanks for reading.

Version history
Last update:
‎01-08-2021 10:31 AM
Updated by:
Contributors

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags