BookmarkSubscribeRSS Feed

Database Multi-Node Load – New SLICEEXPRESSIONS Option in Viya

Started ‎01-08-2021 by
Modified ‎01-08-2021 by
Views 4,905

Multi-node load is the process of involving several CAS workers in an MPP environment to load data coming from a database. Each CAS worker sends a different SQL query to the database and receives the corresponding portion of the data. This allows the parallelization of database table loads. If the database can handle multiple concurrent queries appropriately and the bandwidth between the CAS nodes and the database is good enough, this is a very efficient way of loading external data in CAS.

 

nir_post_60_01_multi_node.png

Select the image to see a larger version.
Mobile users: To view the image, select the "Full" version at the bottom of the page.

 

You enable multi-node load by using the NUMREADNODES option (in a CASLIB statement or CASUTIL procedure). This defines how many workers you want to involve in the parallel load. Then, by default, if not user-specified, CAS locates the first suitable numeric column in the source table (Integer, Decimal, Numeric, Double) to be used as the slicing column. CAS will use that slicing column to prepare and send distinct SQL queries based on the database MOD(ULO) function (remainder of the division) and the NUMREADNODES value.

 

Example of multi-node load code:

 

proc casutil incaslib="caspg" outcaslib="caspg" ;
   load casdata="film" casout="PG_film" options=(numreadnodes=4) replace ;
quit ;

 

Example of generated SQL (last CAS worker):

 

select "SLICE_SQL"."film_id", ...,
       "SLICE_SQL"."fulltext" 
from  (select "public"."film"."film_id", ...,
              "public"."film"."fulltext" 
       from "public"."film")  "SLICE_SQL"
where  ( (MOD (ABS ("SLICE_SQL"."film_id") , 4) =1) )

 

Of course, one can choose the slicing column instead of letting CAS do it. A user probably knows better the data than CAS does. So, if he knows a good slicing column candidate (right cardinality, correctly balanced, indexed, etc.) he can specify it using the SLICECOLUMN option.

 

Example of multi-node code with SLICECOLUMN:

 

proc casutil incaslib="caspg" outcaslib="caspg" ;
   load casdata="film" casout="PG_film" options=(numreadnodes=4 sliceColumn="length") replace ;
quit ;

 

Example of generated SQL (last CAS worker):

 

select "SLICE_SQL"."film_id", ...,
       "SLICE_SQL"."fulltext" 
from  (select "public"."film"."film_id", ...,
              "public"."film"."fulltext" 
       from "public"."film")  "SLICE_SQL"
where  ( (MOD (ABS ("SLICE_SQL"."length") , 4) =1) )

 

With the new SAS Viya, a new option has appeared and is named SLICEEXPRESSIONS. It allows the user to be in total control of the SQL queries that will be sent to the database by explicitly specifying the filtering conditions that will partition your database data for multi-node loading into CAS. This offers many advantages like controlling the distribution of the data over the CAS workers or specifying efficient filtering conditions based on database or table settings.

 

Example of multi-node code with SLICEEXPRESSIONS:

 

proc casutil incaslib="caspg" outcaslib="caspg" ;
   load casdata="film" casout="PG_film" options=(numreadnodes=2 sliceExpressions=("length < 100","length >= 100")) replace ;
quit ;

 

Example of generated SQL (second CAS worker):

 

select "SLICE_SQL"."film_id", ...,
       "SLICE_SQL"."fulltext" 
from  (select "public"."film"."film_id", ...,
              "public"."film"."fulltext" 
       from "public"."film")  "SLICE_SQL"
where  ( ("SLICE_SQL"."length">=100) )

 

The number of expressions in the SLICEEXPRESSIONS option must match the NUMREADNODES value. SQL expressions are passed "as is" to the database. Keep also in mind that when the NUMREADNODES value is less than the number of available CAS workers, the database table will land only on that specified number of nodes.

 

View the documentation of the SLICEEXPRESSIONS option (PostgreSQL example, available for most Data Connectors).

 

Thanks for reading.

Version history
Last update:
‎01-08-2021 10:31 AM
Updated by:
Contributors

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started

Article Tags