BookmarkSubscribeRSS Feed

Proc FedSQL and Multi-node load to CAS

Started ‎07-21-2022 by
Modified ‎07-21-2022 by
Views 1,232

SAS FedSQL is a proprietary implementation of ANSI SQL:1999 core standard. Proc FedSQL is the only way to run SQL at CAS (SAS Cloud Analytic Services). FedSQL with CAS supports conditional full-query implicit pass-through.

 

This blog post features, how to improve a query performance when SQL does not qualify for implicit pass-through.

 

When a SQL is accessing a single data source (CASLIB) whose tables are not loaded in CAS, an attempt is made to implicitly pass the full query down to the data source for processing, and only the result gets loaded into CAS. For implicit pass-through, all the tables involved in FedSQL must exist in the same CASLIB and not loaded in CAS.

 

It’s not always feasible for SQL to qualify for implicit pass-through. There could be situation where few source data tables are loaded, and some not loaded into CAS in a CASLIB. When a SQL submitted with set of mixed data source tables, the SQL does not qualify for implicit pass-through. The physical source tables are loaded temporarily into CAS on the fly and the request processed in CAS (load-on-demand capability). Most of the time CAS load temporary source tables using serial-load method which negatively impact the SQL performance. To improve the performance of temporary CAS load and query execution, you can use the Multi-node load mechanism in CASLIB.

 

Fed SQL With Multi-Node CASLIB

 

The following code describes the SQL execution at CAS using FedSQL against a set of source tables. The two CASLIBs are defined with/without Multi-Node parameters to evaluate the loading mechanism and query performance. The FedSQL statement includes “cntl=(disablePassThrough)” parameter to mimic the scenario to load the source table into CAS before executing SQL. The same SQL is running against both CASLIBs. The FedSQL log indicates that CASLIB with Multi-node parameter is taking less time to load the source table into CAS as using all available CAS nodes.

 

Code:

 

cas mySession sessopts=(metrics=true) ;
 
options msglevel=i ;
 
/* CASLIB with standard parameter */
caslib pg datasource=(
srctype="postgres",
user="sas",
password="XXXXXXXXXX",
server="gel-postgresql.postgres.svc.cluster.local",
database="student",
schema="public",
) libref=pg ;
 
 
/* CASLIB with Multi-node parameter */
caslib pgm datasource=(
srctype="postgres",
user="sas",
password="XXXXXXXX",
server="gel-postgresql.postgres.svc.cluster.local",
database="student",
schema="public",
numreadnodes=10,
DRIVER_TRACE="SQL",
   DRIVER_TRACEFILE="/gelcontent/data/LOG/_sasdcpg_$SAS_CURRENT_HOST.log",
   DRIVER_TRACEOPTIONS="TIMESTAMP|APPEND"
) libref=pgm ;
 
/* List  source tables */
proc casutil ;
   list files incaslib="pg" ;
   list files incaslib="pgm" ;
quit ;
 
/* FEDSQL with standard CASLIB */  
proc fedsql sessref=mySession _method cntl=(disablePassThrough) ;
   create table pg.cust_order{options replace=true replication=0} as
   select t1.transaction_id,
          t1.transaction_line_item_no,
          t1.retail_transaction_type_cd,
          t1.order_channel,
		  t1.order_type,
		  t2.customer_rk,
		  t2.customer_type,
		  t2.gender_cd,
		  t2.employee_rk ,
		  t2.loyalty_program
   from pg."order_fact10" as t1, pg."customer_dim" t2  where t1.customer_rk=t2.customer_rk ;
quit ;
 
 
/* FEDSQL with CASLIB having Mulit-node parameter */
proc fedsql sessref=mySession _method cntl=(disablePassThrough) ;
   create table pgm.cust_order{options replace=true replication=0} as
   select t1.transaction_id,
          t1.transaction_line_item_no,
          t1.retail_transaction_type_cd,
          t1.order_channel,
		  t1.order_type,
		  t2.customer_rk,
		  t2.customer_type,
		  t2.gender_cd,
		  t2.employee_rk ,
		  t2.loyalty_program
   from pgm."order_fact10" as t1, pgm."customer_dim" t2  where t1.customer_rk=t2.customer_rk ;
quit ;
 
/* List  target tables */
proc casutil ;
   list tables incaslib="pg" ;
   list tables incaslib="pgm" ;
quit ;
 
cas mySession terminate ;

 

Log extract:

 


…………
……..
112  
113  /* FEDSQL with standard CASLIB */
114  proc fedsql sessref=mySession _method cntl=(disablePassThrough) ;
NOTE: FEDSQL: Running on CAS due to "sessref".
115     create table pg.cust_order{options replace=true replication=0} as
116     select t1.transaction_id,
117            t1.transaction_line_item_no,
118            t1.retail_transaction_type_cd,
119            t1.order_channel,
120    t1.order_type,
121    t2.customer_rk,
122    t2.customer_type,
123    t2.gender_cd,
124    t2.employee_rk ,
125    t2.loyalty_program
126     from pg."order_fact10" as t1, pg."customer_dim" t2  where t1.customer_rk=t2.customer_rk ;
NOTE: Executing action 'builtins.loadActionSet'.
NOTE: Added action set 'fedsql'.
NOTE: Action 'builtins.loadActionSet' used (Total process time):
NOTE:       real time               0.027115 seconds
NOTE:       cpu time                0.029232 seconds (107.81%)
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  1015.75K (0.00%)
action returned string actionset = fedsql
NOTE: CAS action completed [OKAY]
FEDSQL: load of action set returned rc=00000000
NOTE: Executing action 'fedSql.execDirect'.
 
Methods for full query plan
----------------------------
Number of Joins Performed is : 1
        MergeJoin (INNER) 
          Sort 
            SeqScan from PG.order_fact10 
          Sort 
            SeqScan from PG.customer_dim 
 
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to PostgreSQL.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               10.908597 seconds
NOTE:       cpu time                10.814081 seconds (99.13%)
NOTE:       data movement time      0.481373 seconds
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  108.79M (0.04%)
NOTE:       bytes moved             129.71M
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to PostgreSQL.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               1.168007 seconds
NOTE:       cpu time                0.860825 seconds (73.70%)
NOTE:       data movement time      0.091907 seconds
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  32.09M (0.01%)
NOTE:       bytes moved             10.35M
Methods for stage 1
--------------------
        MergeJoin (INNER) 
            SeqScan with _pushed_ order by from PG.order_fact10 
            SeqScan with _pushed_ order by from PG.customer_dim 
 
NOTE: Table CUST_ORDER was created in caslib PG with 1000000 rows returned.
NOTE: Action 'fedSql.execDirect' used (Total process time):
NOTE:       real time               14.616581 seconds
NOTE:       cpu time                18.066371 seconds (123.60%)
NOTE:       data movement time      0.718373 seconds
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  587.43M (0.23%)
NOTE:       bytes moved             140.08M
NOTE: CAS action completed [OKAY]
FEDSQL: The fedsql.execDirect action returned rc=0x00000000
127  quit ;
NOTE: PROCEDURE FEDSQL used (Total process time):
      real time           14.68 seconds
      cpu time            0.12 seconds
 
128  
129  
130  /* FEDSQL with CASLIB having Mulit-node parameter */
131  proc fedsql sessref=mySession _method cntl=(disablePassThrough) ;
NOTE: FEDSQL: Running on CAS due to "sessref".
132     create table pgm.cust_order{options replace=true replication=0} as
133     select t1.transaction_id,
134            t1.transaction_line_item_no,
135            t1.retail_transaction_type_cd,
136            t1.order_channel,
137    t1.order_type,
138    t2.customer_rk,
139    t2.customer_type,
140    t2.gender_cd,
141    t2.employee_rk ,
142    t2.loyalty_program
143     from pgm."order_fact10" as t1, pgm."customer_dim" t2  where t1.customer_rk=t2.customer_rk ;
NOTE: Executing action 'builtins.loadActionSet'.
NOTE: Added action set 'fedsql'.
NOTE: Action 'builtins.loadActionSet' used (Total process time):
NOTE:       real time               0.029489 seconds
NOTE:       cpu time                0.016578 seconds (56.22%)
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  906.84K (0.00%)
action returned string actionset = fedsql
NOTE: CAS action completed [OKAY]
FEDSQL: load of action set returned rc=00000000
NOTE: Executing action 'fedSql.execDirect'.
 
Methods for full query plan
----------------------------
Number of Joins Performed is : 1
        MergeJoin (INNER) 
          Sort 
            SeqScan from PGM.order_fact10 
          Sort 
            SeqScan from PGM.customer_dim 
 
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to PostgreSQL.
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(3). The load will proceed with numReadNodes=3.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               7.288136 seconds
NOTE:       cpu time                6.212600 seconds (85.24%)
NOTE:       data movement time      0.627832 seconds
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  66.12M (0.03%)
NOTE:       bytes moved             129.71M
NOTE: Executing action 'table.loadTable'.
NOTE: Performing serial LoadTable action using SAS Data Connector to PostgreSQL.
WARNING: The value of numReadNodes(10) exceeds the number of available worker nodes(3). The load will proceed with numReadNodes=3.
NOTE: Action 'table.loadTable' used (Total process time):
NOTE:       real time               0.743209 seconds
NOTE:       cpu time                0.615163 seconds (82.77%)
NOTE:       data movement time      0.090429 seconds
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  34.04M (0.01%)
NOTE:       bytes moved             10.35M
Methods for stage 1
--------------------
        MergeJoin (INNER) 
            SeqScan with _pushed_ order by from PGM.order_fact10 
            SeqScan with _pushed_ order by from PGM.customer_dim 
 
NOTE: Table CUST_ORDER was created in caslib PGM with 1000000 rows returned.
NOTE: Action 'fedSql.execDirect' used (Total process time):
NOTE:       real time               11.106149 seconds
NOTE:       cpu time                13.215052 seconds (118.99%)
NOTE:       data movement time      1.427191 seconds
NOTE:       total nodes             4 (32 cores)
NOTE:       total memory            251.04G
NOTE:       memory                  590.26M (0.23%)
NOTE:       bytes moved             140.08M
NOTE: CAS action completed [OKAY]
FEDSQL: The fedsql.execDirect action returned rc=0x00000000
144  quit ;
NOTE: PROCEDURE FEDSQL used (Total process time):
      real time           11.18 seconds
      cpu time            0.06 seconds     
145  
………….
…………………….

 

Important Links

 

FedSQL Programming for CAS

 

SAS language capabilities to efficiently process data residing in CAS

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎07-21-2022 04:12 PM
Updated by:
Contributors

sas-innovate-white.png

Missed SAS Innovate in Orlando?

Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.

 

Register now

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started

Article Tags