The SAS option I use the most when accessing a third-party database from SAS is the very intuitive SASTRACE option.
Who has not used:
options sastrace=",,,d" sastraceloc=saslog nostsuffix ;
to monitor which SQL query SAS is sending to the database?
Yes, you are a “comma comma comma d” fanatic!
But in SAS Viya, as you know, a database is accessed differently: not only from SAS with the traditional SAS/ACCESS engine, but also from SAS Cloud Analytic Services (CAS) with the Data Connectors or Data Connect Accelerators. Does this “sastrace” option also work for CAS, to identify what CAS is sending to the database? Nope. We need something else.
The first thing to note is that there is no real documentation (that I’m aware of) about the SAS Data Connector/SAS Data Connect Accelerator logging options.
The main source of information is this wonderful paper from SAS Global Forum 2017: Introduction to SAS® Data Connectors and SAS® Data Connect Accelerators on SAS® Viya™.
I encourage you to read this document for more details on the Data Connectors and Data Connect Accelerators in general and their tracing options in particular.
In this article, I’ll review 2 examples of how to set the Data Connector/Data Connect Accelerator logging options.
There are 3 options to trace a SAS Data Connector communication:
- DRIVER_TRACE specifies what to trace (for example, "SQL" to capture the SQL statements sent to the database)
- DRIVER_TRACEFILE specifies the file in which the trace is written
- DRIVER_TRACEOPTIONS controls how the trace file is written (for example, "TIMESTAMP|APPEND" to time-stamp each record and append to an existing file)
You can set those options in the CASLIB definition, or for a specific load or operation using the dataSourceOptions option (a per-load sketch follows the CASLIB example below).
Here is an example:
caslib PG datasource=(srctype="postgres",authenticationdomain="PGAuth",server="mydb.example.com",
database="dvdrental",schema="public",numreadnodes=10,numwritenodes=10,
DRIVER_TRACE="SQL",
DRIVER_TRACEFILE="/tmp/sasdcpg.log",
DRIVER_TRACEOPTIONS="TIMESTAMP|APPEND") libref=PG ;
In this example:
- DRIVER_TRACE="SQL" captures the SQL statements that the data connector sends to the PostgreSQL database
- DRIVER_TRACEFILE="/tmp/sasdcpg.log" writes the trace to that file on each CAS node involved
- DRIVER_TRACEOPTIONS="TIMESTAMP|APPEND" time-stamps each trace record and appends to the file if it already exists
The CASLIB assignment doesn’t trigger the creation of the trace file yet, since no operation on the database has been done.
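If you would rather not trace every operation that goes through the CASLIB, the same options can be passed for a single load only through dataSourceOptions. Here is a minimal sketch using the table.loadTable CAS action (the trace file name is just an illustrative choice):
/* Minimal sketch: trace a single load only, by passing the Data Connector */
/* options in dataSourceOptions instead of the CASLIB definition.          */
/* The trace file name below is illustrative.                              */
proc cas ;
   table.loadTable /
      caslib="PG"
      path="film"
      casOut={caslib="PG", name="film", replace=true}
      dataSourceOptions={
         driver_trace="SQL",
         driver_tracefile="/tmp/sasdcpg_load.log",
         driver_traceoptions="TIMESTAMP|APPEND"
      } ;
quit ;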
Now, let’s run a multi-node data load statement from this CASLIB data source (this is an MPP CAS environment, and numreadnodes has been set accordingly):
proc casutil incaslib="PG" outcaslib="PG" ;
load casdata="film" casout="film" replace ;
quit ;
As a result, on a 5-node CAS environment (1 CAS controller + 4 CAS workers), we get a sasdcpg.log file created in /tmp on every CAS node. Here are the contents:
CAS Controller:
10.01.04.36: DEFAULT: SELECT * FROM "public"."film" WHERE 0=1
10.01.04.36: DRIVER SQL: "SELECT * FROM "public"."film" WHERE 0=1" on connection 0x000000009c0d1480
CAS Worker 1:
10.01.04.37: DEFAULT: SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 0 OR "film_id" IS NULL
10.01.04.37: DRIVER SQL: "SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 0 OR "film_id" IS NULL" on connection 0x0000000024051520
CAS Worker 2:
10.01.04.37: DEFAULT: SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 1
10.01.04.37: DRIVER SQL: "SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 1" on connection 0x000000008cff6520
CAS Worker 3:
10.01.04.37: DEFAULT: SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 2
10.01.04.37: DRIVER SQL: "SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 2" on connection 0x00000000900a4520
CAS Worker 4:
10.01.04.37: DEFAULT: SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 3
10.01.04.37: DRIVER SQL: "SELECT * FROM (SELECT * FROM "public"."film") sliceSQL WHERE ABS(MOD("film_id", 4)) = 3" on connection 0x00000000a4274520
You can now observe and understand how a multi-node load works:
- The CAS controller sends a metadata-only query (WHERE 0=1) to get the structure of the source table without reading any rows.
- Each CAS worker sends its own slice query, filtering the rows with a MOD function applied to a numeric column (film_id here), with the number of workers (4) as the divisor and its own rank as the remainder. For instance, a row with film_id=42 satisfies MOD(42, 4)=2 and is read by CAS Worker 3; the worker handling remainder 0 also picks up the rows where film_id is NULL.
NB: These 3 options do not work for SAS Data Connector for Hadoop (dataTransferMode="serial") for some reason. Instead, use the SAS Data Connect Accelerator options (TRACEFLAGS and TRACEFILE) for Hadoop. See below.
There are 3 options to trace a SAS Data Connect Accelerator communication:
- DFDEBUG displays additional information from the SAS Embedded Process in the SAS log (for example, "EPALL")
- TRACEFILE specifies the file in which the trace is written
- TRACEFLAGS controls how the trace file is written (for example, "TIMESTAMP|APPEND")
As before, you can set those options in the CASLIB definition, or for a specific load or operation using the dataSourceOptions option.
Here is an example:
caslib HIVE datasource=(srctype="hadoop",server="myhdp.example.com",authenticationdomain="HiveCon",
dataTransferMode="serial",hadoopconfigdir="/opt/sas/viya/config/data/hadoop/conf",
hadoopjarpath="/opt/sas/viya/config/data/hadoop/lib",schema="cashive",numreadnodes=10,numwritenodes=10,
TRACEFLAGS="TIMESTAMP|APPEND",
TRACEFILE="/tmp/sasdcaHadoop.log",
DFDEBUG="EPALL") libref=HIVE ;
In this example:
- DFDEBUG="EPALL" surfaces additional SAS Embedded Process job information in the SAS log
- TRACEFILE="/tmp/sasdcaHadoop.log" writes the trace to that file
- TRACEFLAGS="TIMESTAMP|APPEND" time-stamps each trace record and appends to the file if it already exists
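As with the Data Connector, these options can also be scoped to a single load through dataSourceOptions rather than the CASLIB definition. A minimal sketch (the trace file name is illustrative):
/* Minimal sketch: Data Connect Accelerator trace options for one load only. */
/* The trace file name below is illustrative.                                */
proc cas ;
   table.loadTable /
      caslib="HIVE"
      path="test_table"
      casOut={caslib="HIVE", name="test_table", replace=true}
      dataSourceOptions={
         dfdebug="EPALL",
         tracefile="/tmp/sasdcaHadoop_load.log",
         traceflags="TIMESTAMP|APPEND"
      } ;
quit ;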
Here is a sample output of the DFDEBUG="EPALL" option:
73 proc casutil incaslib="HIVE" outcaslib="HIVE" ;
NOTE: The UUID 'c4cdb27f-4e4b-4340-94d7-43e9dfd1628b' is connected using session MYSESSION.
74 load casdata="test_table" casout="test_table" replace ;
NOTE: Performing parallel LoadTable action using SAS Data Connect Accelerator for Hadoop.
NOTE: SAS Embedded Process tracking URL: http://myhdp.example.com:8088/proxy/application_1534533938504_0016/
NOTE: Job Status ......: SUCCEEDED
NOTE: Job ID ..........: 1534533938504_16
NOTE: Job Name ........: SASEP SuperReader /apps/hive/warehouse/cashive.db/test_table
NOTE: File splits..... : 4
NOTE: Input records ...: 16000
NOTE: Input bytes .....: 590288
NOTE: Output records ..: 0
NOTE: Output bytes ....: 0
NOTE: Transcode errors : 0
NOTE: Truncations .....: 0
NOTE: Map Progress ....: 100.00%
NOTE: Cloud Analytic Services made the external data from test_table available as table TEST_TABLE in caslib HIVE.
NOTE: The Cloud Analytic Services server processed the request in 23.648689 seconds.
75 quit ;
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 23.65 seconds
cpu time 0.00 seconds
When used in a parallel context (dataTransferMode="parallel"), the trace file will be located on the CAS controller only.
When used in a serial context (dataTransferMode="serial"), the trace file can be located on the CAS controller only (no multi-node load) or on all CAS nodes (multi-node load). You can check the multi-node SQL queries by using this command:
$ more sasdcaHadoop.log | grep -i sql
In: HandleType=TKTS_HANDLE_DBC=2, Handle=0x7fa1524ddb60, RecNumber=1, Sqlstate=0x7fa159a77fe0, NativeErrorPtr=0x7fa159a77fd4, MessageText=0x7fa159a777d0, BufferLength=513, TextLengthPtr=0x7fa159a777d0
Out: *Sqlstate=`01S02`,*NativeErrorPtr=-2130708291,MessageText=`Current catalog set to HIVE`,TextLengthPtr=27
In: HandleType=TKTS_HANDLE_DBC=2, Handle=0x7fa1524ddb60, RecNumber=2, Sqlstate=0x7fa159a77fe0, NativeErrorPtr=0x7fa159a77fd4, MessageText=0x7fa159a777d0, BufferLength=513, TextLengthPtr=0x7fa159a777d0
In: StatementHandle=0x7fa1524d6fe0, Sql=`SELECT * FROM (SELECT * FROM `cashive`.`test_table`) sliceSQL WHERE ABS(PMOD(`index_pk`, 4)) = 0 OR `index_pk` IS NULL`, SqlL=118
$
Hope this helps. Thanks for reading.