Some time ago, I wrote about SAS Data Connector Logging. Indeed, getting information about what SQL instructions CAS is sending to the database is critical to understand data movement and optimize performance. However, the way Data Connector logs are created are not very convenient. Tracing logs are created on the CAS nodes which a standard SAS programmer would not be able to access (or not easily). It is even more difficult for SAS developers to access them in a Kubernetes environment.
Would Kubernetes admins let SAS users access pods and containers and try getting access to some local files? Probably not. Would SAS users want to run complex kubectl commands to try to understand their data flows? Probably not. So, is there a way to simplify access to the Data Connector logs?
What are the challenges?
Here is an example of setting the tracing options for SAS Data Connectors:
caslib PG datasource=(srctype="postgres",authenticationdomain="PGAuth",
server="sasdb.example.com",database="dvdrental",schema="public",
numreadnodes=10,numwritenodes=10,
DRIVER_TRACE="SQL",
DRIVER_TRACEFILE="/tmp/sasdcpg.log",
DRIVER_TRACEOPTIONS="TIMESTAMP|APPEND") libref=PG ;
Data Connector Tracing Logs are created from the CAS server nodes on a path accessible to them. So, depending on the path you indicate, logs are created locally or on a shared file system. However, by default, logs get created with a static name (the same name for all the CAS nodes if you are in a MPP environment). Consequently, a shared file system doesn’t look a good option since each CAS controller/worker will overwrite each other’s log.
What do we want then?
With SAS Viya on Kubernetes, sharing file systems is easily done through Persistent Volumes. Your Kubernetes admin will have to define a shared Persistent Volume for the CAS containers as well as for the SAS Compute containers so that all the Data Connector logs get created at the same location and are accessible from a SAS Compute session.
Then we need to manage the name of the logs. Fortunately, CAS containers are started with a set of environment variables. Cool, there is one named SAS_CURRENT_HOST which contains the name of the CAS server a Controller or a Worker is running on. And cherry on the cake, this variable is correctly resolved when used in the DRIVER_TRACEFILE option.
So, we have all the pieces. Let’s test this on a 4 workers CAS environment.
Assign the Data Connector CASLIB
First, you’ll need to assign the Database CASLIB with the right tracing options:
caslib caspg datasource=(srctype="postgres",username="myuser",password="XXXXXX",
server="myserver",database="dvdrental",schema="public",
numreadnodes=10,numwritenodes=10,
DRIVER_TRACE="SQL",
DRIVER_TRACEFILE="/gelcontent/demo/DM/data/LOG/&sysuserid._sasdcpg_$SAS_CURRENT_HOST.log",
DRIVER_TRACEOPTIONS="TIMESTAMP") libref=caspg ;
Here:
Load Data in CAS using the Data Connector
Then you can start to load your database data into CAS:
proc casutil incaslib="caspg" outcaslib="caspg" ;
load casdata="film" casout="PG_film" ;
quit ;
As a result, we will get the following files in the log directory (here /gelcontent/demo/DM/data/LOG/) :
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
gatedemo035 is my user id here.
Parse the Multiple Trace Files using SAS
To get information about the different SQL queries that the CAS controller/workers send to the database, you’ll need to parse those log files:
/* Display trace */
data _null_ ;
length logname oldlogname $ 256 casnode $ 20 ;
retain oldlogname "dummy" ;
infile "/gelcontent/demo/DM/data/LOG/&sysuserid._sasdcpg_*.log" filename=logname ;
input ;
put ;
if index(_infile_,"DRIVER SQL")>0 then do ;
if logname ne oldlogname then do ;
casnode=scan(scan(scan(logname,-1,"/"),1,"."),-1,"_") ;
put "*****" ;
put "LOG FROM CAS NODE: " casnode ;
put "*****" ;
oldlogname=logname ;
end ;
put _infile_ ;
end ;
run ;
Here, we parse them all at once using a wildcard. Then we print some information about the SQL sent to the database. We also do some string manipulation to indicate in the SAS Log which CAS node the Data Connector trace comes from. Expected result:
78 data _null_ ;
79 length logname oldlogname $ 256 casnode $ 20 ;
80 retain oldlogname "dummy" ;
81 infile "/gelcontent/demo/DM/data/LOG/&sysuserid._sasdcpg_*.log" filename=logname ;
82 input ;
83 put ;
84 if index(_infile_,"DRIVER SQL")>0 then do ;
85 if logname ne oldlogname then do ;
86 casnode=scan(scan(scan(logname,-1,"/"),1,"."),-1,"_") ;
87 put "*****" ;
88 put "LOG FROM CAS NODE: " casnode ;
89 put "*****" ;
90 oldlogname=logname ;
91 end ;
92 put _infile_ ;
93 end ;
94 run ;
NOTE: The infile "/gelcontent/demo/DM/data/LOG/gatedemo035_sasdcpg_*.log" is:
Filename=/gelcontent/demo/DM/data/LOG/gatedemo035_sasdcpg_controller.sas-cas-server-default.gelenv-stable.log,
File List=/gelcontent/demo/DM/data/LOG/gatedemo035_sasdcpg_*.log,
Owner Name=sas,Group Name=sas,
Access Permission=-rwxr-xr-x,
Last Modified=11May2021:14:27:33,
File Size (bytes)=3567
*****
LOG FROM CAS NODE: controller
*****
18.27.33.81: DRIVER SQL: "SELECT * FROM "public"."film" WHERE 0 = 1" on connection 0x0000000064180f00
18.27.33.85: DRIVER SQL: "select "public"."film"."film_id", "public"."film"."title", "public"."film"."description", "public
"."film"."release_year", "public"."film"."language_id", "public"."film"."rental_duration", "public"."film"."rental_rate", "public"."
film"."length", "public"."film"."replacement_cost", "public"."film"."rating", "public"."film"."last_update", "public"."film"."specia
l_features", "public"."film"."fulltext" from "public"."film" where ( (0=1) ) " on connection 0x000000006419a600
NOTE: The infile "/gelcontent/demo/DM/data/LOG/gatedemo035_sasdcpg_*.log" is:
Filename=/gelcontent/demo/DM/data/LOG/gatedemo035_sasdcpg_worker-0.sas-cas-server-default.gelenv-stable.log,
File List=/gelcontent/demo/DM/data/LOG/gatedemo035_sasdcpg_*.log,
Owner Name=sas,Group Name=sas,
Access Permission=-rwxr-xr-x,
Last Modified=11May2021:14:27:33,
File Size (bytes)=4455
*****
LOG FROM CAS NODE: worker-0
*****
18.27.33.87: DRIVER SQL: "SELECT * FROM "public"."film" WHERE 0 = 1" on connection 0x000000005c05ad40
18.27.33.94: DRIVER SQL: "select "SLICE_SQL"."film_id", "SLICE_SQL"."title", "SLICE_SQL"."description", "SLICE_SQL"."releas
e_year", "SLICE_SQL"."language_id", "SLICE_SQL"."rental_duration", "SLICE_SQL"."rental_rate", "SLICE_SQL"."length", "SLICE_SQL"."rep
lacement_cost", "SLICE_SQL"."rating", "SLICE_SQL"."last_update", "SLICE_SQL"."special_features", "SLICE_SQL"."fulltext" from (selec
t "public"."film"."film_id", "public"."film"."title", "public"."film"."description", "public"."film"."release_year", "public"."film"
."language_id", "public"."film"."rental_duration", "public"."film"."rental_rate", "public"."film"."length", "public"."film"."replace
ment_cost", "public"."film"."rating", "public"."film"."last_update", "public"."film"."special_features", "public"."film"."fulltext"
from "public"."film") "SLICE_SQL" where ( ( (MOD (ABS ("SLICE_SQL"."film_id") , 4) =0) or "SLICE_SQL"."film_id" IS NULL) ) " on c
onnection 0x000000004f5460e0
<...>
Here we go!
We now have a lot of information in the SAS Log about how CAS worked with the database. This is really useful in multi-node contexts to see how the slicing column has been chosen or to validate that your slicing expressions are correctly transmitted to the database.
Clean Up the Trace Files using CAS
By default, Data Connectors log files are replaced for each data access operation, which is fine for what we are doing (check the SQL of each operation). If you are using APPEND (DRIVER_TRACEOPTIONS="TIMESTAMP|APPEND"), then at some point, you might want to purge the trace files.
There are multiple ways to get rid of the trace files created, but using CAS, CASL and a CASLIB is an elegant one:
/* Delete the TRACE files */
caslib log type=path path="/gelcontent/demo/DM/data/LOG/" ;
proc cas ;
table.fileinfo result=fileresult /
caslib="log" allFiles=true path="&sysuserid._sasdcpg_%" ;
filelist=findtable(fileresult) ;
do cvalue over filelist ;
table.deleteSource /
caslib="log"
source=cvalue.name
quiet=true ;
end ;
quit ;
caslib log drop ;
You can find the original source code as macro-programs and a full example attached to this article.
This same code can be used on older (and non Kubernetes) versions of SAS Viya. Simply share a file system through NFS between all the CAS nodes and the SAS Compute Server. Use $SAS_HOSTNAME instead of $SAS_CURRENT_HOST as it seems $SAS_CURRENT_HOST is not defined on every CAS node in SAS Viya 3.5. You may also have to adapt the parsing of the CAS hosts’ names, since they are probably totally different than what we observe on Kubernetes.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.
I don't follow all of this but I wonder what happens if the same user is running multiple jobs? Since the log file names seems to be based only on the userid and hostname won't the different jobs try write log messages into the same file?
Hi @Tom
You're right. You can certainly modify the trace filename to include a timestamp or something else for example.
Also, tracing the data connector is a debugging process. It's not something you want to keep forever.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.