I am running a proc sql in teradata and this is taking more than 1 hour to run. The data size is relatively small with about 86,000 rows returned. I create a temp teradata table from a sas dataset with this code:
libname TDWORK teradata user="&user.@LDAP" password=&pword tdpid=bmg dbmstemp=yes
connection=global tpt=No ;
data tdwork.Cust_Mstr_Wk_Curr(dbcommit=20000 multistmt=yes
dbcreate_table_opts='PRIMARY INDEX (Cust_Num)') ;
set Cust_Mstr_Wk_Curr_Cdb ;
run;
I then need to join this temp table to other tables in teradata with this code.
libname bmg_pdd teradata server=<servername> tdpid=bmg database="Bmgpdd"
connection=global mode=teradata user="&user.@LDAP" password = &pword;
options symbolgen macrogen;
proc sql ;
create table tdwork.Bmgpdd_Acct_X_Cust_Hg as
(Select
A.* ,
M.Acct_Num As Acct_Num ,
M.Co_Id As Co_Id ,
B.Cust_Acct_Rel_Cd ,
m.Short_Name ,
m.PCode ,
m.Open_Dt ,
m.Reopen_Dt ,
m.Eom_Bal
From tdwork.Cust_Mstr_Wk_Curr A,
bmg_pdd.Acct_X_Cust_Hg_&yearmon._Cdb B,
bmg_pdd.Acct_Mstr_&yearmon._Hg M
Where A.Cust_Num = B.Cust_Num
and B.Acct_Num = M.Acct_Num
);
quit;
This code is within a macro. Given below is the log for this step.
MACROGEN(LAST12): proc sql ;
SYMBOLGEN: Macro variable YEARMON resolves to 202302
SYMBOLGEN: Macro variable YEARMON resolves to 202302
MACROGEN(LAST12): create table tdwork.Bmgpdd_Acct_X_Cust_Hg as (Select A.* , M.Acct_Num As Acct_Num , M.Co_Id As Co_Id ,
B.Cust_Acct_Rel_Cd , m.Short_Name , m.PCode , m.Open_Dt , m.Reopen_Dt , m.Eom_Bal From tdwork.Cust_Mstr_Wk_Curr A,
bmg_pdd.Acct_X_Cust_Hg_202302_Cdb B, bmg_pdd.Acct_Mstr_202302_Hg M Where A.Cust_Num = B.Cust_Num and B.Acct_Num = M.Acct_Num );
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
NOTE: Table TDWORK.Bmgpdd_Acct_X_Cust_Hg created, with 85880 rows and 15 columns.
MACROGEN(LAST12): quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 1:08:08.17
cpu time 2:13.59
Is there a way to speed up the query? Thanks in advance.
Make sure to run the SQL JOIN in the teradata database and not force SAS to extract all of the data to the SAS environment and then write it back. THe best way is to use explicit passthru code.
So try something like:
proc sql ;
connect using TDWORK;
execute by TDWORK
( /* TERADATA SQL CODE GOES HERE */
);
quit;
You can ask for help from your TERADATA DBA for how to optimize the TERADATA SQL. TERADATA does have an EXPLAIN statement to show you how TERADATA plans to execute the query with estimates for how long each step will take. That is very good for seeing if the query might need to be changed to improve preformance.
TERADATA is a highly parallel environment. It uses the PRIMARY INDEX of the table to decide how to distribute the data to the different compute units. It is very important for performance that the distribution is not skewed so that the data is spread evenly across the nodes. If you do not tell it otherwise it will just use the first variable as the PRIMARY INDEX, which may not be optimal. If there is really no good variable to use as the primary index there is a way to tell it that the table does not use any index. If you want to create a new table in TERADATA by using implicit SQL then you can add the PRIMARY INDEX option in the dbcreate_table_opts= dataset name options.
Thank you so much, Tom! Will try this and get back on the thread.
Jit
Theoretically you should be able to do implicit SQL pass-through, but try to make your libname statements as identical as possible. I was some time I worked with TD, so I can't really tell if you need tpid= in one connection and database= in the other.
Add this prior to the join to see what's happening:
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
Ok, thank you. Will try these options to trace the process. I am still struggling a bit with Tom's suggestion. Will include both your suggestions. Will be out for a few days, but will get back later this week. Thanks again!
Thanks to Tom and LinusH for your responses. I believe the key is to have a dedicated workspace on teradata where temp files could be stored. I have requested this space from the admin. Once I get it, I will try your suggestions again. As of now, the code inserts rows 1 at a time and does not use multi inserts. This is why it's taking so long. The options statement really helped in analyzing this. Thank you.
One issue, regardless of whether this runs in SAS or on Teradata is multiple Cartesian joins:
From tdwork.Cust_Mstr_Wk_Curr A, bmg_pdd.Acct_X_Cust_Hg_&yearmon._Cdb B, bmg_pdd.Acct_Mstr_&yearmon._Hg M Where A.Cust_Num = B.Cust_Num and B.Acct_Num = M.Acct_Num
Those comma are telling the SQL processor to match up every record in each table with every record in every other table. So if table A has 10 records, B as 10 records and M as 10 then you have used 10*10*10=1000 record matches and then selected from the result.
I don't have your data but a JOIN, Left, right or what not, ON A.cust_num=B.cust_num and b.acct_num=m.acct_num probably runs faster and should use less temporary space.
Thank you, BallardW, for the pointers. I will change the code to do joins instead. Still waiting for my work space, but will use your suggestions when I am ready to execute. Thanks again!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.