Hello
I run 2 versions of query and the difference is the location of WHERE clauses.
In way1 the where clauses are at the end of the query.
In way2 the where clauses are in the place of define the data set
My question-
I was surprised that way1 is working very quickly and way2 very slowly .
I even stopped the run of way2 before it finish because it took so long time.
Why Way2 is running so slowly?
By theory, is Way1 better way to write the query than Way2?
Apologize that I cannot add data but it is more theoretical question.
Maybe the reason is working with tables located in teradata???
Is efficiency of Way1 and Way2 is depending on the location of the source tables ?(SAS or other like TeraData)
/***Way1------very quick query******/
/***Way1------very quick query******/
/***Way1------very quick query******/
proc sql;
create table Way1 as
SELECT *
FROM TeraData.t1
INNER JOIN TeraData.t2
ON t2.Team_IP=t1.Team_IP
INNER JOIN TeraData.t3
ON t3.Branch_Cust_IP=t1.Branch_Cust_IP
LEFT JOIN TeraData.t4
ON t3.Agreement_Account_Id= t4.Agreement_Account_Id
LEFT JOIN TeraData.t5
ON t5.Agreement_Account_Id=t3.Agreement_Account_Id
LEFT JOIN TeraData.t6
ON t6.Agreement_Account_Id=t3.Agreement_Account_Id
INNER JOIN TeraData.t7
ON t3.Agreement_Account_Id=t7.Agreement_Account_Id
Left join (Select * from TeraData.t8 where Agreement_Summary_Date='31DEC2022'd)
ON t3.Agreement_Account_Id=t8.Agreement_Account_Id
where t7.Agreement_Summary_Date='31DEC2022'd
AND t3.Agreement_Category_Code IN (3)
AND t3.Loan_Type_Code IN (10,20)
;
quit;
/***Way2------very slow query******/
/***Way2------very slow query******/
/***Way2------very slow query******/
proc sql;
create table Way1 as
SELECT *
FROM TeraData.t1
INNER JOIN TeraData.t2
ON t2.Team_IP=t1.Team_IP
INNER JOIN TeraData.t3 (Where=(Agreement_Category_Code IN (3) AND Loan_Type_Code IN (10,20)))
ON t3.Branch_Cust_IP=t1.Branch_Cust_IP
LEFT JOIN TeraData.t4
ON t3.Agreement_Account_Id= t4.Agreement_Account_Id
LEFT JOIN TeraData.t5
ON t5.Agreement_Account_Id=t3.Agreement_Account_Id
LEFT JOIN TeraData.t6
ON t6.Agreement_Account_Id=t3.Agreement_Account_Id
INNER JOIN TeraData.t7 (Where=(Agreement_Summary_Date='31DEC2022'd))
ON t3.Agreement_Account_Id=t7.Agreement_Account_Id
Left join (Select * from TeraData.t8 where Agreement_Summary_Date='31DEC2022'd)
ON t3.Agreement_Account_Id=t8.Agreement_Account_Id
;
quit;
Is the libref TERADATA point to something in a TERADATA database?
I suspect that the use of the WHERE= dataset option confused PROC SQL Into thinking that it could not push all of the query into the remote database.
If you really want to get better control over the performance then you can use explicit passthru into teradata. (Note you will have to use Teradata syntax for things like date literals).
proc sql;
connect using teradata;
create table Way1 as
SELECT * from connection to teradata
(select *
FROM Schema.t1
INNER JOIN Schema.t2
ON t2.Team_IP=t1.Team_IP
INNER JOIN Schema.t3
ON t3.Branch_Cust_IP=t1.Branch_Cust_IP
LEFT JOIN Schema.t4
ON t3.Agreement_Account_Id= t4.Agreement_Account_Id
LEFT JOIN Schema.t5
ON t5.Agreement_Account_Id=t3.Agreement_Account_Id
LEFT JOIN Schema.t6
ON t6.Agreement_Account_Id=t3.Agreement_Account_Id
INNER JOIN Schema.t7
ON t3.Agreement_Account_Id=t7.Agreement_Account_Id
Left join (Select * from Schema.t8 where Agreement_Summary_Date= date'2022-12-31') t8
ON t3.Agreement_Account_Id=t8.Agreement_Account_Id
where t7.Agreement_Summary_Date= date'2022-12-31'
AND t3.Agreement_Category_Code IN (3)
AND t3.Loan_Type_Code IN (10,20)
)
;
quit;
According to this SAS white paper:
http://support.sas.com/resources/papers/teradata.pdf
the use of dataset options disqualifies a query from being passed to the DB (page 14).
Leave the WHERE where it is.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.