When joining multiple tables in EG, I noticed the code generated uses a 'where' option when the all the types are 'inner' . If there's atleast a 'Left' (or 'Right') join then the SQL generated does use a 'Join' rather than a 'Where' logic which is significantly faster. Is there anyway to force EG to generate a 'Join' logic rather than a 'Where' for all inner joins?
Thanks
The two queries are sorting the output differently. Can you test if that explains the performance difference?
Not sure I understand the differences you are trying to describe. Can you provide examples?
I am not sure if it really matters as I suspect that the different methods might both end up doing the same underlying steps because PROC SQL optimizers your queries.
Here are the two SQL Statements:
SAS EG generated code as seen in 'Preview'
PROC SQL;
CREATE TABLE tbl_output AS
SELECT DISTINCT t1.keyID AS keyID,
t2.EventID AS EventID,
t2.Date_Begn,
/* YY_QQ */
(DATEPART(t2.Date_Begn)) FORMAT=YYQD6. LABEL="YY_QQ" AS YY_QQ,
t3.Field1 AS t3Field1,
t5.Field1 AS t5Field1,
t4.Field1,
t4.Field2,
t2.status_value
FROM dbo.table1 t1, dbo.table2 t2, dbo.table3 t3, dbo.table4 t4,
dbo.table5 t5
WHERE (t1.EventID = t2.case_id AND t2.ntype_id = t3.EventID AND t2.EventID = t4.key2 AND t4.t4key = t5.EventID) AND
(t2.Date_Begn >= "&begdt:00:00"dt AND t2.Date_Begn < "&enddt:00:00"dt AND t5.Field1 IN
(
'Value 1',
'Value 2',
) AND t2.status_value = 'Complete' AND t3.Field1 = 'Distinct')
ORDER BY t1.keyID, t2.EventID, t2.Date_Begn, t3.Field1, t5.Field1, t4.Field1, t4.Field2, t1.status_value;
QUIT;
The one below is another SQL code that runs 1/10th the time the above takes
PROC SQL;
CREATE TABLE tbl_output AS
SELECT distinct t1.keyID AS keyID,
t2.EventID AS EventID,
t2.Date_Begn,
(DATEPART(t2.Date_Begn)) FORMAT=YYQD6. LABEL="YY_QQ" AS YY_QQ,
t3.Field1 AS t3Field1,
t5.Field1 AS t5Field1,
t4.Field1,
t4.Field2,
t2.status_value
FROM dbo.table1 t1 join dbo.table2 t2 on
t1.EventID = t2.case_id
join dbo.table3 t3 on
t2.ntype_id = t3.EventID
join dbo.table4 t4 on
t2.EventID = t4.key2
join dbo.table5 t5 on
t4.t4key = t5.EventID
WHERE t2.Date_Begn >= "&s_date:00:00"dt AND t2.Date_Begn < "&e_date:00:00"dt AND
t5.Field1 IN ("Value 1","Value 2") AND
t2.status_value = 'Complete' AND
t3.Field1 = 'Distinct'
ORDER BY t2.Date_Begn, t3.Field1, t5.Field1, t4.Field1, t4.Field2, t1.status_value;
QUIT;
As I mentioned in my first post, EG combines the table in a 'where' if the type of join is all 'inner' . How can I make it come up with the second so it runs faster? I can ofcourse stick the second in a program within EG but that defeats the whole purpose of using EG to begin with
The two queries are sorting the output differently. Can you test if that explains the performance difference?
Yes that did it Tom Ive gone blind-Thanks so much!
Let's throw this one over the wall to SAS. I noticed a while ago how EG uses different SQL join syntax depending on the join type.
Can anyone from SAS tell us why the two different syntax forms are used?
Thanks, Tom
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.