- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When joining multiple tables in EG, I noticed the code generated uses a 'where' option when the all the types are 'inner' . If there's atleast a 'Left' (or 'Right') join then the SQL generated does use a 'Join' rather than a 'Where' logic which is significantly faster. Is there anyway to force EG to generate a 'Join' logic rather than a 'Where' for all inner joins?
Thanks
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The two queries are sorting the output differently. Can you test if that explains the performance difference?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not sure I understand the differences you are trying to describe. Can you provide examples?
I am not sure if it really matters as I suspect that the different methods might both end up doing the same underlying steps because PROC SQL optimizers your queries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Here are the two SQL Statements:
SAS EG generated code as seen in 'Preview'
PROC SQL;
CREATE TABLE tbl_output AS
SELECT DISTINCT t1.keyID AS keyID,
t2.EventID AS EventID,
t2.Date_Begn,
/* YY_QQ */
(DATEPART(t2.Date_Begn)) FORMAT=YYQD6. LABEL="YY_QQ" AS YY_QQ,
t3.Field1 AS t3Field1,
t5.Field1 AS t5Field1,
t4.Field1,
t4.Field2,
t2.status_value
FROM dbo.table1 t1, dbo.table2 t2, dbo.table3 t3, dbo.table4 t4,
dbo.table5 t5
WHERE (t1.EventID = t2.case_id AND t2.ntype_id = t3.EventID AND t2.EventID = t4.key2 AND t4.t4key = t5.EventID) AND
(t2.Date_Begn >= "&begdt:00:00"dt AND t2.Date_Begn < "&enddt:00:00"dt AND t5.Field1 IN
(
'Value 1',
'Value 2',
) AND t2.status_value = 'Complete' AND t3.Field1 = 'Distinct')
ORDER BY t1.keyID, t2.EventID, t2.Date_Begn, t3.Field1, t5.Field1, t4.Field1, t4.Field2, t1.status_value;
QUIT;
The one below is another SQL code that runs 1/10th the time the above takes
PROC SQL;
CREATE TABLE tbl_output AS
SELECT distinct t1.keyID AS keyID,
t2.EventID AS EventID,
t2.Date_Begn,
(DATEPART(t2.Date_Begn)) FORMAT=YYQD6. LABEL="YY_QQ" AS YY_QQ,
t3.Field1 AS t3Field1,
t5.Field1 AS t5Field1,
t4.Field1,
t4.Field2,
t2.status_value
FROM dbo.table1 t1 join dbo.table2 t2 on
t1.EventID = t2.case_id
join dbo.table3 t3 on
t2.ntype_id = t3.EventID
join dbo.table4 t4 on
t2.EventID = t4.key2
join dbo.table5 t5 on
t4.t4key = t5.EventID
WHERE t2.Date_Begn >= "&s_date:00:00"dt AND t2.Date_Begn < "&e_date:00:00"dt AND
t5.Field1 IN ("Value 1","Value 2") AND
t2.status_value = 'Complete' AND
t3.Field1 = 'Distinct'
ORDER BY t2.Date_Begn, t3.Field1, t5.Field1, t4.Field1, t4.Field2, t1.status_value;
QUIT;
As I mentioned in my first post, EG combines the table in a 'where' if the type of join is all 'inner' . How can I make it come up with the second so it runs faster? I can ofcourse stick the second in a program within EG but that defeats the whole purpose of using EG to begin with
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The two queries are sorting the output differently. Can you test if that explains the performance difference?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Yes that did it Tom Ive gone blind-Thanks so much!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let's throw this one over the wall to SAS. I noticed a while ago how EG uses different SQL join syntax depending on the join type.
Can anyone from SAS tell us why the two different syntax forms are used?
Thanks, Tom