01-07-2013 09:31 AM
Imagine that you have two tables:
- One master table that contains all the data you need. It has a PK called CUST_NO.
- One control table that determines what data should never be included in queries. It only has one column, also called CUST_NO.
I want to use an SQL Join transformation in DI Studio that should extract all the rows from the master table, except for the ones with a CUST_NO that matches the CUST_NO from the control table.
How do I perform this query most efficiently? I've tried the two following approaches, but I suspect something is wrong, because they never seem to finish.
1. MASTER.CUST_NO NOT IN (SELECT DISTINCT CUST_NO FROM EXCLUDE)
2. MASTER.CUST_NO ^= EXCLUDE.CUST_NO
Thanks for your time.
01-07-2013 09:49 AM
1. Should work fine. You don't need DISTINCT. You should define an index on EXCLUDE.CUST_NO.
2. Cannot work. The condition is true for all values of EXCLUDE.CUST_NO or for all values of EXCLUDE.CUST_NO except one.
01-07-2013 10:07 AM
Thanks, I've removed the DISTINCT and created an index. Hopefully it'll finish this time.
Btw, for some reason DI Studio autoformats this:
MASTER.CUST_NO NOT IN (SELECT CUST_NO FROM EXCLUDE)
MASTER.CUST_NO NOT IN (SELECT EXCLUDE.CUST_NO FROM EXCLUDE)
It seems a bit odd that the table name EXCLUDE is repeated like that. Does it matter? If I remove the FROM, I get a syntax error.
01-07-2013 12:22 PM
You're into the soup of SQL query optimization. I'm not sure if your version will minimize the explosion, but if it doesn't this might work better:
select m.* from master m left join exclude e on(m.cust_no = e.cust_no) where e.cust_no is null