Imagine that you have two tables:
- One master table that contains all the data you need. It has a PK called CUST_NO.
- One control table that determines what data should never be included in queries. It only has one column, also called CUST_NO.
I want to use an SQL Join transformation in DI Studio that should extract all the rows from the master table, except for the ones with a CUST_NO that matches the CUST_NO from the control table.
How do I perform this query most efficiently? I've tried the two following approaches, but I suspect something is wrong, because they never seem to finish.
1. MASTER.CUST_NO NOT IN (SELECT DISTINCT CUST_NO FROM EXCLUDE)
2. MASTER.CUST_NO ^= EXCLUDE.CUST_NO
Thanks for your time.
1. Should work fine. You don't need DISTINCT. You should define an index on EXCLUDE.CUST_NO.
2. Cannot work. The condition is true for all values of EXCLUDE.CUST_NO or for all values of EXCLUDE.CUST_NO except one.
PG
Thanks, I've removed the DISTINCT and created an index. Hopefully it'll finish this time.
Btw, for some reason DI Studio autoformats this:
MASTER.CUST_NO NOT IN (SELECT CUST_NO FROM EXCLUDE)
Into this:
MASTER.CUST_NO NOT IN (SELECT EXCLUDE.CUST_NO FROM EXCLUDE)
It seems a bit odd that the table name EXCLUDE is repeated like that. Does it matter? If I remove the FROM, I get a syntax error.
The reformated version may seem redundant but it is perfectly equivalent. - PG
You're into the soup of SQL query optimization. I'm not sure if your version will minimize the explosion, but if it doesn't this might work better:
select m.* from master m left join exclude e on(m.cust_no = e.cust_no) where e.cust_no is null
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.