SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Question about troublesome SQL Join

Reply
Frequent Contributor
Posts: 89

Question about troublesome SQL Join

Imagine that you have two tables:

- One master table that contains all the data you need. It has a PK called CUST_NO.

- One control table that determines what data should never be included in queries. It only has one column, also called CUST_NO.

I want to use an SQL Join transformation in DI Studio that should extract all the rows from the master table, except for the ones with a CUST_NO that matches the CUST_NO from the control table.

How do I perform this query most efficiently? I've tried the two following approaches, but I suspect something is wrong, because they never seem to finish.

1. MASTER.CUST_NO NOT IN (SELECT DISTINCT CUST_NO FROM EXCLUDE)

2. MASTER.CUST_NO ^= EXCLUDE.CUST_NO

Thanks for your time. Smiley Happy

Respected Advisor
Posts: 4,646

Re: Question about troublesome SQL Join

1. Should work fine. You don't need DISTINCT.  You should define an index on EXCLUDE.CUST_NO.

2. Cannot work. The condition is true for all values of EXCLUDE.CUST_NO or for all values of EXCLUDE.CUST_NO except one.

PG

PG
Frequent Contributor
Posts: 89

Re: Question about troublesome SQL Join

Thanks, I've removed the DISTINCT and created an index. Hopefully it'll finish this time.

Btw, for some reason DI Studio autoformats this:

MASTER.CUST_NO NOT IN (SELECT CUST_NO FROM EXCLUDE)

Into this:

MASTER.CUST_NO NOT IN (SELECT EXCLUDE.CUST_NO FROM EXCLUDE)


It seems a bit odd that the table name EXCLUDE is repeated like that. Does it matter? If I remove the FROM, I get a syntax error.

Respected Advisor
Posts: 4,646

Re: Question about troublesome SQL Join

The reformated version may seem redundant but it is perfectly equivalent. - PG

PG
PROC Star
Posts: 1,093

Re: Question about troublesome SQL Join

You're into the soup of SQL query optimization. I'm not sure if your version will minimize the explosion, but if it doesn't this might work better:

select m.* from master m left join exclude e on(m.cust_no = e.cust_no) where e.cust_no is null

Ask a Question
Discussion stats
  • 4 replies
  • 227 views
  • 6 likes
  • 3 in conversation