Help using Base SAS procedures

SQL for selection of all non matching rows from 2 tables

Accepted Solution Solved
Reply
Respected Advisor
Posts: 3,887
Accepted Solution

SQL for selection of all non matching rows from 2 tables

Hi all,

I'm currently having a brain blockage struggling with defining a SQL which returns all rows from both tables with no matching key to the other table.

Capture.PNG

Sample data

data have1;

  var1='1';

  do key=1,2,3,4;

    output;

  end;

  stop;

run;

data have2;

  var2=2;

  do key=1,4,5,6;

    output;

  end;

  stop;

run;

Desired result

Key1Key2Var1Var2
21
31
52
62

It would be easy enough to use a SAS Merge but I'm dealing with two big Oracle tables (>100M rows each) so this needs to run in-database - and the job is on the critical path so I need something which performs as good as possible.

I can create additional indexes on the source tables if required - so assume such indexes exist.

Really would appreciate some guidance with this.

Environment is:

SAS 9.4 under 64bit Linux

Oracle 11c

Thanks,

Patrick


Accepted Solutions
Solution
‎05-18-2015 05:39 AM
Super User
Super User
Posts: 7,392

Re: SQL for selection of all non matching rows from 2 tables

Hi,

Perhaps this visual + code will help.

Capture.PNG

View solution in original post


All Replies
Solution
‎05-18-2015 05:39 AM
Super User
Super User
Posts: 7,392

Re: SQL for selection of all non matching rows from 2 tables

Hi,

Perhaps this visual + code will help.

Capture.PNG

Respected Advisor
Posts: 3,887

Re: SQL for selection of all non matching rows from 2 tables

Code option for...
Capture.PNG

...seems to do the trick without creating a Cartesian product. I'll try this one then unless someone suggest something even faster - cursor would be acceptable.

I have quite a bit of freedom of how I implement as long as I don't change table structures (which are part of a SAS solution used by OoTB processes).

proc sql;

  select coalesce(have1.key,have2.key) as key, have1.var1, have2.var2

  from have1 full outer join have2

  on have1.key=have2.key

  where have1.key is NULL or have2.key is NULL

  ;

quit;

Contributor
Posts: 44

Re: SQL for selection of all non matching rows from 2 tables

This is Proc SQL code you need to modify it for oracle. I think in Oracle SQL it is minus instead of except.

Proc SQL;

  create table result as

  (select h1.key as key1,

            h1.var1 as var1,

           . as var2

   from have1 h1

   except

  select h1.key as key1,

            h1.var1 as var1,

            . as var2

   from have1 h1,

          have2 h2

   where h1.key = h2.key)

   union

   (select h2.key as key1,

              ' ' as var1,

              h2.var2 as var2

     from have2 h2

     except

     select h2.key as key1,

               ' ' as var1,

               h2.var2 as var2

     from have1 h1,

            have2 h2

     where h1.key = h2.key);

quit;

Respected Advisor
Posts: 3,887

Re: SQL for selection of all non matching rows from 2 tables

I would expect the Outer Join approach to perform better. What's your take on this?

Super User
Posts: 9,671

Re: SQL for selection of all non matching rows from 2 tables

OK. Mr Patrick.M ,



data have1;
  var1='1';
  do key=1,2,3,4;
    output;
  end;
  stop;
run;
 
data have2;
  var2=2;
  do key=1,4,5,6;
    output;
  end;
  stop;
run;

proc sql;
 create table want as
 select *
  from have1 
   where key not in (select key from have2)
outer union  corr
 select *
  from have2 
   where key not in (select key from have1)  ;
quit;




Xia Keshan

Respected Advisor
Posts: 3,887

Re: SQL for selection of all non matching rows from 2 tables

Thanks

That's the coding option I had initially in mind but then felt that this means multiple passes through the data. I will run an "explain" for what you suggest and the "Outer Join" approach and see where Oracle tells me the cost is lower. I can't really test against real data yet as my development environment doesn't have the real data volumes (and also the hardware is very different to what it will be in Prod).

Super User
Posts: 9,671

Re: SQL for selection of all non matching rows from 2 tables

Patrick,

Why not ask Oracle Admin to do that , since you want to process these tables in Oracle .

If you can process them in SAS, I suggest to use Hash Table .

Xia Keshan

Respected Advisor
Posts: 3,887

Re: SQL for selection of all non matching rows from 2 tables

Hi

Thank you for your time and support.

What I will do for now is to take the Full Outer Join approach as this seems to work and is the "documented" coding approach. So that's how I'm going to implement first.

I then will take your advice and involve the Oracle DBA asking to performance tune the SQL in case this job should remain on the critical path and really be long running.

Pulling the data into SAS and do stuff there is absolutely no option as the tables I'm dealing with have more than 100M rows each but the delta will be less than 1% - and I need the result set in Oracle and not SAS for further processing.

Thanks,

Patrick

Super User
Posts: 9,671

Re: SQL for selection of all non matching rows from 2 tables

An alternative code , see which one would be faster .  I guess it is me,since you are asking vertically union not horizontally union .Smiley Happy

proc sql;

create table want as

select *

  from have1 as a

   where not exists(select * from have2 where key=a.key)

outer union  corr

select *

  from have2 as b

   where not exists(select * from have1 where key=b.key) ;

quit;

Xia Keshan

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1017 views
  • 4 likes
  • 4 in conversation