BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Patrick
Opal | Level 21

Hi all,

I'm currently having a brain blockage struggling with defining a SQL which returns all rows from both tables with no matching key to the other table.

Capture.PNG

Sample data

data have1;

  var1='1';

  do key=1,2,3,4;

    output;

  end;

  stop;

run;

data have2;

  var2=2;

  do key=1,4,5,6;

    output;

  end;

  stop;

run;

Desired result

Key1Key2Var1Var2
21
31
52
62

It would be easy enough to use a SAS Merge but I'm dealing with two big Oracle tables (>100M rows each) so this needs to run in-database - and the job is on the critical path so I need something which performs as good as possible.

I can create additional indexes on the source tables if required - so assume such indexes exist.

Really would appreciate some guidance with this.

Environment is:

SAS 9.4 under 64bit Linux

Oracle 11c

Thanks,

Patrick

1 ACCEPTED SOLUTION

Accepted Solutions
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Perhaps this visual + code will help.

Capture.PNG

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

Perhaps this visual + code will help.

Capture.PNG

Patrick
Opal | Level 21

Code option for...
Capture.PNG

...seems to do the trick without creating a Cartesian product. I'll try this one then unless someone suggest something even faster - cursor would be acceptable.

I have quite a bit of freedom of how I implement as long as I don't change table structures (which are part of a SAS solution used by OoTB processes).

proc sql;

  select coalesce(have1.key,have2.key) as key, have1.var1, have2.var2

  from have1 full outer join have2

  on have1.key=have2.key

  where have1.key is NULL or have2.key is NULL

  ;

quit;

AskoLötjönen
Quartz | Level 8

This is Proc SQL code you need to modify it for oracle. I think in Oracle SQL it is minus instead of except.

Proc SQL;

  create table result as

  (select h1.key as key1,

            h1.var1 as var1,

           . as var2

   from have1 h1

   except

  select h1.key as key1,

            h1.var1 as var1,

            . as var2

   from have1 h1,

          have2 h2

   where h1.key = h2.key)

   union

   (select h2.key as key1,

              ' ' as var1,

              h2.var2 as var2

     from have2 h2

     except

     select h2.key as key1,

               ' ' as var1,

               h2.var2 as var2

     from have1 h1,

            have2 h2

     where h1.key = h2.key);

quit;

Patrick
Opal | Level 21

I would expect the Outer Join approach to perform better. What's your take on this?

Ksharp
Super User

OK. Mr Patrick.M ,



data have1;
  var1='1';
  do key=1,2,3,4;
    output;
  end;
  stop;
run;
 
data have2;
  var2=2;
  do key=1,4,5,6;
    output;
  end;
  stop;
run;

proc sql;
 create table want as
 select *
  from have1 
   where key not in (select key from have2)
outer union  corr
 select *
  from have2 
   where key not in (select key from have1)  ;
quit;




Xia Keshan

Patrick
Opal | Level 21

Thanks

That's the coding option I had initially in mind but then felt that this means multiple passes through the data. I will run an "explain" for what you suggest and the "Outer Join" approach and see where Oracle tells me the cost is lower. I can't really test against real data yet as my development environment doesn't have the real data volumes (and also the hardware is very different to what it will be in Prod).

Ksharp
Super User

Patrick,

Why not ask Oracle Admin to do that , since you want to process these tables in Oracle .

If you can process them in SAS, I suggest to use Hash Table .

Xia Keshan

Patrick
Opal | Level 21

Hi

Thank you for your time and support.

What I will do for now is to take the Full Outer Join approach as this seems to work and is the "documented" coding approach. So that's how I'm going to implement first.

I then will take your advice and involve the Oracle DBA asking to performance tune the SQL in case this job should remain on the critical path and really be long running.

Pulling the data into SAS and do stuff there is absolutely no option as the tables I'm dealing with have more than 100M rows each but the delta will be less than 1% - and I need the result set in Oracle and not SAS for further processing.

Thanks,

Patrick

Ksharp
Super User

An alternative code , see which one would be faster .  I guess it is me,since you are asking vertically union not horizontally union .Smiley Happy

proc sql;

create table want as

select *

  from have1 as a

   where not exists(select * from have2 where key=a.key)

outer union  corr

select *

  from have2 as b

   where not exists(select * from have1 where key=b.key) ;

quit;

Xia Keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 9539 views
  • 4 likes
  • 4 in conversation