BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Matos
Obsidian | Level 7

Dear all,

 

I was trying to execute a proc sql left join in order to display only the rows from the left table that do not match with the right table (conceptually:  select * from TableA A Left Join TableB B on A.key=B.Key where B.key Is Null). The problem is my table A is huge, and I am using a key with 4 columns, and as such the proc sql takes a lot of time. If possible, I would like to get the same results using HASH since I understood it can have a very good performance for this kind of things. Any help appreciated.

 

Example with TableA and TableB (key Col1, Col2, Col3, Col4):

 
 

TableA

Col1Col2Col3Col4Col5
A1612
A2412
B1512
A2412
C1512
A1512
D6112

 

TableB
Col1Col2Col3Col4
A151
B151
D611
E711

 

Desired result    
Col1Col2Col3Col4Col5
A1612
A2412
A2412
C1512

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;

data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey ("Col1", "Col2", "Col3", "Col4");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

Try this

 

data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;

data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey ("Col1", "Col2", "Col3", "Col4");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;
Matos
Obsidian | Level 7

Apparently it works and seems incredible fast. Thanks a lot.

PeterClemmensen
Tourmaline | Level 20

Anytime. If your sample data is representative, then you can simplify a bit by

 

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey (all : "Y");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;
PeterClemmensen
Tourmaline | Level 20

Also, how huge is 'huge'? And what about the size of TableB?

Matos
Obsidian | Level 7

Table A has around 4 million records and table B around 2 million records. I wanted to exclude B from A if B exists in A.

Matos
Obsidian | Level 7

Great! It seems to work. I will just perform some checks.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 830 views
  • 2 likes
  • 2 in conversation