BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Matos
Obsidian | Level 7

Dear all,

 

I was trying to execute a proc sql left join in order to display only the rows from the left table that do not match with the right table (conceptually:  select * from TableA A Left Join TableB B on A.key=B.Key where B.key Is Null). The problem is my table A is huge, and I am using a key with 4 columns, and as such the proc sql takes a lot of time. If possible, I would like to get the same results using HASH since I understood it can have a very good performance for this kind of things. Any help appreciated.

 

Example with TableA and TableB (key Col1, Col2, Col3, Col4):

 
 

TableA

Col1Col2Col3Col4Col5
A1612
A2412
B1512
A2412
C1512
A1512
D6112

 

TableB
Col1Col2Col3Col4
A151
B151
D611
E711

 

Desired result    
Col1Col2Col3Col4Col5
A1612
A2412
A2412
C1512

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Try this

 

data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;

data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey ("Col1", "Col2", "Col3", "Col4");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

Try this

 

data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;

data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey ("Col1", "Col2", "Col3", "Col4");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;
Matos
Obsidian | Level 7

Apparently it works and seems incredible fast. Thanks a lot.

PeterClemmensen
Tourmaline | Level 20

Anytime. If your sample data is representative, then you can simplify a bit by

 

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey (all : "Y");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;
PeterClemmensen
Tourmaline | Level 20

Also, how huge is 'huge'? And what about the size of TableB?

Matos
Obsidian | Level 7

Table A has around 4 million records and table B around 2 million records. I wanted to exclude B from A if B exists in A.

Matos
Obsidian | Level 7

Great! It seems to work. I will just perform some checks.