Solved: Left outer join using HASH - only rows from the left table

Matos · Posted 05-20-2020 02:55 AM

Dear all,

I am trying to run a PROC SQL left join that returns only the rows of the left table that have no match in the right table (conceptually: select * from TableA A Left Join TableB B on A.key=B.Key where B.key Is Null). The problem is that TableA is huge and the key consists of 4 columns, so the PROC SQL step takes a long time. If possible, I would like to get the same result using a HASH object, since I understand it can perform very well for this kind of task. Any help appreciated.

Example with TableA and TableB (key Col1, Col2, Col3, Col4):

TableA
Col1	Col2	Col3	Col4	Col5
A	1	6	1	2
A	2	4	1	2
B	1	5	1	2
A	2	4	1	2
C	1	5	1	2
A	1	5	1	2
D	6	1	1	2

TableB
Col1	Col2	Col3	Col4
A	1	5	1
B	1	5	1
D	6	1	1
E	7	1	1

Desired result
Col1	Col2	Col3	Col4	Col5
A	1	6	1	2
A	2	4	1	2
A	2	4	1	2
C	1	5	1	2

PeterClemmensen · Posted 05-20-2020 03:02 AM

Try this

data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;

data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;

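data want;
    /* On the first iteration only, load the keys of TableB into a hash object */
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey ("Col1", "Col2", "Col3", "Col4");
        h.definedone ();
    end;

    set TableA;

    /* CHECK() returns 0 when the current Col1-Col4 values exist in the hash; */
    /* the subsetting IF keeps only the TableA rows with no match in TableB   */
    if h.check() ne 0;
run;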

The DATA to DATA Step Macro
Blog: SASnrd
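To see why the subsetting IF keeps exactly the unmatched rows, here is a small diagnostic sketch. It assumes the TableA and TableB steps above have already been run, and the names diag, rc and found are illustrative only. CHECK() with no arguments looks up the current PDV values of Col1-Col4 in the hash and returns 0 when the key is found and a nonzero code otherwise.

```sas
/* Diagnostic sketch: record the CHECK() return code for every TableA row */
data diag;
    if _N_ = 1 then do;
        /* Load the TableB keys once, on the first iteration only */
        declare hash h (dataset : "TableB");
        h.definekey ("Col1", "Col2", "Col3", "Col4");
        h.definedone ();
    end;

    set TableA;

    /* rc = 0: key found in TableB; rc ne 0: key not found */
    rc = h.check();
    found = (rc = 0);
run;

proc print data=diag;
run;
```

With the sample data, the rows B 1 5 1, A 1 5 1 and D 6 1 1 should show found = 1, and the remaining four rows found = 0; those four are the rows the original step writes to want.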

Matos · Posted 05-20-2020 03:20 AM

Apparently it works, and it seems incredibly fast. Thanks a lot.

PeterClemmensen · Posted 05-20-2020 03:22 AM

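Anytime. If your sample data is representative, that is, if TableB contains only the four key columns, you can simplify it a bit with ALL: "Y", which makes every TableB variable part of the key: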

data want;
    if _N_ = 1 then do;
        declare hash h (dataset : "TableB");
        h.definekey (all : "Y");
        h.definedone ();
    end;

    set TableA;

    if h.check() ne 0;
run;
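Because ALL: "Y" turns every variable of the loaded data set into a key, the shortcut matches on Col1-Col4 only when those are TableB's only columns. If TableB might carry extra columns, one option is to restrict the load with a KEEP= data set option in the DATASET: argument. This is a sketch that assumes the same TableA and TableB as above.

```sas
data want;
    if _N_ = 1 then do;
        /* Load only the key columns, so ALL:"Y" keys on exactly Col1-Col4 */
        declare hash h (dataset : "TableB(keep=Col1 Col2 Col3 Col4)");
        h.definekey (all : "Y");
        h.definedone ();
    end;

    set TableA;

    /* Keep only the TableA rows whose key is not present in TableB */
    if h.check() ne 0;
run;
```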

PeterClemmensen · Posted 05-20-2020 03:06 AM

Also, how huge is 'huge'? And what about the size of TableB?

Matos · Posted 05-20-2020 03:15 AM

TableA has around 4 million records and TableB around 2 million. I want to exclude from TableA every row whose key exists in TableB.
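With around 2 million TableB keys held in memory, it may be worth estimating the hash object's footprint before running on the full data. A rough sketch, assuming TableB as defined above: the NUM_ITEMS and ITEM_SIZE attributes give the number of entries and the bytes per entry. ITEM_SIZE does not include the hash object's own overhead, so the product is only an approximation.

```sas
data _null_;
    /* Define Col1-Col4 in the PDV at compile time; this SET never executes */
    if 0 then set TableB;

    declare hash h (dataset : "TableB");
    h.definekey ("Col1", "Col2", "Col3", "Col4");
    h.definedone ();

    n_items  = h.num_items;   /* number of keys loaded              */
    item_len = h.item_size;   /* bytes per item, excluding overhead */
    approx   = n_items * item_len;
    put "NOTE: items=" n_items "bytes/item=" item_len "approx bytes=" approx comma20.;

    /* No SET executes at run time, so end the step explicitly */
    stop;
run;
```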

PeterClemmensen · Posted 05-20-2020 03:19 AM

Did you try my code?

Matos · Posted 05-20-2020 03:23 AM

Great! It seems to work. I will just perform some checks.
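One way to perform those checks, sketched under the assumption that want was created by the hash step above: rebuild the result with the PROC SQL anti-join from the original question, sort both outputs the same way, and compare them with PROC COMPARE. The data set names want_sql, want_s and want_sql_s are illustrative only.

```sas
/* Reference result: the PROC SQL anti-join from the original question */
proc sql;
    create table want_sql as
    select A.*
    from TableA as A
         left join TableB as B
         on  A.Col1 = B.Col1 and A.Col2 = B.Col2
         and A.Col3 = B.Col3 and A.Col4 = B.Col4
    where B.Col1 is null;
quit;

/* PROC COMPARE matches rows by position, so sort both outputs identically */
proc sort data=want out=want_s;
    by Col1 Col2 Col3 Col4 Col5;
run;

proc sort data=want_sql out=want_sql_s;
    by Col1 Col2 Col3 Col4 Col5;
run;

proc compare base=want_s compare=want_sql_s;
run;

/* SYSINFO is 0 when PROC COMPARE finds no differences */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

Both approaches keep duplicate TableA rows (such as A 2 4 1 2), so the comparison checks the full result, not just the distinct keys.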
