Dear all,
I was trying to execute a proc sql left join in order to display only the rows from the left table that do not match with the right table (conceptually: select * from TableA A Left Join TableB B on A.key=B.Key where B.key Is Null). The problem is my table A is huge, and I am using a key with 4 columns, and as such the proc sql takes a lot of time. If possible, I would like to get the same results using HASH since I understood it can have a very good performance for this kind of things. Any help appreciated.
Example with TableA and TableB (key Col1, Col2, Col3, Col4):
TableA | ||||
Col1 | Col2 | Col3 | Col4 | Col5 |
A | 1 | 6 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
B | 1 | 5 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
C | 1 | 5 | 1 | 2 |
A | 1 | 5 | 1 | 2 |
D | 6 | 1 | 1 | 2 |
TableB | |||
Col1 | Col2 | Col3 | Col4 |
A | 1 | 5 | 1 |
B | 1 | 5 | 1 |
D | 6 | 1 | 1 |
E | 7 | 1 | 1 |
Desired result | ||||
Col1 | Col2 | Col3 | Col4 | Col5 |
A | 1 | 6 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
C | 1 | 5 | 1 | 2 |
Try this
data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;
data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;
data want;
if _N_ = 1 then do;
declare hash h (dataset : "TableB");
h.definekey ("Col1", "Col2", "Col3", "Col4");
h.definedone ();
end;
set TableA;
if h.check() ne 0;
run;
Try this
data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;
data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;
data want;
if _N_ = 1 then do;
declare hash h (dataset : "TableB");
h.definekey ("Col1", "Col2", "Col3", "Col4");
h.definedone ();
end;
set TableA;
if h.check() ne 0;
run;
Apparently it works and seems incredible fast. Thanks a lot.
Anytime. If your sample data is representative, then you can simplify a bit by
data want;
if _N_ = 1 then do;
declare hash h (dataset : "TableB");
h.definekey (all : "Y");
h.definedone ();
end;
set TableA;
if h.check() ne 0;
run;
Also, how huge is 'huge'? And what about the size of TableB?
Table A has around 4 million records and table B around 2 million records. I wanted to exclude B from A if B exists in A.
Did you try my code?
Great! It seems to work. I will just perform some checks.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.