Dear all,
I was trying to execute a proc sql left join in order to display only the rows from the left table that do not match with the right table (conceptually: select * from TableA A Left Join TableB B on A.key=B.Key where B.key Is Null). The problem is my table A is huge, and I am using a key with 4 columns, and as such the proc sql takes a lot of time. If possible, I would like to get the same results using HASH since I understood it can have a very good performance for this kind of things. Any help appreciated.
Example with TableA and TableB (key Col1, Col2, Col3, Col4):
TableA | ||||
Col1 | Col2 | Col3 | Col4 | Col5 |
A | 1 | 6 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
B | 1 | 5 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
C | 1 | 5 | 1 | 2 |
A | 1 | 5 | 1 | 2 |
D | 6 | 1 | 1 | 2 |
TableB | |||
Col1 | Col2 | Col3 | Col4 |
A | 1 | 5 | 1 |
B | 1 | 5 | 1 |
D | 6 | 1 | 1 |
E | 7 | 1 | 1 |
Desired result | ||||
Col1 | Col2 | Col3 | Col4 | Col5 |
A | 1 | 6 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
A | 2 | 4 | 1 | 2 |
C | 1 | 5 | 1 | 2 |
Try this
data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;
data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;
data want;
if _N_ = 1 then do;
declare hash h (dataset : "TableB");
h.definekey ("Col1", "Col2", "Col3", "Col4");
h.definedone ();
end;
set TableA;
if h.check() ne 0;
run;
Try this
data TableA;
input Col1 $ Col2 Col3 Col4 Col5;
datalines;
A 1 6 1 2
A 2 4 1 2
B 1 5 1 2
A 2 4 1 2
C 1 5 1 2
A 1 5 1 2
D 6 1 1 2
;
data TableB;
input Col1 $ Col2 Col3 Col4;
datalines;
A 1 5 1
B 1 5 1
D 6 1 1
E 7 1 1
;
data want;
if _N_ = 1 then do;
declare hash h (dataset : "TableB");
h.definekey ("Col1", "Col2", "Col3", "Col4");
h.definedone ();
end;
set TableA;
if h.check() ne 0;
run;
Apparently it works and seems incredible fast. Thanks a lot.
Anytime. If your sample data is representative, then you can simplify a bit by
data want;
if _N_ = 1 then do;
declare hash h (dataset : "TableB");
h.definekey (all : "Y");
h.definedone ();
end;
set TableA;
if h.check() ne 0;
run;
Also, how huge is 'huge'? And what about the size of TableB?
Table A has around 4 million records and table B around 2 million records. I wanted to exclude B from A if B exists in A.
Did you try my code?
Great! It seems to work. I will just perform some checks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.