Hi SAS communities,
I've heard the power of "in-memory" table look-up presented by hash tables.
Having no backgrounds in programming, I still didn't understand the advantage of hash tables after reading a bunch of sas papers online about hash table.
Now I come across a problem in SAS coding and just wondering whether this problem can be solved by hash tables.
I'm soliciting general ideas, so no detailed sas code and data sets are available for me to provide. I have no idea how to write hash table code on this problem. I'm just curious whether this problem can be solved by hash tables.
Here is the problem:
I have two datasets A and B. Both datasets have the same variables, for example, acct and revenue. The only difference is that dataset A has 1milion observations and dataset B has 1milllion plus 20 observations that are not in dataset A.
I want to find who these 20 additional observations are in dataset B that can not be found in dataset A.
I write a proc sql code like the following to approach this problem:
proc sql;
select *
from datasetB
where acct_key not in ( select distinct acct_key from datasetA);
quit;
The problem is that, with 1million and 1million+20 observations in each dataset, this piece of code takes insane amount of time for SAS to finish the job. My sas even crushes once and runs forever. Wondering how Hash table can help in this case?? How can the magical "in-memory table look up" function possessed by hash table help me to cut down the running time and do the job more efficiently?
Thanks in advance to all dear sas community members. I'm indebted to your thoughts/inputs/ideas/suggestions.
Sorry, the key variable should be in quotation marks like this. How about that?
data want;
if 0 then set A;
if _N_ = 1 then do;
declare hash h(dataset:'A');
h.defineKey('acct_key');
h.defineDone();
end;
set B;
if h.check() ne 0;
run;
Sounds like a nice job for a hash object. As you mention, we can't see your data, so needless to say, this code is untested but see if this gives you what you want
data want;
if 0 then set A;
if _N_ = 1 then do;
declare hash h(dataset:'A');
h.defineKey(acct_key);
h.defineDone();
end;
set B;
if h.check() ne 0;
run;
Before explaining this 'magical' in-memory technique, let me know if this works for you 🙂 An absolute requirement for it to work is that the data in the hash object fits in memory.
@changxuosu, did the above solve your problem? 🙂
Sorry, the key variable should be in quotation marks like this. How about that?
data want;
if 0 then set A;
if _N_ = 1 then do;
declare hash h(dataset:'A');
h.defineKey('acct_key');
h.defineDone();
end;
set B;
if h.check() ne 0;
run;
No problem. I'm glad you found your answer. Do you have any questions about the code?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.