See this:
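data want;
  set primary;
  length y 8;  /* define y in the PDV so the hash objects can fill it */
  if _n_ = 1 then do;
    /* load both lookup tables once, on the first iteration */
    declare hash s (dataset:"secondary");
    s.definekey("number");
    s.definedata("y");
    s.definedone();
    declare hash t (dataset:"tertiary");
    t.definekey("number");
    t.definedata("y");
    t.definedone();
  end;
  /* try secondary first, then tertiary; y is missing if neither has the key */
  if s.find() ne 0 then if t.find() ne 0 then y = .;
run;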
Thank you so much! Worked perfectly.
Just a couple of questions: what does the 8 in "length y 8" represent?
Also, if I were to add a keep statement to the start of your code, something like;
data want;
set primary;
keep number value;
How would I do the rest of the code? Thanks so much for your help!
@jeffgreen wrote:
Thank you so much! Worked perfectly.
Just a couple of questions: what does the 8 in "length y 8" represent?
I like to explicitly set the attributes of variables that will come in from the hash. In this case, defining y as numeric with the default length of 8 bytes is not strictly necessary: the assignment y = . at the end of the step already makes SAS create y as a numeric variable of length 8.
But if y were character in secondary and tertiary, the step would fail without an explicit definition of y as character.
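As a rough sketch of the character case: the step below assumes, purely for illustration, that y is a character variable of length 20 in both lookup tables. The length 20 is hypothetical and should match the real data.

```sas
data want;
  set primary;
  length y $ 20;  /* assumption: y is character with length 20 in secondary and tertiary */
  if _n_ = 1 then do;
    declare hash s (dataset:"secondary");
    s.definekey("number");
    s.definedata("y");
    s.definedone();
    declare hash t (dataset:"tertiary");
    t.definekey("number");
    t.definedata("y");
    t.definedone();
  end;
  /* call missing works for both numeric and character variables */
  if s.find() ne 0 then if t.find() ne 0 then call missing(y);
run;
```

Using call missing instead of y = . avoids an assignment that would otherwise force y to be numeric.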
Also, if I were to add a keep statement to the start of your code, something like;
data want;
set primary;
keep number value;
How would I do the rest of the code? Thanks so much for your help!
A KEEP statement in the data step controls which variables are written to the output dataset, so with "keep number value" y would not appear in want. If you want to restrict the variables coming in from primary, use a KEEP= dataset option in the SET statement instead.
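A minimal sketch of the KEEP= dataset option, assuming primary contains the variables number and value as in the question. Only those two are read from primary, and y is still written to want because no KEEP statement filters the output.

```sas
data want;
  /* KEEP= here limits what is read from primary, not what is written to want */
  set primary (keep=number value);
  length y 8;
  if _n_ = 1 then do;
    declare hash s (dataset:"secondary");
    s.definekey("number");
    s.definedata("y");
    s.definedone();
    declare hash t (dataset:"tertiary");
    t.definekey("number");
    t.definedata("y");
    t.definedone();
  end;
  if s.find() ne 0 then if t.find() ne 0 then y = .;
run;
```

If you prefer a KEEP statement, it would have to list y as well (keep number value y;) for the looked-up value to reach the output.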
How many observations are in your second and third datasets? If they are large, the hash objects could run out of memory. That can be avoided by sorting and merging the datasets instead.
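For reference, the sort-and-merge alternative could look roughly like this. It is only a sketch, and it assumes that number is unique within secondary and within tertiary; the names y_s and y_t are made up for the example. Note that the PROC SORT steps reorder the datasets in place, so want comes out sorted by number rather than in the original order of primary.

```sas
proc sort data=primary;   by number; run;
proc sort data=secondary; by number; run;
proc sort data=tertiary;  by number; run;

data want;
  merge primary   (in=p)
        secondary (in=in_s keep=number y rename=(y=y_s))
        tertiary  (in=in_t keep=number y rename=(y=y_t));
  by number;
  if p;                      /* keep only keys that exist in primary */
  if in_s then y = y_s;      /* secondary takes precedence, as in the hash step */
  else if in_t then y = y_t; /* otherwise fall back to tertiary */
  drop y_s y_t;              /* y stays missing when neither table has the key */
run;
```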
Pretty small! They are just the cards statement I used in the post. Will keep that in mind for future reference though, thank you!
Old coders like us, who are used to the scarcity of RAM, tend to over-estimate the memory consumption of a hash.
Even in the fairly restricted environment I have set up on our company's data warehouse (MEMSIZE=512M for workspace servers), we can routinely use hash objects with millions of entries.
For example, a UUID key (16 bytes) plus a number (8 bytes) needs 24 bytes of raw data per item, so a million items come to about 24 MB plus the space needed for the hash table itself, well below the available memory.
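If you want a rough figure for your own tables, the hash object's NUM_ITEMS and ITEM_SIZE attributes can be queried after loading. The sketch below uses the secondary dataset from this thread. The product is only an estimate, since ITEM_SIZE does not account for all of the hash table's internal overhead.

```sas
data _null_;
  /* never executed; only puts number and y into the PDV for the hash */
  if 0 then set secondary (keep=number y);
  declare hash s (dataset:"secondary");
  s.definekey("number");
  s.definedata("y");
  s.definedone();
  items     = s.num_items;   /* number of items loaded */
  bytes     = s.item_size;   /* size of one item in bytes */
  est_total = items * bytes; /* rough lower bound on memory used */
  put items= bytes= est_total=;
  stop;
run;
```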