jeffgreen
Fluorite | Level 6
 
Kurt_Bremser
Super User

See this:

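data want;
set primary;
length y 8; /* define y before the hash objects refer to it */
if _n_ = 1
then do;
  /* load both lookup tables into memory once, keyed by number */
  declare hash s (dataset:"secondary");
  s.definekey("number");
  s.definedata("y");
  s.definedone();
  declare hash t (dataset:"tertiary");
  t.definekey("number");
  t.definedata("y");
  t.definedone();
end;
/* find() returns 0 on a match and copies y from the hash; */
/* try secondary first, then tertiary, else set y to missing */
if s.find() ne 0 then if t.find() ne 0 then y = .;
run;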
jeffgreen
Fluorite | Level 6

Thank you so much! Worked perfectly. 

Just a couple of questions: what does the 8 in "length y 8" represent?

Also, if I were to add a KEEP statement to the start of your code, something like:

data want;
set primary;
keep number value;

How would I do the rest of the code? Thanks so much for your help!

Kurt_Bremser
Super User

@jeffgreen wrote:

Thank you so much! Worked perfectly. 

Just a couple of questions: what does the 8 in "length y 8" represent?

 

The 8 is the storage length of y in bytes. I like to explicitly set the attributes of variables that will be filled from the hash. In this particular case, defining y as numeric with the default length of 8 bytes is not strictly necessary, as SAS would do that by default.

But if you had y as character in secondary and tertiary, the step would fail without an explicit definition of y as such.
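To illustrate the character case, here is a minimal sketch. It assumes, purely for illustration, that y is stored as a character variable of length 20 in secondary and tertiary; the length must be adjusted to match the real datasets. CALL MISSING is used in place of "y = ." because it works for both character and numeric variables.

```sas
data want;
  set primary;
  length y $ 20;  /* assumed character length, for illustration only */
  if _n_ = 1 then do;
    declare hash s (dataset:"secondary");
    s.definekey("number");
    s.definedata("y");
    s.definedone();
    declare hash t (dataset:"tertiary");
    t.definekey("number");
    t.definedata("y");
    t.definedone();
  end;
  /* try secondary first, then tertiary; blank y if neither matches */
  if s.find() ne 0 then if t.find() ne 0 then call missing(y);
run;
```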

 

Also, if I were to add a KEEP statement to the start of your code, something like:

data want;
set primary;
keep number value;

How would I do the rest of the code? Thanks so much for your help!



A KEEP statement in the data step controls which variables are written to the output dataset, so with "keep number value;" y would not appear there. If you want to restrict the variables coming in from primary, use a KEEP= dataset option in the SET statement instead.
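A sketch of the KEEP= dataset option applied to the original step might look like the following. It assumes that primary contains the variables number and value, as in the question; number has to be kept because it is the hash key.

```sas
data want;
  /* only number and value are read from primary */
  set primary (keep=number value);
  length y 8;
  if _n_ = 1 then do;
    declare hash s (dataset:"secondary");
    s.definekey("number");
    s.definedata("y");
    s.definedone();
    declare hash t (dataset:"tertiary");
    t.definekey("number");
    t.definedata("y");
    t.definedone();
  end;
  if s.find() ne 0 then if t.find() ne 0 then y = .;
run;
```

Because the restriction is applied on input rather than output, y still ends up in want alongside number and value.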

andreas_lds
Jade | Level 19

How many obs are in the second and third datasets? If they are large, out-of-memory conditions could occur. That could be avoided by sorting and merging the datasets.
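A rough sketch of the sort-and-merge alternative is shown below. It assumes that number is unique within secondary and within tertiary, and that primary has no y variable of its own; the names primary_s, secondary_s, tertiary_s, y_s and y_t are made up for this example.

```sas
/* sort copies of all three datasets by the lookup key */
proc sort data=primary out=primary_s;
  by number;
run;
proc sort data=secondary out=secondary_s;
  by number;
run;
proc sort data=tertiary out=tertiary_s;
  by number;
run;

data want;
  merge primary_s   (in=p)
        secondary_s (in=s keep=number y rename=(y=y_s))
        tertiary_s  (in=t keep=number y rename=(y=y_t));
  by number;
  if p;                    /* keep only keys present in primary */
  if s then y = y_s;       /* prefer the value from secondary */
  else if t then y = y_t;  /* fall back to tertiary */
  drop y_s y_t;            /* y stays missing when neither matches */
run;
```

Renaming y on input keeps the two lookup values apart, so the secondary-before-tertiary preference of the hash version is preserved.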

jeffgreen
Fluorite | Level 6

Pretty small! They are just the cards statement I used in the post. Will keep that in mind for future reference though, thank you!

Kurt_Bremser
Super User

Old coders like us, who are used to the scarcity of RAM, tend to over-estimate the memory consumption of a hash.

Even in the fairly restricted environment I have set up on our company's data warehouse (MEMSIZE=512M for workspace servers), we can routinely use hash objects with millions of entries.

For example, a UUID key and a number need 24 bytes of raw data per item (16 for the key, 8 for the number), so a million items come to about 24 MB plus the space needed for the hash table itself, well below the available memory.
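To check such an estimate on real data, one option is to let the hash object report its own numbers. The sketch below uses the NUM_ITEMS and ITEM_SIZE attributes of the hash object on the secondary dataset from this thread; the variable names items, bytes_per_item and approx_bytes are made up for this example, and the result should be treated as an approximation of the total memory used, not an exact figure.

```sas
data _null_;
  if 0 then set secondary;  /* puts number and y into the PDV without reading data */
  declare hash s (dataset:"secondary");
  s.definekey("number");
  s.definedata("y");
  s.definedone();
  items          = s.num_items;   /* number of entries loaded */
  bytes_per_item = s.item_size;   /* size of one entry in bytes */
  approx_bytes   = items * bytes_per_item;
  put items= bytes_per_item= approx_bytes=;
  stop;                     /* one pass is enough */
run;
```

The values are written to the log, where they can be compared with the MEMSIZE setting of the session.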
