SAS Programming

DATA Step, Macro, Functions and more
Mij
Fluorite | Level 6

Hi,

I'm trying to understand the ITEM_SIZE attribute of the hash object. What is it the size of? Is it the number of bytes in the key, in the data variables, or in both? The documentation is not really explicit, so I ran the example code from it:

https://documentation.sas.com/?docsetId=lecompobjref&docsetTarget=p195co8u1s7a91n1xv1get0544t3.htm&d...

data work.stock;
   input prod $1-10 qty 12-14;
   datalines;
broccoli 345
corn 389
potato 993
onion 730
;
data _null_;
   if _N_ = 1 then do;
      length prod $10;
   /* Declare hash object and read STOCK data set as ordered */
      declare hash myhash(dataset: "work.stock");
      /* Define key and data variables */
      myhash.defineKey('prod');
      myhash.defineData('qty');
      myhash.defineDone();
   end;
   /* Add a key and data value to the hash object */
   prod = 'celery';
   qty = 183;
   rc = myhash.add();
   
   /* Use ITEM_SIZE to return the size of the item in hash object */
   itemsize = myhash.item_size;
   put itemsize=;
run;

According to the documentation, itemsize=40 is written to the SAS log.

But when I run it in SAS University Edition, I get 64.

Why isn't it 40, as in the example?

Also, the INPUT statement in the first DATA step looks very wrong: it sets the second variable (qty) to missing.

Does anyone test the code in the SAS documentation before it is published on the SAS site?

Hoping to hear from someone

Have a nice day

michel.jubinville@hotmail.ca

ballardw
Super User

My first guess would be variations between the machines that run the code and the bytes used by default objects.

Consider this part of the documentation you link to:

The ITEM_SIZE attribute does not reflect the initial overhead that the hash object requires, nor does it take into account any necessary internal alignments.

So if your system requires a bit more "overhead" or "internal alignments" that would be one cause.

FWIMBW, I also get 64. The documentation doesn't say what OS or such may have been involved.

In the case of "close enough for government work", I'm not sure that 40 vs 64 is greatly significant.

Now if you saw 640 that would raise an eyebrow.

 

 

Mij
Fluorite | Level 6
Thanks. I now know that it is the size of the whole hash entry, key and data combined. It seems to be something I'll have to test wherever I work. I wonder whether it really matters; knowing how much memory is available to hold a hash object is what matters most.
PeterClemmensen
Tourmaline | Level 20

I agree that the data step looks like an error. 

 

The Item_Size attribute returns the length of a hash object entry expressed in bytes. I don't know if it is OS/Version specific. Maybe @DonH or @hashman can clarify (authors of the hash object bible Data Management Solutions Using SAS Hash Table Operations). I get 64 bytes as well. 

 

I do think the example is a bit clumsy and does not add much to the understanding of the attribute. For example, the Add() method has no effect on Item_Size. A more fitting example could be the data step below. Play around with the lengths of the host variables k and d and see how the item_size attribute changes.

 

data _null_;
   declare hash h();
   h.defineKey('k');
   h.definedata('d');
   h.defineDone();

   length k 8 d $ 100;
   
   i = h.item_size;
   put i=;
run;
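As a minimal sketch of that experiment, the step below builds two hash objects that differ only in the length of the data variable and prints both sizes side by side. The names h1, h2, d_short and d_long are made up for illustration, and the values written to the log depend on the platform, so none are shown here.

```sas
data _null_;
   /* Host variables: one numeric key, two character data variables */
   length k 8 d_short $ 1 d_long $ 100;
   call missing(k, d_short, d_long);   /* avoid "uninitialized" notes */

   /* Hash with a 1-byte data variable */
   declare hash h1();
   h1.defineKey('k');
   h1.defineData('d_short');
   h1.defineDone();

   /* Same key, but a 100-byte data variable */
   declare hash h2();
   h2.defineKey('k');
   h2.defineData('d_long');
   h2.defineDone();

   short_item = h1.item_size;
   long_item  = h2.item_size;
   put short_item= long_item=;
run;
```

Neither table needs to hold any items, because the attribute is available as soon as defineDone() has run.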

And to answer your last question, whether someone tests the code before it is published on the SAS site: the answer is yes 🙂

 

hashman
Ammonite | Level 13

@PeterClemmensen:

The fact that item_size returns its result in bytes is not system-specific, but the result itself is. For example, item_size is generally smaller under 32-bit Windows than under 64-bit. It also depends on the key or data type (numeric or character).

As to why item_size may be larger than the total number of bytes in the key and data portions, my understanding is that it accounts for the extra memory needed for internal bookkeeping.

@chrisz has written a fantastic macro routine covering all item_size eventualities for practically any combination of length and scalar data type (i.e. character or numeric), and has formulated item_size rules depending on these factors. I haven't seen any exceptions to these rules. The routine does not cover non-scalar types, such as the pointers to hash objects and/or iterators that can be stored in the data portion of a "hash-of-hashes" hash table. My limited experimentation has shown that the effect of having such a piece of data in the data portion is equivalent to adding a (scalar) numeric hash variable.

 

Kind regards

Paul D.

DonH
Lapis Lazuli | Level 10

And there is also a minimum length to consider. I modified the program to use _NEW_ to create the hash instances so I could redefine the same hash object with more variables. Note that the minimum length on my Windows 64 bit machine is 48 - and it does not increase by just adding a new variable to the data portion.

data _null_;
   k = . ;
   array d(*) $1 d1-d9;
   declare hash h();
   h = _new_ hash();
   h.defineKey('k');
   h.definedata('d1');
   h.defineDone();
   OneDataVar = h.item_size;
   put OneDataVar=;
   h.delete();

   h = _new_ hash();
   h.defineKey('k');
   h.definedata('d1','d2','d3','d4','d5','d6','d7','d8');
   h.defineDone();
   EightDataVars = h.item_size;
   put EightDataVars=;
   h.delete();

   h = _new_ hash();
   h.defineKey('k');
   h.definedata('d1','d2','d3','d4','d5','d6','d7','d8','d9');
   h.defineDone();
   NineDataVars = h.item_size;
   put NineDataVars=;
   h.delete();

run;

OneDataVar=48
EightDataVars=48
NineDataVars=64
NOTE: DATA statement used (Total process time):
      real time           0.07 seconds
      cpu time            0.04 seconds
Mij
Fluorite | Level 6

Thanks. So one variable gives 48, eight variables don't change it, and nine give 64. One would think that each block of eight variables takes 48 bytes, provided all variables keep the same length as in your example. My concern with the item_size attribute was to find out whether a given file could fit in a fixed amount of memory. It seems the size may vary between systems, and I'd have to assume a certain overhead. It looks like it's impossible to get the right size easily.

DonH
Lapis Lazuli | Level 10

It is not hard to get the right size. Just define a hash table using a data set with the right variables and length and get the item size attribute.
It is not just the number of variables. It also depends on their type/length.

And it is repeatable. 

IIRC (and @hashman can correct me if I am wrong, as this is something he researched for our hash object SAS Press book), if your concern is calculating how much memory is needed across different operating systems, the OS dependency is primarily an issue for narrow hash object tables.
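A rough sketch of that approach, assuming the work.stock data set from the original question exists: load it into a hash object, read item_size and num_items, and multiply them. The variable names item_bytes, n_items and approx_bytes are made up for illustration. Because the documentation says ITEM_SIZE excludes the initial overhead and any internal alignment, the product is best read as a lower-bound estimate rather than the exact memory footprint.

```sas
data _null_;
   /* Define host variables prod and qty at compile time without reading any rows */
   if 0 then set work.stock;

   declare hash h(dataset: "work.stock");
   h.defineKey('prod');
   h.defineData('qty');
   h.defineDone();

   item_bytes   = h.item_size;          /* bytes per item on this system   */
   n_items      = h.num_items;          /* number of items actually loaded */
   approx_bytes = item_bytes * n_items; /* rough estimate, overhead excluded */

   put item_bytes= n_items= approx_bytes=;
   stop;
run;
```

Running the same step on each target system shows how much the estimate differs between them.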

Mij
Fluorite | Level 6
I guess you're Don Henderson. I'm about to buy the book you wrote with Dorfman. I guess you had some lines about item_size. Looks like a complete book.
Mij
Fluorite | Level 6

Thanks. My understanding from your reply is that if the reported size is larger than the bytes actually used by the key and the data, I won't be able to compare that size with the space available in memory. I've kept all the answers to my post and I'll find time to experiment a bit. I'll try to find the rules you're talking about.

ChrisNZ
Tourmaline | Level 20

@hashman Wrong Chris! 😉

@Mij Here is the calculator, together with explanations.

Mij
Fluorite | Level 6

I took note of the book, Chris. I plan to buy it after I've finished the 2 other ones. Thanks

hashman
Ammonite | Level 13

@ChrisNZ:

Oops, the right Chris ... sorry!

To reiterate, a fantastic hash size calc.

 

Kind regards

Paul D. 

ChrisNZ
Tourmaline | Level 20

> To reiterate, a fantastic hash size calc.

You started it @hashman 🙂

 

Thank you for your trust @Mij 

 

Mij
Fluorite | Level 6

Thanks. I'll give it a try. And thanks for the mention of the book. I will search to buy it from Amazon.
