Hi,
I'm trying to understand ITEM_SIZE in the hash object. The size of what? Is it the number of bytes of the key, of the data variables, or both? The documentation is not really explicit, so I ran this code:
data work.stock;
  input prod $1-10 qty 12-14;
  datalines;
broccoli 345
corn 389
potato 993
onion 730
;

data _null_;
  if _N_ = 1 then do;
    length prod $10;
    /* Declare the hash object and load the STOCK data set */
    declare hash myhash(dataset: "work.stock");
    /* Define key and data variables */
    myhash.defineKey('prod');
    myhash.defineData('qty');
    myhash.defineDone();
  end;
  /* Add a key and data value to the hash object */
  prod = 'celery';
  qty = 183;
  rc = myhash.add();
  /* Use ITEM_SIZE to return the size of an item in the hash object */
  itemsize = myhash.item_size;
  put itemsize=;
run;
The example says that
itemsize=40
is written to the SAS log. But when I run it in SAS University Edition, I get 64 instead.
How come it's not 40 like in the example?
Also, the INPUT statement in the first data step looks very wrong: the column range for QTY (12-14) doesn't line up with the datalines, so the second variable ends up missing.
Does anyone test the code supplied in the SAS documentation before it is published on the SAS site?
Hoping to hear from someone
Have a nice day
michel.jubinville@hotmail.ca
Accepted Solutions
My first guess would be variation between the machines running the code and the number of bytes such objects use by default.
Consider this part of the documentation you link to:
The ITEM_SIZE attribute does not reflect the initial overhead that the hash object requires, nor does it take into account any necessary internal alignments.
So if your system requires a bit more "overhead" or "internal alignments", that would be one cause.
For what it may be worth, I also get 64. The documentation doesn't say what OS or the like was involved.
On the principle of "close enough for government work", I'm not sure the difference between 40 and 64 is greatly significant.
Now, if you saw 640, that would raise an eyebrow.
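As a quick illustration, here is a minimal sketch (the variable names are just placeholders, and the number printed depends on your machine, so treat it only as a probe of your own system):

data _null_;
  length k 8 d 8;       /* 16 raw bytes of key + data */
  call missing(k, d);   /* avoid uninitialized-variable notes */
  declare hash h();
  h.defineKey('k');
  h.defineData('d');
  h.defineDone();
  itemsize = h.item_size;
  put itemsize=;        /* typically well above 16: per-item bookkeeping
                           and alignment vary from machine to machine */
run;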
I agree that the data step looks like an error.
The Item_Size attribute returns the length of a hash object entry, expressed in bytes. I don't know if it is OS- or version-specific. Maybe @DonH or @hashman can clarify (authors of the hash object bible, Data Management Solutions Using SAS Hash Table Operations). I get 64 bytes as well.
I do think that the example is a bit clumsy and does not add much to the understanding of the attribute. For example, the Add() method does not affect Item_Size at all. A more fitting example could be the data step below. Play around with the lengths of the host variables k and d and see how the item_size attribute changes.
data _null_;
  declare hash h();
  h.defineKey('k');
  h.defineData('d');
  h.defineDone();
  length k 8 d $ 100;
  i = h.item_size;
  put i=;
run;
And to answer your last question ("is there someone who tests it before it is published on the SAS site?"): the answer is yes 🙂
The fact that item_size returns its result in bytes is not system-specific, but the result itself is: for example, item_size is generally smaller under 32-bit Windows than under 64-bit Windows. It also depends on the key and data types (numeric or character).
As for why item_size may be larger than the total number of bytes in the key and data portions, my understanding is that it accounts for the extra memory needed for internal bookkeeping.
@chrisz has written a fantastic macro routine covering all item_size eventualities for practically any combination of lengths and scalar data types (i.e. character or numeric), and has formulated item_size rules based on these factors. I haven't seen any exceptions to these rules. The routine does not cover non-scalar types, such as the pointers to hash objects and/or iterators that can be stored in the data portion of a "hash-of-hashes" hash table. My limited experimentation has shown that the effect of having such a piece of data in the data portion is equivalent to adding a (scalar) numeric hash variable.
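To see the type dependence for yourself, here is a minimal sketch (the variable names are placeholders of my own; the two values will generally differ, and by how much depends on your platform):

data _null_;
  length knum 8 kchar $ 16;
  call missing(knum, kchar);
  /* all-numeric key and data portion */
  declare hash hn();
  hn.defineKey('knum');
  hn.defineData('knum');
  hn.defineDone();
  /* all-character key and data portion */
  declare hash hc();
  hc.defineKey('kchar');
  hc.defineData('kchar');
  hc.defineDone();
  size_num  = hn.item_size;
  size_char = hc.item_size;
  put size_num= size_char=;
run;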
Kind regards
Paul D.
And there is also a minimum length to consider. I modified the program to use _NEW_ to create the hash instances so I could redefine the same hash object with more variables. Note that the minimum length on my 64-bit Windows machine is 48, and it does not increase just by adding one more variable to the data portion.
data _null_;
  k = .;
  array d(*) $1 d1-d9;
  declare hash h;
  h = _new_ hash();
  h.defineKey('k');
  h.defineData('d1');
  h.defineDone();
  OneDataVar = h.item_size;
  put OneDataVar=;
  h.delete();

  h = _new_ hash();
  h.defineKey('k');
  h.defineData('d1','d2','d3','d4','d5','d6','d7','d8');
  h.defineDone();
  EightDataVars = h.item_size;
  put EightDataVars=;
  h.delete();

  h = _new_ hash();
  h.defineKey('k');
  h.defineData('d1','d2','d3','d4','d5','d6','d7','d8','d9');
  h.defineDone();
  NineDataVars = h.item_size;
  put NineDataVars=;
  h.delete();
run;
OneDataVar=48
EightDataVars=48
NineDataVars=64
NOTE: DATA statement used (Total process time):
real time 0.07 seconds
cpu time 0.04 seconds
Thanks. So with one data variable you get 48, eight variables don't change it, and nine bring it to 64. One would think that each block of eight variables takes 48 bytes, as long as all the variables keep the same length as in your example. My concern with the item_size attribute was to know whether a given file could fit within a fixed amount of memory. It seems the value varies between systems, so I'd have to assume a certain overhead. It looks like it's impossible to get the right size easily.
It is not hard to get the right size: just define a hash table using a data set with the right variables and lengths, and read the item_size attribute.
It is not just the number of variables; it also depends on their types and lengths.
And it is repeatable.
IIRC (and @hashman can correct me if I am wrong, as this is something he researched for our hash object SAS Press book), if your concern is calculating how much memory is needed across different OSs, the OS dependency is primarily an issue for narrow hash object tables.
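For instance, here is a minimal sketch using the WORK.STOCK table from the original post (item_size times num_items gives a rough per-item footprint; per the documentation, this still excludes the object's initial overhead, and the variable names below are just illustrative):

data _null_;
  if 0 then set work.stock;   /* define host variables PROD and QTY */
  declare hash h(dataset: "work.stock");
  h.defineKey('prod');
  h.defineData('qty');
  h.defineDone();
  n = h.num_items;
  s = h.item_size;
  approx_bytes = s * n;       /* rough estimate only: excludes the
                                 hash object's initial overhead */
  put approx_bytes=;
run;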
Thanks. My understanding from your reply is that if the reported size is larger than the bytes actually used by the key and the data, I can't simply compare that size with the memory available. I've kept all the answers to my post and I'll find time to experiment a bit. I'll try to find those rules you're talking about.
I took note of the book, Chris. I plan to buy it after I've finished the two others. Thanks.
Oops, the right Chris ... sorry!
To reiterate, a fantastic hash size calc.
Kind regards
Paul D.
Thanks, I'll give it a try. And thanks for the mention of the book; I'll look for it on Amazon.