@yabwon:
Bart, as far as elegance is concerned, I have the same kind of aesthetic aversion to "big enough" allocations. I do remember our -L exchange and am looking forward to seeing the finished product whenever you deem it to have been sufficiently Polished ;). I've learned quite a few FCMP tricks of trade from you and strongly suspect will learn more from this one.
Aesthetics aside, "big enough" can be extremely useful, as you well know, and quite productive speed-wise. This is because a one fell swoop allocation at compile time works much faster than allocating memory at run time for each extra item one at a time. With a really big hash object table, most of its time is spent not so much on the search and retrieve operations as on looking for available memory before doing an insert, especially when its memory footprint approaches the system limits. As a result, code can be run for a long time, only to fail on the next insertion when no more memory is available; whereas if an open-addressed array-based hash table is allocated as a big array at compile time, it either fails at once or else will work to the end without a glitch.
As you've seen from my -L writings, with a single integer key an open-addressed array-based hash table wins hands down both in insert/search speed (~1:2) and memory usage (~1:3 even with a half-sparse table). It also excels at aggregation because it can be done directly in the corresponding array cells (rather than via the FIND-aggregate-REPLACE hash object cycle). This is because the hash function in this case is the simple mod(key,table_size). The hash object readily takes over in terms of search speed for character keys and composite keys of mixed types since against such keys, its internal hash function is much faster than what can be mustered via:
mod (input (md5 (catx (k1-kN), pib6.), table_size)
externally - this combo just takes too long to compute, and a slow hash function defeats the purpose of using hashing in the first place. If SAS surfaced a really tight and fast function that could be used against an arbitrary key, array-based hashing would be much more attractive.
Kind regards
Paul D.
p.s. If you want to see how a hash object table fares against an open-addressed array table with a single integer key, try the program in the attached file.
... View more