BookmarkSubscribeRSS Feed
Vince28_Statcan
Quartz | Level 8

Hi,

I've been working with hash objects and reading about the topic for quite a while now but I have a math background and not an IT one and I fail to see how the following contradiction could've been managed handled.

HASHEXP allows the user to set the number of hash buckets and the efficiency of a hash table comes largely from distributing the keys uniformely over the hash buckets as finding the bucket is O(1) whereas collision handling is O(log_2(N/#buckets)) where N.

Now according to Paul Dorfman's

http://www2.sas.com/proceedings/sugi28/004-28.pdf

The natural approach to a hash function would be the division function which is said have good properties (in terms of uniformity of distribution over buckets) if the number of buckets is a prime number distant from a power of 2. At the same time, HASHEXP allocates a power of 2 buckets (2^hashexp). I couldn't find any documentation on what specific underlying hash function was implemented by SAS but it feels contradictory to specify a power of 2 number of hash buckets if I wish to get an O(1) lookup table. Don't get me wrong, since it is all in-memory, even an o(log_2(N)) binary tree where badluck put everything in the same bucket and the binary tree collision has to be ran through for every record is probably going to save me time over the I/O intensive sorts but I am going to do a small presentation about the hash tool to colleagues in a few weeks and while I expect the mass never to have used it as we normally don't have access to internet at our workstations (joy of NSOs) and thus many colleges seem to have lagged behind over the new SAS features but at the same time, still, I expect to have a few people in the audience with far more knowledge than me on the IT/SAS intelligence side and wish at least not to look like a fool if specific questions about the hash function come up.

Thanks,

Vincent

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 0 replies
  • 693 views
  • 0 likes
  • 1 in conversation