Vince28_Statcan

Hi,

I've been working with hash objects and reading about the topic for quite a while now, but I have a math background rather than an IT one, and I fail to see how the following apparent contradiction is handled.

HASHEXP lets the user set the number of hash buckets. The efficiency of a hash table comes largely from distributing the keys uniformly over the buckets: finding the bucket is O(1), whereas collision handling within a bucket is O(log_2(N/#buckets)), where N is the number of keys.
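To put rough numbers on that, here is a minimal Python sketch of the arithmetic. It assumes the keys are spread perfectly evenly and that each bucket is searched like a balanced binary tree; the function name `approx_comparisons_per_lookup` is made up for illustration and is not anything SAS exposes.

```python
import math

def approx_comparisons_per_lookup(n_keys: int, hashexp: int) -> float:
    """Approximate within-bucket search cost, log2(N / #buckets).

    Assumes uniform key distribution and a balanced tree per bucket.
    """
    n_buckets = 2 ** hashexp          # HASHEXP gives 2**hashexp buckets
    keys_per_bucket = n_keys / n_buckets
    # With at most one key per bucket there is essentially nothing to search.
    return math.log2(keys_per_bucket) if keys_per_bucket > 1 else 0.0

n = 2 ** 20  # about one million keys
for hashexp in (0, 10, 20):
    print(hashexp, approx_comparisons_per_lookup(n, hashexp))
```

Under these assumptions, a single bucket costs about 20 comparisons per lookup for a million keys, 2^10 buckets cut that to about 10, and 2^20 buckets leave roughly one key per bucket.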

Now according to Paul Dorfman's

http://www2.sas.com/proceedings/sugi28/004-28.pdf

The natural approach to a hash function would be the division method, which is said to have good properties (in terms of uniformity of distribution over buckets) if the number of buckets is a prime number distant from a power of 2. At the same time, HASHEXP allocates a power-of-2 number of buckets (2^hashexp). I couldn't find any documentation on which specific hash function SAS implements, but it feels contradictory to specify a power-of-2 number of buckets if I wish to get an O(1) lookup table.

Don't get me wrong: since it is all in memory, even an O(log_2(N)) binary tree, where bad luck put everything in the same bucket and the tree has to be traversed for every record, is probably going to save me time over the I/O-intensive sorts.

However, I am going to give a small presentation about the hash object to colleagues in a few weeks. I expect most of them never to have used it, since we normally don't have internet access at our workstations (the joy of NSOs) and many colleagues seem to have lagged behind on the newer SAS features. Still, I expect a few people in the audience to have far more knowledge than me on the IT/SAS side, and I would at least like not to look like a fool if specific questions about the hash function come up.
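To make the power-of-2 concern concrete, here is a small Python sketch of plain division hashing (key mod number of buckets). This is only an illustration of the textbook method; it is not SAS's actual hash function, which I could not find documented. The helper `buckets_used` and the choice of keys (multiples of 16) are my own assumptions, picked to show the worst-case pattern.

```python
def buckets_used(keys, n_buckets: int) -> int:
    """Count how many distinct buckets receive at least one key
    under division hashing: bucket = key mod n_buckets."""
    return len({k % n_buckets for k in keys})

# Keys that share a common factor with a power of 2 (all multiples of 16).
keys = [16 * i for i in range(10_000)]

used_pow2 = buckets_used(keys, 256)   # 2**8 buckets, as HASHEXP=8 would give
used_prime = buckets_used(keys, 251)  # a nearby prime number of buckets

print("buckets used with 256 buckets:", used_pow2)
print("buckets used with 251 buckets:", used_prime)
```

With 256 buckets only 16 of them are ever hit, because 16 * i mod 256 can only be a multiple of 16. With 251 buckets every bucket is used, since 16 and 251 share no common factor. So if SAS used plain division hashing, a power-of-2 bucket count would be a poor choice for keys like these; presumably it does something else before reducing to 2^hashexp buckets, which is exactly what I am trying to find out.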

Thanks,

Vincent
