Paul Dorfman, Don Henderson
Aggregating or combining large data volumes can challenge computing resources. For example, the process may be hindered by the system limits on utility space or memory and, as a result, either fail or run too long to be useful. It is a natural inclination to try solving the problem by segregating the input records into a number of smaller segments, processing them independently and combining the results. However, in order for such a divide-and-conquer tactic to work, two seemingly contradictory criteria must be met: First, to aggregate or combine the data correctly, no segment can share its key values with the rest; and second, the segments must be more or less equal in size. In this presentation, we show how a hash function can be used to achieve it for arbitrary input with no prior knowledge of the distribution of the key values among its records. Effectively, the method renders any task of aggregating or combining data of any size doable by splitting its input into a large enough number of segments. Such an approach can be used to process the segments sequentially or in-parallel. The trade-off is the need to partially re-read the data. However, it is a rather small price to pay for making a failing or endlessly running task finish on time.
Watch the authors present this topic -- Uniform Hashing of Arbitrary Input Into Key-Exclusive Segments -- on the SAS Users YouTube channel.
The early bird rate has been extended! Register by March 18 for just $695 - $100 off the standard rate.
Check out the agenda and get ready for a jam-packed event featuring workshops, super demos, breakout sessions, roundtables, inspiring keynotes and incredible networking events.