
Generating Sample Data with the Hash Object

Started ‎09-18-2020 by
Modified ‎08-26-2018 by

Generating data has a number of use cases, for example:

  • generating test cases
  • generating volume data for performance testing
  • and so on.

For our book, Data Management Solutions Using SAS® Hash Table Operations: A Business Intelligence Case Study, we (@hashman and @DonH) needed to generate sample data. Choosing sample data can be challenging. If you use data that is specific to an industry or subject area, readers in other industries have trouble relating to it (or occasionally dismiss it out of hand). For that reason we decided to use sports-related data and chose baseball, in part because @DonH is a baseball geek. A great deal of data is collected about baseball games, and baseball fans are very focused on the analytics of the game (referred to as sabermetrics).


We were unable to use the XML data for Major League Baseball, so we decided to generate data for a complete season of a game we came to call Bizarro Ball. Bizarro Ball is similar to baseball but has some bizarre rules of its own, hence the name.


We used the hash object in many of the programs that generate the data. During the technical review of the book, we got feedback that the description of how we generated the data was interesting but did not seem to fit the Data Management and Business Intelligence theme of the book. So we decided to leave those details out of the book and document them externally instead.


Because generating data is of broader interest than just what we needed for our sample data, we decided that the series of articles we had planned to write might interest SAS users beyond those who are interested in the book and want to generate different sample data.


This article will be updated as we write the additional articles that discuss our general approaches (use of random numbers, random selection, parameter files, what to parameterize vs. what to hard-code, and so on).


So please follow this article if you would like to be notified about the follow-on articles that address these topics in more detail.
