Does anyone have links to a good beginner's tutorial on hash tables? I have been googling and reading a lot, but the papers I have found so far are either too narrowly focused or too vague for me to understand the overall structure.

I have a dataset with 530 million observations and 250+ columns of sensor data (~3 TB). The powers that be want summary stats on ALL of the columns (n, min, max, mean, stddev, skew, kurtosis, var) by equipment id by date. Being new to SAS, I did a lot of research, and it appears that hash tables would be the best approach, but several aspects of the programming are not clear to me.

My initial approach (and please direct me if there is a better one) is to use a hash table to subset the data by id (or id/date) and then run proc summary on the subset. I tried running the hash subset and ran out of memory (Windows 7, 8 GB of memory).

    data hash_results;
      set myLargeDataset;
      if (_n_ eq 1) then do;
        declare hash a(dataset:'myLargeDataset');
        a.defineKey('equipmentsernum', 'Date');
        a.defineData(all:'y');
        a.defineDone();
      end;
      equipmentsernum = '296737';
      if (a.find() eq 0);
    run;

This code works on a subset of myLargeDataset, but on the full dataset it quickly runs out of memory.

Some things I haven't figured out with hash tables are:

1) Can I save the resulting hash table to re-use outside of the data step?
2) Can I write a macro to loop through the hash?

My thought was to use the hash table to subset myLargeDataset into a smaller table of just one serial number or id, then call proc summary to get stats for that unit, then loop through to the next serial number, and so on.

Any hash tutorials or pointers would be greatly appreciated.

Regards,
Fred