Well, I thought this would take me forever, but I somehow nailed it on the first pass, although I might be overlooking something. Anyway, here's the solution; sorry for the copy/paste clutter. This may not be the best case to demonstrate a hash of hashes, but it worked out alright.

If your data is already sorted by ID but not also by SCORE (if it were sorted by both, first. logic with a retained counter would be far more efficient than hashing), you can simplify this a lot by reusing the same inner/hinner pair for every BY group: either .clear() it, or .delete() it and instantiate a new one, at the start of each group (this can be managed with first. and last.). That significantly reduces memory consumption, since you only keep one ID's scores in the hash at a time and output them with a loop over the hiterator on last.ID.

As for why it can be worthwhile to learn hashes of hashes: if you have data with many data points where a large subset of the variables keep the same values, a hash of hashes lets you allocate memory only once for the shared data in the outer hash, and then n times only for the varying portion, however small, in the inner hash object. For instance, say you had pacemaker data (bear with me here, I don't work with health statistics, it's just the first thing that came to mind) with millions of timestamps, each carrying a few vital-sign readings from the pacemaker, but you also needed to carry lots of invariant or rarely changing variables (age, sex, marital status, working status, etc.) as well as some slightly more variable data such as medical visits and the additional health measures taken at each visit. You could work with 3 hashes: the outer one holding age/sex/etc., the middle one holding the medical visit data, and the inner one holding the timestamps plus pacemaker data.
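As an aside on the sorted-by-ID case mentioned above, here is a minimal sketch of what the simplified version might look like. It is untested and assumes HAVE is already sorted (or at least grouped) by ID; the output name want_sorted is just a placeholder, and the first()/prev() step is the same pointer-release trick used in the full solution further down.

```sas
%let n_highest=3;

data want_sorted;
  length id $8 score 8 counter 8;
  if _n_=1 then do;
    /* a single inner hash and iterator, reused for every ID */
    declare hash inner(ordered:'a');
    inner.definekey('score');
    inner.definedata('score','counter');
    inner.definedone();
    declare hiter hinner('inner');
  end;

  set have;
  by id; /* assumption: HAVE is sorted by ID */

  /* new ID: empty the hash instead of creating a new one */
  if first.id then rc=inner.clear();

  if inner.find()=0 then do;
    /* score already stored: bump its counter */
    counter=counter+1;
    rc=inner.replace();
  end;
  else do;
    counter=1;
    rc=inner.add();
    if inner.num_items>&n_highest. then do;
      rc=hinner.first(); /* lowest score, since the hash is ascending */
      rc=hinner.prev();  /* step off the item so it can be removed */
      rc=inner.remove(); /* SCORE still holds the lowest key */
    end;
  end;

  /* end of the ID: write out what is left, one row per occurrence */
  if last.id then do;
    rc=hinner.first();
    do while(rc=0);
      do i=1 to counter;
        output;
      end;
      rc=hinner.next();
    end;
  end;

  drop rc i counter;
run;
```

The outer hash disappears entirely here, because the BY statement already tells us when an ID starts and ends, so at most n_highest+1 scores are ever held in memory at once.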
With that three-hash layout, you might have 1M timestamps in total for a single person, but you only use memory once for all of his invariants, say 12 times (once a year) for his rarely changing variables, and then 1M times for the timestamps, where just a few variables take up the bulk of the memory. On top of that, it lets you do some funky data manipulation, since you have 3 layers of searches. For instance, if you have twins that share a specific twin_id, you might be interested in adding a 4th layer of hashing with just the twin ID as key and data, looping over twin pairs to track events, and then using the fourth (innermost) hash object to see whether the other twin had a similar event shortly before or shortly after the timestamp where it occurred for the first twin.

Anyway, I'm digressing here. Even though they are quite complex to code, and especially hard to hand over to or share with colleagues since hashing is still foreign ground for most SAS end-users, there are many niches left to explore where hash objects, and in particular the ability to store other objects as data in hash objects, can significantly improve quality of life.

%let n_highest=3;
%let hashexp=%sysfunc(ceil(%sysfunc(sqrt(&n_highest.))));

data have;
  input ID $ Score;
  datalines;
A.B. 10
A.B. 10
A.B. 5
A.B. 8
A.B. 10
A.B. 7
A.B. 23
A.B. 10
K.L. 9
K.L. 12
K.L. 11
K.L. 11
K.L. 11
K.L. 2
K.L. 9
K.L. 7
;
run;

data want2;
  length id $8 score 8 counter 8;
  if _n_=1 then do;
    declare hash inner;   /* not yet instantiated */
    declare hiter hinner; /* not yet instantiated */
    declare hash outer(ordered:'a', multidata:'n'); /* outer hash, one item per ID */
    declare hiter houter('outer'); /* hash iterator declared on the OUTER hash object */
    outer.DefineKey('ID');
    outer.DefineData('ID', 'inner', 'hinner');
    outer.DefineDone();
  end;

  set have end=last;

  if outer.find() NE 0 then do;
    /* ID does not exist yet: create its own new hash object and iterator to track scores */
    /* instantiating; multidata:'n' is the default but is stated for clarity. */
    /* The counter variable is used instead of an additional hash to mimic a per-key item count. */
    inner = _new_ hash(ordered:'y', multidata:'n', hashexp:&hashexp.);
    hinner = _new_ hiter('inner'); /* instantiating */
    inner.definekey('score');
    inner.definedata('score', 'counter');
    inner.definedone();
    outer.add();  /* add the ID and the related inner objects to the outer hash object */
    counter=1;    /* initialize the inner counter variable */
    inner.add();  /* add the score and counter to the inner object */
  end;
  else do;
    /* an inner object already exists for this ID */
    if inner.find()=0 then do;
      /* the score already exists: increment its counter and replace */
      counter=counter+1;
      inner.replace();
    end;
    else do;
      /* otherwise add it, and handle the possibility that n_highest+1 (here 4) distinct scores now exist in the object */
      counter=1;
      inner.add();
      if inner.num_items>&n_highest. then do;
        hinner.first();    /* set pointer to the first, i.e. lowest, score due to the sort order */
        rc=hinner.prev();  /* clear the pointer in an idiotic way so that the lowest key can be removed; this way the inner hiterators also have null pointers for the output at the end, so there is no need for special handling with .first() instead of .next() to reinitialize */
        rc=inner.remove();
      end;
    end;
  end;

  /* the hash logic is built; now we need a way to output all of this data */
  if last then do;
    do while(houter.next()=0);   /* loop over all IDs */
      do while(hinner.next()=0); /* loop over all scores for that ID */
        do i=1 to counter;       /* use the counter variable to recreate the right number of records for an ID/score pair */
          output;
        end;
      end;
    end;
  end;

  drop rc i counter;
run;

Cheers and wish you great holidays!
Vince