I would use a hash table for this. Without sample data I can't really supply sample code, but there is plenty on the web; Paul Dorfman, among others, has written extensively on the subject. Basically, load the cancellation table into a hash table, then query it for each record as you go through the data step. If you find a matching cancellation, mark that dataset record as cancelled and remove the matching entry from the hash table, so the same cancellation can't be applied to a second record. You have to be a little careful if duplicate cancellations are possible, which I assume they are; you probably need 9.2 or newer to handle that properly (or even 9.3, I forget when the more advanced treatment of duplicate records in a hash table was added). Note that this requires the cancellation table to fit entirely in memory, so be aware of that if it's really big.
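
Since there is no sample data, here is only a rough sketch of the logic, written in Python rather than SAS so that it can be run anywhere. The field name `order_id`, the function `mark_cancellations`, and the toy records are all made up for illustration. Duplicate cancellations are handled by keeping a count per key instead of one entry per cancellation, which is just one way to do it; a SAS hash object would do the same job inside the data step.

```python
from collections import Counter

def mark_cancellations(records, cancellations, key="order_id"):
    """Return copies of records with a boolean 'cancelled' flag added.

    'order_id' is a hypothetical key; use whatever identifies a
    cancellation in your data.
    """
    # Load the whole cancellation table into memory, counting duplicates.
    pending = Counter(c[key] for c in cancellations)

    marked = []
    for rec in records:
        k = rec[key]
        # A Counter returns 0 for keys it has never seen.
        hit = pending[k] > 0
        if hit:
            # Use up one cancellation so it is not applied twice.
            pending[k] -= 1
        marked.append({**rec, "cancelled": hit})
    return marked

# Toy data: three records share order_id 1, but only two cancellations exist.
records = [{"order_id": 1}, {"order_id": 2}, {"order_id": 1}, {"order_id": 1}]
cancellations = [{"order_id": 1}, {"order_id": 1}]
result = mark_cancellations(records, cancellations)
```

With this toy input, the first two records with `order_id` 1 are flagged and the third is not, because each cancellation is consumed once. The lookup still has to fit in memory, same as the hash table.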