Hi, I have a large dataset of trading data that looks like this: data have; infile datalines truncover dlm=',' dsd; input cusip:$char9. trade_date:mmddyy10. execution_time:time8. buy_sell:$char1. (quantity execution_price trade_id) (:best32.); format execution_time time8. trade_date mmddyy10.; datalines; 010831BW4,09/12/2013,12:30:44,B,45,103.133,1325534595 010831BW4,09/12/2013,14:56:13,B,500,103.303,1325536968 010831BW4,09/17/2013,13:14:35,S,25,105.244,1326014939 010831BW4,09/17/2013,15:40:12,S,50,105.369,1326017794 010831BW4,09/17/2013,16:04:21,S,25,105.244,1326018222 010831BW4,09/18/2013,10:12:59,S,50,105.369,1326121154 010831BW4,09/18/2013,11:50:21,S,350,105.015,1326123005 010831BW4,09/20/2013,11:00:41,B,50,106.308,1326341521 010831BW4,09/20/2013,11:00:41,S,50,106.308,1326341522 010831BY0,09/12/2013,12:30:44,B,1075,99.592,1325534596 010831BY0,09/12/2013,16:51:56,B,350,99.777,1325547689 010831BY0,09/13/2013,11:01:07,S,20,100.25,1325642219 010831BY0,09/13/2013,11:03:05,S,330,100.44,1325642259 010831BY0,10/02/2013,11:07:37,B,50,101.117,1327522086 010831BY0,10/03/2013,11:54:05,S,50,104.218,1327632245 ; run; I am trying to match the buy observations with the sell observations (distinguished by the Buy_Sell variable) within each CUSIP using a Last-in Last-out methodology. In other words, each time there is a sale in a CUSIP, I want the quantity most recently purchased in that CUSIP matched to the sale. If more is bought than sold (or sold than bought), then after the final buy or sell observation (whichever comes first) I want the extra quantity discarded and for the program to move on to the next CUSIP. I want my final table to look like this: CUSIP Quantity b_id bdate btime bprice s_id sdate stime sprice 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326018222 17-Sep-13 16:04:21 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326121154 18-Sep-13 10:12:59 105.369 010831BW4 350 1325536968 12-Sep-13 14:56:13 103.303 1326123005 18-Sep-13 11:50:21 105.015 010831BW4 50 1326341521 20-Sep-13 11:00:41 106.308 1326341522 20-Sep-13 11:00:41 106.308 010831BY0 20 1325547689 12-Sep-13 16:51:56 $99.78 1325642219 13-Sep-13 11:01:07 100.25 010831BY0 330 1325547689 12-Sep-13 16:51:56 $99.78 1325642259 13-Sep-13 11:03:05 100.44 010831BY1 50 1327522086 2-Oct-13 11:07:37 101.117 1327632245 3-Oct-13 11:54:05 104.218 In order to achieve this, I am using a hash of hashes that changes for each CUSIP and is made up of hash tables and hash iterators for the buy and sell observations. Each time there is a new CUSIP, the two hash tables take all of the buy and sell observations and output them together based on the order that they were brought into the hash table and the quantities that they have. Once the hash tables run out of buy or sell observations in a CUSIP, the additional observations are discarded and the the program moves on to the next CUSIP. Here is my current code: data want; set have; if _n_=1 then do; declare hash hoh (); hoh.definekey('cusip'); hoh.definedata('cusip','buys','sells','buy','sell'); hoh.definedone(); declare hash buys; declare hash sells; declare hiter buy; declare hiter sell; end; if hoh.find() ne 0 then do; length tid 8 b_id bdate btime bprice bquantity quantity s_id sdate stime sprice squantity 8; buys=_new_ hash(ordered:'a'); buys.defineKey('tid'); buys.defineData('bdate','btime','bprice','bquantity','b_id', 'b_account', 'b_source', 'b_cusip', 'in14'); buys.defineDone(); sells=_new_ hash(ordered:'a'); sells.defineKey('tid'); sells.defineData('sdate','stime','sprice','squantity','s_id','s_account', 's_cusip'); sells.defineDone(); buy=_new_ hiter('buys'); sell=_new_ hiter('sells'); inventory=0; end; tid+1; if buy_sell="B" then do; bdate=trade_date; btime=execution_time; bprice=execution_price; bquantity=quantity; b_id=trade_id; b_account=account_number2; b_source=source; b_cusip=cusip; in14=in14; inventory+quantity; buys.add(); end; else do; if inventory=0 then delete; if quantity>inventory then quantity=inventory; sdate=trade_date; stime=execution_time; sprice=execution_price; squantity=quantity; s_id=trade_id; s_account=account_number2; s_cusip=cusip; inventory+(-1*quantity); sells.add(); end; keep bdate b_account b_source b_cusip b_id bdate btime bprice quantity sdate stime sprice s_id s_account s_cusip in14; format bdate sdate date9. bprice sprice dollar8.4 btime stime time12.; brc=buy.last(); src=sell.first(); do while(brc=0 and src=0); select; when (bquantity < squantity) do; temp=squantity-bquantity; quantity=bquantity; output; squantity=temp; brc=buy.prev(); end; when (bquantity=squantity) do; quantity=bquantity; output; brc=buy.prev(); src=sell.next(); end; when (bquantity > squantity) do; temp=bquantity-squantity; quantity=squantity; output; bquantity=temp; src=sell.next(); end; otherwise; end; end; rc=hoh.replace(); run; When I run this code, the output is all over the place. Usually the output begins correct in each CUSIP, and gets further off as it goes along. It gets especially bad when more is bought in a CUSIP than sold and some quantity needs to be discarded. The output also contains many observations that are full duplicates of previous observations, and uses observations even after their quantity has been used up and the hash table was supposed to move on to the next observation. For the dataset above, this is the output that my code gives: CUSIP Quantity b_id bdate btime bprice s_id sdate stime sprice 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326018222 17-Sep-13 16:04:21 105.244 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326018222 17-Sep-13 16:04:21 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326121154 18-Sep-13 10:12:59 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326018222 17-Sep-13 16:04:21 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326121154 18-Sep-13 10:12:59 105.369 010831BW4 350 1325536968 12-Sep-13 14:56:13 103.303 1326123005 18-Sep-13 11:50:21 105.015 010831BW4 25 1326341521 20-Sep-13 11:00:41 106.308 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 25 1326341521 20-Sep-13 11:00:41 106.308 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326018222 17-Sep-13 16:04:21 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326121154 18-Sep-13 10:12:59 105.369 010831BW4 350 1325536968 12-Sep-13 14:56:13 103.303 1326123005 18-Sep-13 11:50:21 105.015 010831BW4 25 1326341521 20-Sep-13 11:00:41 106.308 1326014939 17-Sep-13 13:14:35 105.244 010831BW4 25 1326341521 20-Sep-13 11:00:41 106.308 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326017794 17-Sep-13 15:40:12 105.369 010831BW4 25 1325536968 12-Sep-13 14:56:13 103.303 1326018222 17-Sep-13 16:04:21 105.244 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326121154 18-Sep-13 10:12:59 105.369 010831BW4 350 1325536968 12-Sep-13 14:56:13 103.303 1326123005 18-Sep-13 11:50:21 105.015 010831BW4 50 1325536968 12-Sep-13 14:56:13 103.303 1326341522 20-Sep-13 11:00:41 106.308 010831BY0 20 1325547689 12-Sep-13 16:51:56 $99.78 1325642219 13-Sep-13 11:01:07 100.25 010831BY0 20 1325547689 12-Sep-13 16:51:56 $99.78 1325642219 13-Sep-13 11:01:07 100.25 010831BY0 330 1325547689 12-Sep-13 16:51:56 $99.78 1325642259 13-Sep-13 11:03:05 100.44 010831BY0 20 1327522086 2-Oct-13 11:07:37 101.117 1325642219 13-Sep-13 11:01:07 100.25 010831BY0 30 1327522086 2-Oct-13 11:07:37 101.117 1325642259 13-Sep-13 11:03:05 100.44 010831BY0 300 1325547689 12-Sep-13 16:51:56 $99.78 1325642259 13-Sep-13 11:03:05 100.44 010831BY0 20 1327522086 2-Oct-13 11:07:37 101.117 1325642219 13-Sep-13 11:01:07 100.25 010831BY0 30 1327522086 2-Oct-13 11:07:37 101.117 1325642259 13-Sep-13 11:03:05 100.44 010831BY0 300 1325547689 12-Sep-13 16:51:56 $99.78 1325642259 13-Sep-13 11:03:05 100.44 010831BY0 50 1325547689 12-Sep-13 16:51:56 $99.78 1327632245 3-Oct-13 11:54:05 104.218 As you can see, it has many more observations than it should - some full duplicates of previous observations, some that are repeats of previous values on only the Buy or Sell side. I think the problem is somewhere in how I iterate through my hash table, but I have been unable to find the bug. Thank you for reading through and please let me know if you can figure out what I am doing wrong!
... View more