I have a large dataset with trading data from which I am trying to calculate profit with a FIFO (first-in first-out) methodology in Base SAS 9.4. In other words, I want the first CUSIPs bought to be matched to the first CUSIPs sold, regardless of the date. In my final dataset, I want all of the variables from the "Buy" observation of the input dataset to show up alongside all of the variables from the "Sell" observation of the input dataset. If more is bought than sold or sold than bought in a given CUSIP, then the excess should be excluded from the final dataset. Furthermore, if more is sold than has been bought at that time, then the excess sell quantity should be disregarded. A few rows of my input dataset are here:
CUSIP | Date | Time | Price | Quantity | Buy / Sell | Account | Source | Trade ID | In14 (Dummy) |
A62 | 5/6/2013 | 11:46:58 AM | 100.1 | 100 | B | 602 | 1 | 135 | 1 |
A62 | 5/6/2013 | 12:23:10 PM | 100.1 | 250 | B | 352 | 3 | 320 | 1 |
A62 | 5/6/2013 | 3:09:02 PM | 100.1 | 150 | B | 602 | 1 | 616 | 1 |
A62 | 6/7/2013 | 2:48:28 PM | 99.854 | 500 | S | 176 | 3 | 065 | 0 |
A70 | 5/6/2013 | 11:47:53 AM | 100.1 | 150 | B | 602 | 1 | 287 | 1 |
A70 | 5/6/2013 | 12:23:50 PM | 100.1 | 250 | B | 352 | 3 | 326 | 1 |
A70 | 5/6/2013 | 3:09:58 PM | 100.1 | 100 | B | 602 | 1 | 626 | 1 |
A70 | 5/14/2013 | 11:58:52 AM | 100.028 | 715 | B | 161 | 3 | 251 | 1 |
A70 | 5/20/2013 | 12:18:25 PM | 100.738 | 1,215 | S | 176 | 3 | 628 | 0 |
A88 | 5/6/2013 | 11:48:44 AM | 100.1 | 150 | B | 602 | 1 | 290 | 1 |
A88 | 5/6/2013 | 12:24:29 PM | 100.1 | 250 | B | 352 | 3 | 345 | 1 |
A88 | 5/6/2013 | 3:10:38 PM | 100.1 | 100 | B | 602 | 1 | 643 | 1 |
A88 | 5/20/2013 | 12:54:53 PM | 100.699 | 500 | S | 176 | 3 | 165 | 0 |
A96 | 5/6/2013 | 11:49:21 AM | 100.1 | 135 | B | 602 | 1 | 291 | 1 |
A96 | 5/6/2013 | 12:25:01 PM | 100.1 | 140 | B | 352 | 3 | 348 | 1 |
A96 | 6/18/2013 | 4:04:55 PM | 98.257 | 275 | S | 176 | 3 | 466 | 0 |
B20 | 5/6/2013 | 11:51:16 AM | 100.1 | 150 | B | 602 | 1 | 292 | 1 |
B20 | 5/6/2013 | 3:51:57 PM | 100.1 | 80 | B | 602 | 1 | 268 | 1 |
B20 | 6/27/2013 | 9:48:19 AM | 93.6 | 230 | S | 176 | 3 | 181 | 0 |
B38 | 5/3/2013 | 3:50:05 PM | 100.1 | 100 | B | 549 | 3 | 921 | 1 |
B38 | 5/6/2013 | 11:52:02 AM | 100.1 | 110 | B | 602 | 1 | 899 | 1 |
B38 | 5/20/2013 | 4:55:36 PM | 100.375 | 210 | S | 176 | 3 | 868 | 0 |
L55 | 3/7/2014 | 11:24:32 AM | 100.05 | 500 | B | 398 | 2 | 424 | 1 |
L55 | 3/7/2014 | 1:00:45 PM | 100 | 500 | B | 793 | 3 | 953 | 1 |
L55 | 3/11/2014 | 12:07:02 PM | 100.239 | 75 | S | 176 | 3 | 716 | 0 |
In order to create the final table I described above, I have written the following code that creates a Hash of Hashes for each CUSIP that uses two other Hash Tables, one for buys, and one for sells. The code is here:
data want;
   set have;
   if _n_=1 then do;
      declare hash hoh ();
      hoh.definekey('cusip');
      hoh.definedata('cusip','buys','sells','buy','sell');
      hoh.definedone();
      declare hash buys;
      declare hash sells;
      declare hiter buy;
      declare hiter sell;
   end;
   if hoh.find() ne 0 then do;
      length tid 8 b_id bdate btime bprice bquantity quantity s_id sdate stime sprice squantity 8;
      buys=_new_ hash(ordered:'a');
      buys.defineKey('tid');
      buys.defineData('bdate','btime','bprice','bquantity','b_id','b_account','b_source','b_cusip','in14');
      buys.defineDone();
      sells=_new_ hash(ordered:'a');
      sells.defineKey('tid');
      sells.defineData('sdate','stime','sprice','squantity','s_id','s_account','s_cusip');
      sells.defineDone();
      buy=_new_ hiter('buys');
      sell=_new_ hiter('sells');
      inventory=0;
   end;
   tid+1;
   if buy_sell="B" then do;
      bdate=trade_date;
      btime=execution_time;
      bprice=execution_price;
      bquantity=quantity;
      b_id=trade_id;
      b_account=account_number2;
      b_source=source;
      b_cusip=cusip;
      in14=in14;
      inventory+quantity;
      buys.add();
   end;
   else do;
      if inventory=0 then delete;
      if quantity>inventory then quantity=inventory;
      sdate=trade_date;
      stime=execution_time;
      sprice=execution_price;
      squantity=quantity;
      s_id=trade_id;
      s_account=account_number2;
      s_cusip=cusip;
      inventory+(-1*quantity);
      sells.add();
   end;
   keep bdate b_account b_source b_cusip b_id btime bprice quantity sdate stime sprice s_id s_account s_cusip in14;
   format bdate sdate date9. bprice sprice dollar8.4 btime stime time12.;
   brc=buy.first();
   src=sell.first();
   do while(brc=0 and src=0);
      select;
         when (bquantity < squantity) do;
            temp=squantity-bquantity;
            quantity=bquantity;
            output;
            squantity=temp;
            brc=buy.next();
         end;
         when (bquantity=squantity) do;
            quantity=bquantity;
            output;
            brc=buy.next();
            src=sell.next();
         end;
         when (bquantity > squantity) do;
            temp=bquantity-squantity;
            quantity=squantity;
            output;
            bquantity=temp;
            src=sell.next();
         end;
         otherwise;
      end;
   end;
   rc=hoh.replace();
run;
The problem is that I get tons of duplicate entries - my final dataset has almost 30 times as many observations as it does after I remove duplicates. Any idea how I can stop the hash table from creating these duplicates? Once I remove the duplicates, it appears to have correctly matched buys and sells within each CUSIP and discarded excess buy and sell quantity, but I would feel more confident about this if I weren't getting these duplicates. Thanks so much for reading through!
I now see that what you want is a dataset with (usually) one observation per purchased lot of shares. The obs will have the CUSIP, then all the buy transaction variables (b_date=trade_date, b_time=execution_time, etc.). If that lot is sold, it will also have an analogous set of sell transaction variables (s_date, s_price, s_quantity, ...).
If all the shares from a lot are sold, it will have quantity_available=0. If no shares from a lot are sold, then all the s_ vars will be missing, and that lot will have quantity_available=b_quantity. If a lot is partially sold, then it will have two records. The first will record the sold portion (and will have quantity_available=0), and the second will have the same b_ variables, all missing s_ variables, and quantity_available = the unsold portion of the lot.
Even though I've written papers using hash-of-hashes for FIFO queues, I think you might be better off maintaining a single hash with both "b_" and "s_" vars for each stock lot, plus a second hash with one data item per stock holding its current status (e.g. total shares currently held, and the number of buys and sells to date).
In that case, each "buy" establishes a new stock lot (i.e. a new data item in hash object LOTS). For example, 3 buys of stock AAA generate three lots, each identified by stock/date/price/quantity/source (the "b_" vars: b_date, b_price, b_quantity, b_source, ...).
But when a subsequent "sell" for the same stock is encountered, you want to match it with the oldest lot not yet sold off - that's the FIFO part. So in addition to the buy-related vars noted above, the lot should also carry the same set of variables (date/price/...) from the sell transaction (the "s_" vars).
You might have chosen hash-of-hashes so that you can easily find the oldest lot for a given stock: since each stock has its own hash and corresponding hash iterator, the FIRST method of the iterator gets you to the oldest lot for that stock. But if you instead keep a single hash LOTS ordered by cusip/b_num (where b_num is the sequence number of the buy transaction), then in place of the FIRST method you can use the SETCUR method, with arguments key:cusip and key:1 (i.e. b_num=1).
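The FIFO lot consumption described above can be sketched outside SAS as well. Here is a minimal, purely illustrative Python version (the `sell_fifo` helper and its data layout are inventions for this sketch, not part of the SAS program):

```python
from collections import deque

def sell_fifo(lots, qty):
    """Consume shares from the oldest lots first (FIFO).
    lots is a deque of [buy_id, available] pairs in buy order;
    returns a list of (buy_id, shares_sold) matches.  Sell
    quantity beyond current holdings is simply discarded."""
    matches = []
    while qty > 0 and lots:
        buy_id, avail = lots[0]
        sold = min(avail, qty)
        matches.append((buy_id, sold))
        qty -= sold
        if sold == avail:
            lots.popleft()                 # lot emptied: remove it
        else:
            lots[0][1] = avail - sold      # partial sale: keep a "sublot"
    return matches

# Three buys of one stock (as in CUSIP A62), then a sell of 500 shares:
lots = deque([[135, 100], [320, 250], [616, 150]])
print(sell_fifo(lots, 500))   # [(135, 100), (320, 250), (616, 150)]
```

The partial-sale branch mirrors what the SAS program does with b_num+.01: a partially sold lot leaves a remainder that the next sell can consume.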
Here is a working program:
proc datasets library=work noprint;
   delete _: ;
quit;

Data have;
   Input Cusip $ Trade_Date :mmddyy10. Execution_Time :time9. Execution_Price Quantity Buy_Sell :$1. Account_Number2 Source Trade_ID In14;
   format trade_date yymmddn8. execution_time time8.0;
   datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;
data _null_;
   set have end=end_of_have;
   length lot_cusip $8             /* The CUSIP var for use in the LOTS hash object */
          shrs_held N_buys N_sells /* Vars for the stock_profile hash */
          lot_sold                 /* Actual lot sales for this sell transaction */
          lot_avail                /* Current shares available in a lot after a sale */
          8;
   array trdata {*} trade_date execution_time quantity execution_price account_number2 source trade_id in14;
   array b_data {*} b_date b_time b_quantity b_price b_acct b_source b_tid b_in14;
   array s_data {*} s_date s_time s_quantity s_price s_acct s_source s_tid s_in14;
   format b_date: s_date: yymmddn8. b_time s_time time8.0;

   if _n_=1 then do;
      declare hash stock_profile();
      stock_profile.definekey('cusip');
      stock_profile.definedata('cusip','shrs_held','N_buys','N_sells'); /* Number of buys, sells */
      stock_profile.definedone();
      declare hash lots(ordered:'A');
      lots.definekey('lot_cusip','b_num');
      lots.definedata('lot_cusip','b_num','s_num');
      lots.definedata('b_date','b_time','b_quantity','b_price','b_acct','b_source','b_tid','b_in14');
      lots.definedata('s_date','s_time','s_quantity','s_price','s_acct','s_source','s_tid','s_in14');
      lots.definedata('lot_sold','lot_avail');
      lots.definedone();
      declare hiter ilots('lots');
   end;

   rc=stock_profile.find();   /* Get current status of the incoming CUSIP */
   if (rc^=0 or shrs_held in (.,0)) and buy_sell='S' then do;
      if end_of_have=0 then delete;   /* Avoid premature sales */
      else goto eoh;
   end;
   else if buy_sell='B' then do;
      N_buys=sum(N_buys,1);
      shrs_held=sum(shrs_held,quantity);
      rc=stock_profile.replace();
      lot_cusip=cusip;
      b_num=n_buys;
      lot_sold=0;
      lot_avail=quantity;
      do v=1 to dim(trdata); b_data{v}=trdata{v}; end;
      rc=lots.add();
   end;
   else if buy_sell='S' then do;
      N_sells=sum(N_sells,1);
      total_shrs_to_sell=min(quantity,shrs_held);
      /* Now step through the lots looking for shares available to sell */
      do rc=ilots.setcur(key:cusip,key:1) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);
         if lot_avail>0 then do;
            orig_lot_avail=lot_avail;
            s_num=n_sells;
            do v=1 to dim(trdata); s_data{v}=trdata{v}; end;
            lot_sold=min(lot_avail,total_shrs_to_sell);
            lot_avail=0;
            rc=lots.replace();
            total_shrs_to_sell=total_shrs_to_sell-lot_sold;  /* Update the sales "order" */
            shrs_held=shrs_held-lot_sold;                    /* Update current holdings */
            if lot_sold<orig_lot_avail then do;  /* If the lot was not emptied by this sale ... */
               b_num=b_num+.01;   /* Create a "sublot" (=lot+.01) with the remaining available shrs */
               lot_avail=orig_lot_avail-lot_sold;
               lot_sold=0;
               call missing(of s_:);
               rc=lots.add();
            end;
         end;
         rc=ilots.next();
      end;
      rc=stock_profile.replace();   /* Store updated holdings */
   end;

   if end_of_have;
   eoh:
   rc=stock_profile.output(dataset:'stock_profile');
   rc=lots.output(dataset:'want');
run;
Notes:
I suspect this will be just as flexible as the hash-of-hashes approach, and it probably takes somewhat less memory. Plus it allows you to easily do a bulk output of all the records for all the CUSIPs.
Please provide a sample SAS DATA step (not just a table of values) for the input and, equally important, for the wanted output dataset. I don't understand the description of your objective.
Sorry for the lack of information. Here is a data statement:
Data have;
   Input Cusip $ Trade_Date :mmddyy10. Execution_Time :time9. Execution_Price Quantity Buy_Sell $ Account_Number2 Source Trade_ID In14;
   datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;
The objective is to match the first quantity bought with the first quantity sold in each CUSIP, disregarding any excess quantity on the buy or sell side that can't be matched. Then the output dataset should have all of the data from the buy observation and all of the data from the sell observation matched alongside it, like this:
SAS Output
135 | 06MAY2013 | 11:46:58 | 100.1000 | 100 | 65 | 07JUN2013 | 14:48:28 | $99.8540 | 1 | 602 | 1 | A62 | 176 | A62 |
320 | 06MAY2013 | 12:23:10 | 100.1000 | 250 | 65 | 07JUN2013 | 14:48:28 | $99.8540 | 1 | 352 | 3 | A62 | 176 | A62 |
616 | 06MAY2013 | 15:09:02 | 100.1000 | 150 | 65 | 07JUN2013 | 14:48:28 | $99.8540 | 1 | 602 | 1 | A62 | 176 | A62 |
287 | 06MAY2013 | 11:47:53 | 100.1000 | 150 | 628 | 20MAY2013 | 12:18:25 | 100.7380 | 1 | 602 | 1 | A70 | 176 | A70 |
326 | 06MAY2013 | 12:23:50 | 100.1000 | 250 | 628 | 20MAY2013 | 12:18:25 | 100.7380 | 1 | 352 | 3 | A70 | 176 | A70 |
626 | 06MAY2013 | 15:09:58 | 100.1000 | 100 | 628 | 20MAY2013 | 12:18:25 | 100.7380 | 1 | 602 | 1 | A70 | 176 | A70 |
251 | 14MAY2013 | 11:58:52 | 100.0280 | 715 | 628 | 20MAY2013 | 12:18:25 | 100.7380 | 1 | 161 | 3 | A70 | 176 | A70 |
290 | 06MAY2013 | 11:48:44 | 100.1000 | 150 | 165 | 20MAY2013 | 12:54:53 | 100.6990 | 1 | 602 | 1 | A88 | 176 | A88 |
345 | 06MAY2013 | 12:24:29 | 100.1000 | 250 | 165 | 20MAY2013 | 12:54:53 | 100.6990 | 1 | 352 | 3 | A88 | 176 | A88 |
643 | 06MAY2013 | 15:10:38 | 100.1000 | 100 | 165 | 20MAY2013 | 12:54:53 | 100.6990 | 1 | 602 | 1 | A88 | 176 | A88 |
291 | 06MAY2013 | 11:49:21 | 100.1000 | 135 | 466 | 18JUN2013 | 16:04:55 | $98.2570 | 1 | 602 | 1 | A96 | 176 | A96 |
348 | 06MAY2013 | 12:25:01 | 100.1000 | 140 | 466 | 18JUN2013 | 16:04:55 | $98.2570 | 1 | 352 | 3 | A96 | 176 | A96 |
292 | 06MAY2013 | 11:51:16 | 100.1000 | 150 | 181 | 27JUN2013 | 9:48:19 | $93.6000 | 1 | 602 | 1 | B20 | 176 | B20 |
268 | 06MAY2013 | 15:51:57 | 100.1000 | 80 | 181 | 27JUN2013 | 9:48:19 | $93.6000 | 1 | 602 | 1 | B20 | 176 | B20 |
921 | 03MAY2013 | 15:50:05 | 100.1000 | 100 | 868 | 20MAY2013 | 16:55:36 | 100.3750 | 1 | 549 | 3 | B38 | 176 | B38 |
899 | 06MAY2013 | 11:52:02 | 100.1000 | 110 | 868 | 20MAY2013 | 16:55:36 | 100.3750 | 1 | 602 | 1 | B38 | 176 | B38 |
424 | 07MAR2014 | 11:24:32 | 100.0500 | 75 | 716 | 11MAR2014 | 12:07:02 | 100.2390 | 1 | 398 | 2 | L55 | 176 | L55 |
Let me know if there is any other information that would be useful to know. The basic logic is to create a Hash of Hashes that changes for each CUSIP and contains references to a Buys hash table, a Sells hash table, a Buy iterator, and a Sell iterator. The hash tables take all of the buy and sell observations and output them together based on the order in which they were brought into the hash table and the quantities that they have. Once the hash tables run out of buy or sell observations in a CUSIP, the additional observations are discarded and the program moves on to the next CUSIP. This is largely working - just with a massive number of duplicates.
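To make the intended pairing concrete, here is the same matching rule as a small Python sketch (illustrative only - the `match_fifo` function is hypothetical shorthand, not my SAS code):

```python
def match_fifo(trades):
    """Pair buys with sells FIFO within a single CUSIP.
    trades is a list of (side, trade_id, quantity) in time order;
    returns (buy_id, sell_id, matched_qty) rows.  Unmatched excess
    on either side is dropped, and a sell that arrives before any
    buy is disregarded."""
    buys, out = [], []             # buys: pending [trade_id, remaining_qty]
    for side, tid, qty in trades:
        if side == 'B':
            buys.append([tid, qty])
            continue
        while qty > 0 and buys:    # a sell consumes the oldest open buys first
            b_tid, b_qty = buys[0]
            m = min(b_qty, qty)
            out.append((b_tid, tid, m))
            qty -= m
            if m == b_qty:
                buys.pop(0)
            else:
                buys[0][1] = b_qty - m
    return out

# CUSIP A62 from the sample data:
print(match_fifo([('B', 135, 100), ('B', 320, 250), ('B', 616, 150), ('S', 65, 500)]))
# [(135, 65, 100), (320, 65, 250), (616, 65, 150)]
```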
How about this one ?
Data have;
   Input Cusip $ Trade_Date :mmddyy10. Execution_Time :time9. Execution_Price Quantity Buy_Sell $ Account_Number2 Source Trade_ID In14;
   format Trade_Date mmddyy10. Execution_Time time9.;
   datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;
data buy;
   set have(keep=cusip quantity buy_sell trade_id where=(buy_sell='B'));
   token=1;
   do i=1 to quantity;
      output;
   end;
   drop i quantity;
run;

data sell;
   set have(keep=cusip quantity buy_sell trade_id where=(buy_sell='S'));
   token=1;
   do i=1 to quantity;
      output;
   end;
   drop i quantity;
run;

data temp;
   ina=0; inb=0;
   merge buy(keep=cusip trade_id rename=(trade_id=trade_id_buy) in=ina)
         sell(keep=cusip trade_id rename=(trade_id=trade_id_sell) in=inb);
   by cusip;
   if ina and inb;
run;

proc summary data=temp nway;
   class cusip trade_id_buy trade_id_sell;
   output out=want(drop=_type_ rename=(_freq_=quantity));
run;
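The mechanics of those steps, sketched in Python purely for illustration (the `token_match` helper is hypothetical, not part of the SAS code above):

```python
from collections import defaultdict

def token_match(trades):
    """Explode each (cusip, side, trade_id, qty) trade into single-share
    tokens, pair buy and sell tokens positionally within a CUSIP (the
    by-CUSIP merge), then count the pairs back into matched quantities
    (the PROC SUMMARY step)."""
    buys, sells = defaultdict(list), defaultdict(list)
    for cusip, side, tid, qty in trades:
        (buys if side == 'B' else sells)[cusip] += [tid] * qty   # one token per share
    counts = defaultdict(int)
    for cusip in buys:
        for b, s in zip(buys[cusip], sells[cusip]):   # unmatched tokens fall off
            counts[(cusip, b, s)] += 1
    return dict(counts)

# Small demo (3 + 4 shares bought, 5 sold):
print(token_match([('A62', 'B', 135, 3), ('A62', 'B', 320, 4), ('A62', 'S', 65, 5)]))
# {('A62', 135, 65): 3, ('A62', 320, 65): 2}
```

Each share becomes a row before the pairing, which is the cost of this approach.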
Thanks for your reply, Ksharp!
It's clever, but it only works if the dataset is relatively small. The total quantity in my dataset is in the billions, which under your algorithm would create billions of observations in two datasets and then merge them back together. This would take a long time and might not work at all because of the amount of space and memory required. My hash table is able to process and match my data in seconds - I'm really just trying to find the bug in that piece of code that's creating duplicates in the output. Thanks though!
Make a macro and process one CUSIP at a time.
I think that could handle big data.
Thank you so much for this solution! It works beautifully and is extremely efficient.
Shouldn't this line:
if lot_sold<orig_lot_avail then do; /*If the LOT was not emptied by this sale ...*/
be
if lot_sold<orig_lot_avail and total_shrs_to_sell > 0 then do; /*If the LOT was not emptied by this sale ...*/
Interesting and useful piece of code - I like the use of the hash tables.
No, because the transaction that brings TOTAL_SHRS_TO_SELL to zero also has to be used to update the individual lot.
As an example, consider a sell order of 100, which can be satisfied from a single lot with 200 shares. Adding the condition you propose would leave that lot at 200 shares, instead of updating it to 100.
Just realized that my data had negative quantities for sales, which is why I was getting unwanted sub-lots.
Originally I thought it was due to buys and sells on the same day, which wasn't represented in the test data, and adding that condition fixed it.
Once I retested your code with those scenarios, I knew I had mangled your code or had other issues.
Thanks for responding so quickly, again nice piece of code.
Yeah. Post the output and the logic to calculate it.