This topic is solved and locked.
jsmith94
Calcite | Level 5

I have a large dataset of trading data from which I am trying to calculate profit using a FIFO (first-in, first-out) methodology in Base SAS 9.4. In other words, I want the first quantities bought in each CUSIP to be matched to the first quantities sold, regardless of the date. In my final dataset, I want all of the variables from the "Buy" observation of the input dataset to appear alongside all of the variables from the matching "Sell" observation. If more is bought than sold (or sold than bought) in a given CUSIP, the excess should be excluded from the final dataset; in particular, if more is sold than has been bought up to that point, the excess sell quantity should be disregarded. A few rows of my input dataset are shown here:

 

CUSIP  Date       Time         Price    Quantity  Buy/Sell  Account  Source  Trade ID  In14 (Dummy)
A62    5/6/2013   11:46:58 AM  100.1         100  B         602      1       135       1
A62    5/6/2013   12:23:10 PM  100.1         250  B         352      3       320       1
A62    5/6/2013   3:09:02 PM   100.1         150  B         602      1       616       1
A62    6/7/2013   2:48:28 PM   99.854        500  S         176      3       065       0
A70    5/6/2013   11:47:53 AM  100.1         150  B         602      1       287       1
A70    5/6/2013   12:23:50 PM  100.1         250  B         352      3       326       1
A70    5/6/2013   3:09:58 PM   100.1         100  B         602      1       626       1
A70    5/14/2013  11:58:52 AM  100.028       715  B         161      3       251       1
A70    5/20/2013  12:18:25 PM  100.738     1,215  S         176      3       628       0
A88    5/6/2013   11:48:44 AM  100.1         150  B         602      1       290       1
A88    5/6/2013   12:24:29 PM  100.1         250  B         352      3       345       1
A88    5/6/2013   3:10:38 PM   100.1         100  B         602      1       643       1
A88    5/20/2013  12:54:53 PM  100.699       500  S         176      3       165       0
A96    5/6/2013   11:49:21 AM  100.1         135  B         602      1       291       1
A96    5/6/2013   12:25:01 PM  100.1         140  B         352      3       348       1
A96    6/18/2013  4:04:55 PM   98.257        275  S         176      3       466       0
B20    5/6/2013   11:51:16 AM  100.1         150  B         602      1       292       1
B20    5/6/2013   3:51:57 PM   100.1          80  B         602      1       268       1
B20    6/27/2013  9:48:19 AM   93.6          230  S         176      3       181       0
B38    5/3/2013   3:50:05 PM   100.1         100  B         549      3       921       1
B38    5/6/2013   11:52:02 AM  100.1         110  B         602      1       899       1
B38    5/20/2013  4:55:36 PM   100.375       210  S         176      3       868       0
L55    3/7/2014   11:24:32 AM  100.05        500  B         398      2       424       1
L55    3/7/2014   1:00:45 PM   100           500  B         793      3       953       1
L55    3/11/2014  12:07:02 PM  100.239        75  S         176      3       716       0

 

To create the final table described above, I have written the following code, which builds a hash of hashes keyed by CUSIP that references two other hash tables for each CUSIP: one for buys and one for sells. The code is here:

 

data want;
   set have;
   if _n_=1 then do;
      declare hash hoh ();
      hoh.definekey('cusip');
      hoh.definedata('cusip','buys','sells','buy','sell');
      hoh.definedone();
      declare hash buys;
      declare hash sells;
      declare hiter buy;
      declare hiter sell;
   end;
   if hoh.find() ne 0 then do;
      length tid 8 b_id bdate btime bprice bquantity quantity s_id sdate stime sprice squantity 8;
      buys=_new_ hash(ordered:'a');
      buys.defineKey('tid');
      buys.defineData('bdate','btime','bprice','bquantity','b_id','b_account','b_source','b_cusip','in14');
      buys.defineDone();
      sells=_new_ hash(ordered:'a');
      sells.defineKey('tid');
      sells.defineData('sdate','stime','sprice','squantity','s_id','s_account','s_cusip');
      sells.defineDone();
      buy=_new_ hiter('buys');
      sell=_new_ hiter('sells');
      inventory=0;
   end;
   tid+1;
   if buy_sell="B" then do;
      bdate=trade_date;
      btime=execution_time;
      bprice=execution_price;
      bquantity=quantity;
      b_id=trade_id;
      b_account=account_number2;
      b_source=source;
      b_cusip=cusip;
      in14=in14;
      inventory+quantity;
      buys.add();
   end;
   else do;
      if inventory=0 then delete;
      if quantity>inventory then quantity=inventory;
      sdate=trade_date;
      stime=execution_time;
      sprice=execution_price;
      squantity=quantity;
      s_id=trade_id;
      s_account=account_number2;
      s_cusip=cusip;
      inventory+(-1*quantity);
      sells.add();
   end;
   keep bdate b_account b_source b_cusip b_id btime bprice quantity sdate stime sprice s_id s_account s_cusip in14;
   format bdate sdate date9. bprice sprice dollar8.4 btime stime time12.;
   brc=buy.first();
   src=sell.first();
   do while(brc=0 and src=0);
      select;
         when (bquantity < squantity) do;
            temp=squantity-bquantity;
            quantity=bquantity;
            output;
            squantity=temp;
            brc=buy.next();
         end;
         when (bquantity=squantity) do;
            quantity=bquantity;
            output;
            brc=buy.next();
            src=sell.next();
         end;
         when (bquantity > squantity) do;
            temp=bquantity-squantity;
            quantity=squantity;
            output;
            bquantity=temp;
            src=sell.next();
         end;
         otherwise;
      end;
   end;
   rc=hoh.replace();
run;

 

 

The problem is that I get tons of duplicate entries: my final dataset has almost 30 times as many observations as it does once I remove the duplicates. Any idea how I can stop the hash tables from creating these duplicates? Once the duplicates are removed, the program appears to have correctly matched buys and sells within each CUSIP and discarded the excess buy and sell quantities, but I would feel more confident if I weren't getting the duplicates in the first place. Thanks so much for reading through!
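
For reference, a minimal sketch of the de-duplication step mentioned above (it assumes the trade-ID pair b_id/s_id uniquely identifies each matched buy/sell row, which may not hold for every dataset):

/* Keep one row per matched buy/sell pair within each CUSIP.        */
/* Assumes b_id and s_id together identify a matched pair uniquely. */
proc sort data=want out=want_dedup nodupkey;
   by b_cusip b_id s_id;
run;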


11 REPLIES
mkeintz
PROC Star

Please provide a sample SAS DATA step (not just a table of values) for the input and, equally important, for the wanted output dataset. I don't understand the description of your objective.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jsmith94
Calcite | Level 5

Sorry for the lack of information. Here is a DATA step:

 

Data have;
Input Cusip $ Trade_Date mmddyy10. Execution_Time time9. Execution_Price Quantity Buy_Sell $ Account_Number2 Source Trade_ID In14;
datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;

 

The objective is to match the first quantity bought with the first quantity sold in each CUSIP, disregarding any excess quantity on the buy or sell side that can't be matched. The output dataset should then have all of the data from each buy observation placed alongside all of the data from the corresponding sell observation, like this:

 

SAS Output

b_id  bdate      btime     bprice    quantity  s_id  sdate      stime     sprice    In14  b_account  b_source  b_cusip  s_account  s_cusip
135   06MAY2013  11:46:58  100.1000  100       65    07JUN2013  14:48:28  $99.8540  1     602        1         A62      176        A62
320   06MAY2013  12:23:10  100.1000  250       65    07JUN2013  14:48:28  $99.8540  1     352        3         A62      176        A62
616   06MAY2013  15:09:02  100.1000  150       65    07JUN2013  14:48:28  $99.8540  1     602        1         A62      176        A62
287   06MAY2013  11:47:53  100.1000  150       628   20MAY2013  12:18:25  100.7380  1     602        1         A70      176        A70
326   06MAY2013  12:23:50  100.1000  250       628   20MAY2013  12:18:25  100.7380  1     352        3         A70      176        A70
626   06MAY2013  15:09:58  100.1000  100       628   20MAY2013  12:18:25  100.7380  1     602        1         A70      176        A70
251   14MAY2013  11:58:52  100.0280  715       628   20MAY2013  12:18:25  100.7380  1     161        3         A70      176        A70
290   06MAY2013  11:48:44  100.1000  150       165   20MAY2013  12:54:53  100.6990  1     602        1         A88      176        A88
345   06MAY2013  12:24:29  100.1000  250       165   20MAY2013  12:54:53  100.6990  1     352        3         A88      176        A88
643   06MAY2013  15:10:38  100.1000  100       165   20MAY2013  12:54:53  100.6990  1     602        1         A88      176        A88
291   06MAY2013  11:49:21  100.1000  135       466   18JUN2013  16:04:55  $98.2570  1     602        1         A96      176        A96
348   06MAY2013  12:25:01  100.1000  140       466   18JUN2013  16:04:55  $98.2570  1     352        3         A96      176        A96
292   06MAY2013  11:51:16  100.1000  150       181   27JUN2013  9:48:19   $93.6000  1     602        1         B20      176        B20
268   06MAY2013  15:51:57  100.1000  80        181   27JUN2013  9:48:19   $93.6000  1     602        1         B20      176        B20
921   03MAY2013  15:50:05  100.1000  100       868   20MAY2013  16:55:36  100.3750  1     549        3         B38      176        B38
899   06MAY2013  11:52:02  100.1000  110       868   20MAY2013  16:55:36  100.3750  1     602        1         B38      176        B38
424   07MAR2014  11:24:32  100.0500  75        716   11MAR2014  12:07:02  100.2390  1     398        2         L55      176        L55

 

Let me know if there is any other information that would be useful. The basic logic is to create a hash of hashes that changes for each CUSIP and contains references to a Buys hash table, a Sells hash table, a Buy iterator, and a Sell iterator. The hash tables take in all of the buy and sell observations and output them together based on the order in which they were added and the quantities they carry. Once the hash tables run out of buy or sell observations for a CUSIP, the additional observations are discarded and the program moves on to the next CUSIP. This is largely working - just with a massive number of duplicates.

 

Ksharp
Super User

How about this one?

 


Data have;
Input Cusip $ Trade_Date mmddyy10. Execution_Time time9. Execution_Price Quantity Buy_Sell $ Account_Number2 Source Trade_ID In14;
format Trade_Date mmddyy10. Execution_Time time9.;
datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;
data buy;
 set have(keep=cusip quantity buy_sell trade_id where=(buy_sell='B'));
 token=1;
 do i=1 to quantity;
  output;
 end;
 drop i quantity;
run;

data sell;
 set have(keep=cusip quantity buy_sell trade_id where=(buy_sell='S'));
 token=1;
 do i=1 to quantity;
  output;
 end;
 drop i quantity;
run;

data temp;
ina=0;inb=0;
 merge buy(keep=cusip trade_id rename=(trade_id=trade_id_buy) in=ina)
       sell(keep=cusip trade_id rename=(trade_id=trade_id_sell) in=inb);
 by cusip;
 if ina and inb;
run;
proc summary data=temp nway;
class cusip trade_id_buy trade_id_sell;
output out=want(drop=_type_ rename=(_freq_=quantity));
run;
jsmith94
Calcite | Level 5

Thanks for your reply Ksharp!

 

It's clever, but it only works if the dataset is relatively small. The total quantity in my dataset is in the billions, so your algorithm would create billions of observations in two datasets and then merge them back together. This would take a long time and might not work at all because of the amount of space and memory required. My hash table is able to process and match my data in seconds - I'm really just trying to find the bug in that piece of code that's creating duplicates in the output. Thanks though!

Ksharp
Super User

Make a macro and process one CUSIP at a time. I think that could handle big data.
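
A rough sketch of that suggestion (the %fifo_one macro name, the WANT_ALL append target, and the CALL EXECUTE driver are illustrative assumptions, not code from this thread), reusing the per-CUSIP steps above:

%macro fifo_one(cusip);
   /* Explode buys and sells for one CUSIP into one row per unit of quantity */
   data buy;
      set have(keep=cusip quantity buy_sell trade_id
               where=(buy_sell='B' and cusip="&cusip"));
      do i=1 to quantity; output; end;
      drop i quantity;
   run;
   data sell;
      set have(keep=cusip quantity buy_sell trade_id
               where=(buy_sell='S' and cusip="&cusip"));
      do i=1 to quantity; output; end;
      drop i quantity;
   run;
   /* Pair the unit rows positionally, then roll them back up to quantities */
   data temp;
      ina=0; inb=0;   /* reset so the IN= flags mark only rows actually read this iteration */
      merge buy (keep=cusip trade_id rename=(trade_id=trade_id_buy)  in=ina)
            sell(keep=cusip trade_id rename=(trade_id=trade_id_sell) in=inb);
      by cusip;
      if ina and inb;
   run;
   proc summary data=temp nway;
      class cusip trade_id_buy trade_id_sell;
      output out=one_cusip(drop=_type_ rename=(_freq_=quantity));
   run;
   proc append base=want_all data=one_cusip force;
   run;
%mend fifo_one;

/* Drive the macro once per CUSIP (assumes HAVE is grouped by CUSIP;   */
/* delete WORK.WANT_ALL beforehand if re-running).                     */
data _null_;
   set have;
   by cusip;
   if first.cusip then call execute(cats('%nrstr(%fifo_one)(', cusip, ')'));
run;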

mkeintz
PROC Star

I now see that what you want is a dataset with (usually) one observation per purchased lot of shares. Each observation will have the CUSIP, then all the buy transaction variables (b_date=trade_date, b_time=execution_time, etc.). If that lot is sold, it will also have an analogous set of sell transaction variables (s_date, s_price, s_quantity, ...).

 

If all the shares from a lot are sold, it will have quantity_available=0. If no shares from a lot are sold, then all the s_ vars will be missing and that lot will have quantity_available=b_quantity. If a lot is partially sold, it will have two records: the first records the sold portion (with quantity_available=0), and the second has the same b_ variables, all missing s_ variables, and quantity_available equal to the unsold portion of the lot.

 

Even though I've written papers using hash-of-hashes for FIFO queues, I think you might be better off maintaining a single hash with both "b_" and "s_" vars for each stock lot, plus a second hash with one data item per stock holding its current status (e.g., total shares currently held, and the number of buys and sells to date).

 

In that case, each "buy" establishes a new stock LOT (i.e., a new data item in hash object LOTS). For example, 3 buys of stock AAA generate three lots, each identified by stock/date/price/quantity/source (the "b_" vars: b_date, b_price, b_quantity, b_source, ...).

 

But when a subsequent "sell" for the same stock is encountered, you want to match it with the oldest LOT not yet sold off; that's the FIFO part. So in addition to the buy-related vars noted above, the lot should also carry the same set of variables (date/price/...) from the sell transaction (the "s_" vars).

 

 

 

As a rule:

  1. When a "buy" is encountered, add a new lot with populated "b_" vars and missing "s_" vars.
  2. When a sell is encountered, find the oldest lot with available shares > 0.
    1. If the sell is for exactly the number of available shares, simply update the lot with the "s_" vars and set its available shares to 0.
    2. If the sell is for more than the available shares, do the same as above, but then proceed to subsequent lots to sell off the unsatisfied portion of the original sell transaction, updating each of those lots in turn.
    3. But if the sell is for less than the available shares, then you have to create a second "sub-lot" holding the unsold portion of the lot, while updating the sold portion and setting its available shares to 0.

You might have chosen hash-of-hashes so that you can easily find the oldest lot for a given stock: since each stock has its own hash and corresponding hash iterator, just using the FIRST method of the iterator gets you to the oldest lot for that stock. If instead you have a single hash LOTS ordered by cusip/b_num (b_num is the sequence number of the buy transaction), then rather than the FIRST method you can use the SETCUR method with arguments cusip=cusip and b_num=1, as in the condensed pattern below.
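
A condensed sketch of that SETCUR pattern (using the hash, iterator, and variable names from the full program below, so it is not a standalone step):

   /* Position the iterator on the oldest lot (b_num=1) for the incoming CUSIP, */
   /* then walk forward with NEXT until the CUSIP changes or the lots run out.  */
   rc = ilots.setcur(key: cusip, key: 1);
   do while (rc = 0 and lot_cusip = cusip);
      /* ... examine and update the current lot here ... */
      rc = ilots.next();
   end;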

 

Here is a working program:

 

 

 

 

 

proc datasets library=work noprint;
  delete _: ;
quit;

Data have;
  Input Cusip $ Trade_Date mmddyy10. Execution_Time time9. Execution_Price Quantity Buy_Sell $1. Account_Number2 Source Trade_ID In14;
  format trade_date yymmddn8.  execution_time time8.0;
datalines;
A62 05/06/2013 11:46:58 100.1    100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1    250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1    150 B 602 1 616 1
A62 06/07/2013 14:48:28  99.854  500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1    150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1    250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1    100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028  715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1    150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1    250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1    100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699  500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1    135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1    140 B 352 3 348 1
A96 06/18/2013 16:04:55  98.257  275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1    150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1     80 B 602 1 268 1
B20 06/27/2013 09:48:19  93.6    230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1    100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1    110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375  210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05   500 B 398 2 424 1
L55 03/07/2014 13:00:45 100      500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239   75 S 176 3 716 0
run;

data _null_;

  set have  end=end_of_have;

  length lot_cusip $8                        /* The CUSIP var for use in the LOTS hash object */
         shrs_held N_buys N_sells            /* Vars for the stock_profile hash            */
         lot_sold                            /* Actual lot sales for this Sales transaction   */
         lot_avail                           /* Current shares available in a LOT after a sale */
         8;

  array trdata {*} trade_date  execution_time  quantity   execution_price  account_number2  source   trade_id  in14;
  array b_data {*} b_date      b_time          b_quantity b_price          b_acct           b_source b_tid     b_in14;
  array s_data {*} s_date      s_time          s_quantity s_price          s_acct           s_source s_tid     s_in14;
  format b_date: s_date: yymmddn8.   b_time s_time time8.0;

  if _n_=1 then do;
    declare hash stock_profile();
      stock_profile.definekey('cusip');
      stock_profile.definedata('cusip','shrs_held','N_buys','N_sells');   /*Number of Buys, Sells*/
      stock_profile.definedone();

    declare hash lots(ordered:'A');
      lots.definekey('lot_cusip','b_num'); 
      lots.definedata('lot_cusip','b_num','s_num');
      lots.definedata('b_date','b_time','b_quantity','b_price','b_acct','b_source','b_tid','b_in14');
      lots.definedata('s_date','s_time','s_quantity','s_price','s_acct','s_source','s_tid','s_in14');
      lots.definedata('lot_sold','lot_avail');       
      lots.definedone();
    declare hiter ilots('lots');
  end;

  rc=stock_profile.find();   /*Get current status of the incoming CUSIP */

  if (rc^=0 or shrs_held in (.,0)) and buy_sell='S' then do;
     if end_of_have=0 then delete;  /*Avoid premature sales*/
     else goto eoh;
  end;
  else if buy_sell='B' then do;
    N_buys=sum(N_buys,1);
    shrs_held=sum(shrs_held,quantity);
    rc=stock_profile.replace();

    lot_cusip=cusip;
    b_num=n_buys;
    lot_sold=0;
    lot_avail=quantity;
    do v=1 to dim(trdata);   b_data{v}=trdata{v};   end;
    rc=lots.add();
  end;

  else if buy_sell='S' then do;
    N_sells=sum(N_sells,1);
    total_shrs_to_sell=min(quantity,shrs_held);

    /* Now step through the lots looking for shares available to sell */
    do rc=ilots.setcur(key:cusip,key:1) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);
      if lot_avail>0 then do;
        orig_lot_avail=lot_avail;

       	s_num=n_sells;
        do v=1 to dim(trdata);   s_data{v}=trdata{v};   end;
        lot_sold=min(lot_avail,total_shrs_to_sell);
        lot_avail=0;
        rc=lots.replace();

        total_shrs_to_sell=total_shrs_to_sell-lot_sold;  /*Update the sales "order" */
        shrs_held=shrs_held-lot_sold;                    /*Update current holdings  */
	
        if lot_sold<orig_lot_avail then do;  /*If the LOT was not emptied by this sale ...*/
           b_num=b_num+.01;                  /*Create a "sublot" (=lot+.01) with remaining available shrs*/
           lot_avail=orig_lot_avail-lot_sold; 
           lot_sold=0;
           call missing(of s_:);
           rc=lots.add();
        end; 

      end;

      rc=ilots.next();
    end;
    rc=stock_profile.replace();   /*Store updated holdings*/
  end;

  if end_of_have ;
eoh:
  rc=stock_profile.output(dataset:'stock_profile');
  rc=lots.output(dataset:'want');
run;

 

 

 

Notes:

  1. Your data has to be sorted by TRADE_DATE, but it doesn't have to be grouped by CUSIP (a minimal sort step is sketched after these notes).
  2. Premature sales (i.e., the first transaction for a CUSIP is an "S", or SHRS_HELD is already at zero) are ignored entirely.
  3. If, say, lot 2 for cusip AAA starts with 200 shares, of which only 50 are sold, then there will be two records:
    1. A record with B_NUM=2.0, LOT_AVAIL=0, and LOT_SOLD=50
    2. A record with B_NUM=2.01, LOT_AVAIL=150, and LOT_SOLD=0
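
A minimal sort step for note 1 (the secondary EXECUTION_TIME key is an assumption on my part, to keep same-day trades in execution order):

proc sort data=have;
   by trade_date execution_time;
run;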

I suspect this will be just as flexible as the hash-of-hashes approach, and probably takes somewhat less memory.  Plus it allows you to easily do a bulk output of all the records for all the cusips.

jsmith94
Calcite | Level 5

Thank you so much for this solution! It works beautifully and is extremely efficient. 

J_R_Furman
Fluorite | Level 6

Shouldn't this line:

if lot_sold<orig_lot_avail then do;  /*If the LOT was not emptied by this sale ...*/

 

be 

 

if lot_sold<orig_lot_avail  and total_shrs_to_sell > 0 then do;  /*If the LOT was not emptied by this sale ...*/

 

Interesting and useful piece of code; I like the use of the hash tables.

mkeintz
PROC Star

No, because the transaction that results in TOTAL_SHRS_TO_SELL reaching zero also has to be used to update the individual lot.

 

As an example, consider a sell order of 100, which can be satisfied from a single lot with 200 shares. Adding the condition you propose would leave that lot at 200 shares, instead of updating it to 100.
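
Below is a small stand-alone trace of that example; the values are hypothetical and the statements paraphrase the relevant lines of the program rather than reproduce them:

data _null_;
   /* One lot of 200 shares, and a sell order for 100 shares */
   orig_lot_avail     = 200;
   total_shrs_to_sell = 100;

   /* The sale that drives total_shrs_to_sell to zero still updates the lot */
   lot_sold  = min(orig_lot_avail, total_shrs_to_sell);     /* 100 */
   lot_avail = 0;
   total_shrs_to_sell = total_shrs_to_sell - lot_sold;      /* 0   */

   /* The sub-lot step must still run even though nothing is left to sell; */
   /* adding "and total_shrs_to_sell > 0" would skip it and lose track of  */
   /* the 100 unsold shares.                                               */
   if lot_sold < orig_lot_avail then do;
      lot_avail = orig_lot_avail - lot_sold;                 /* 100 */
      put 'Sub-lot would be created with ' lot_avail= 'shares remaining';
   end;
run;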

J_R_Furman
Fluorite | Level 6

Just realized that my data had negative quantities for sales, which is why I was getting unwanted sub-lots.

Originally I thought it was due to buys and sells on the same day, which weren't represented in the test data, and adding that condition seemed to fix it.

Once I retested your code with those scenarios, I knew I had mangled your code or had other issues.

 

Thanks for responding so quickly, again nice piece of code. 

Ksharp
Super User

Yeah. Post the output and the logic used to calculate it.

