This topic is solved and locked.
jsmith94
Calcite | Level 5

I have a large dataset of trading data from which I am trying to calculate profit using a FIFO (first-in, first-out) methodology in Base SAS 9.4. In other words, I want the first quantities bought in each CUSIP to be matched to the first quantities sold, regardless of the date. In my final dataset, I want all of the variables from the "Buy" observation of the input dataset to appear alongside all of the variables from the matching "Sell" observation. If more is bought than sold (or sold than bought) in a given CUSIP, the excess should be excluded from the final dataset; in particular, if more is sold than has been bought up to that point, the excess sell quantity should be disregarded. A few rows of my input dataset are shown here:

 

CUSIP  Date       Time         Price    Quantity  Buy/Sell  Account  Source  Trade ID  In14 (Dummy)
A62    5/6/2013   11:46:58 AM  100.1         100  B         602      1       135       1
A62    5/6/2013   12:23:10 PM  100.1         250  B         352      3       320       1
A62    5/6/2013   3:09:02 PM   100.1         150  B         602      1       616       1
A62    6/7/2013   2:48:28 PM   99.854        500  S         176      3       065       0
A70    5/6/2013   11:47:53 AM  100.1         150  B         602      1       287       1
A70    5/6/2013   12:23:50 PM  100.1         250  B         352      3       326       1
A70    5/6/2013   3:09:58 PM   100.1         100  B         602      1       626       1
A70    5/14/2013  11:58:52 AM  100.028       715  B         161      3       251       1
A70    5/20/2013  12:18:25 PM  100.738     1,215  S         176      3       628       0
A88    5/6/2013   11:48:44 AM  100.1         150  B         602      1       290       1
A88    5/6/2013   12:24:29 PM  100.1         250  B         352      3       345       1
A88    5/6/2013   3:10:38 PM   100.1         100  B         602      1       643       1
A88    5/20/2013  12:54:53 PM  100.699       500  S         176      3       165       0
A96    5/6/2013   11:49:21 AM  100.1         135  B         602      1       291       1
A96    5/6/2013   12:25:01 PM  100.1         140  B         352      3       348       1
A96    6/18/2013  4:04:55 PM   98.257        275  S         176      3       466       0
B20    5/6/2013   11:51:16 AM  100.1         150  B         602      1       292       1
B20    5/6/2013   3:51:57 PM   100.1          80  B         602      1       268       1
B20    6/27/2013  9:48:19 AM   93.6          230  S         176      3       181       0
B38    5/3/2013   3:50:05 PM   100.1         100  B         549      3       921       1
B38    5/6/2013   11:52:02 AM  100.1         110  B         602      1       899       1
B38    5/20/2013  4:55:36 PM   100.375       210  S         176      3       868       0
L55    3/7/2014   11:24:32 AM  100.05        500  B         398      2       424       1
L55    3/7/2014   1:00:45 PM   100           500  B         793      3       953       1
L55    3/11/2014  12:07:02 PM  100.239        75  S         176      3       716       0

 

To create the final table described above, I have written the following code, which builds a hash of hashes keyed by CUSIP that references two other hash tables for each CUSIP: one for buys and one for sells. The code is here:

 

data want;
   set have;
   if _n_=1 then do;
      declare hash hoh ();
      hoh.definekey('cusip');
      hoh.definedata('cusip','buys','sells','buy','sell');
      hoh.definedone();
      declare hash buys;
      declare hash sells;
      declare hiter buy;
      declare hiter sell;
   end;
   if hoh.find() ne 0 then do;
      length tid 8 b_id bdate btime bprice bquantity quantity s_id sdate stime sprice squantity 8;
      buys=_new_ hash(ordered:'a');
      buys.defineKey('tid');
      buys.defineData('bdate','btime','bprice','bquantity','b_id','b_account','b_source','b_cusip','in14');
      buys.defineDone();
      sells=_new_ hash(ordered:'a');
      sells.defineKey('tid');
      sells.defineData('sdate','stime','sprice','squantity','s_id','s_account','s_cusip');
      sells.defineDone();
      buy=_new_ hiter('buys');
      sell=_new_ hiter('sells');
      inventory=0;
   end;
   tid+1;
   if buy_sell="B" then do;
      bdate=trade_date;
      btime=execution_time;
      bprice=execution_price;
      bquantity=quantity;
      b_id=trade_id;
      b_account=account_number2;
      b_source=source;
      b_cusip=cusip;
      in14=in14;
      inventory+quantity;
      buys.add();
   end;
   else do;
      if inventory=0 then delete;
      if quantity>inventory then quantity=inventory;
      sdate=trade_date;
      stime=execution_time;
      sprice=execution_price;
      squantity=quantity;
      s_id=trade_id;
      s_account=account_number2;
      s_cusip=cusip;
      inventory+(-1*quantity);
      sells.add();
   end;
   keep bdate b_account b_source b_cusip b_id btime bprice quantity sdate stime sprice s_id s_account s_cusip in14;
   format bdate sdate date9. bprice sprice dollar8.4 btime stime time12.;
   brc=buy.first();
   src=sell.first();
   do while(brc=0 and src=0);
      select;
         when (bquantity < squantity) do;
            temp=squantity-bquantity;
            quantity=bquantity;
            output;
            squantity=temp;
            brc=buy.next();
         end;
         when (bquantity=squantity) do;
            quantity=bquantity;
            output;
            brc=buy.next();
            src=sell.next();
         end;
         when (bquantity > squantity) do;
            temp=bquantity-squantity;
            quantity=squantity;
            output;
            bquantity=temp;
            src=sell.next();
         end;
         otherwise;
      end;
   end;
   rc=hoh.replace();
run;

 

 

The problem is that I get tons of duplicate entries: my final dataset has almost 30 times as many observations as it does once I remove the duplicates. Any idea how I can stop the hash tables from creating these duplicates? Once the duplicates are removed, the program appears to have correctly matched buys and sells within each CUSIP and discarded the excess buy and sell quantities, but I would feel more confident if I weren't getting the duplicates in the first place. Thanks so much for reading through!
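
For reference, a minimal sketch of the de-duplication step mentioned above (it assumes the trade-ID pair b_id/s_id uniquely identifies each matched buy/sell row, which may not hold for every dataset):

/* Keep one row per matched buy/sell pair within each CUSIP.        */
/* Assumes b_id and s_id together identify a matched pair uniquely. */
proc sort data=want out=want_dedup nodupkey;
   by b_cusip b_id s_id;
run;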


11 REPLIES
mkeintz
PROC Star

Please provide a sample SAS DATA step (not just a table of values) for the input and, equally important, for the wanted output dataset. I don't understand the description of your objective.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jsmith94
Calcite | Level 5

Sorry for the lack of information. Here is a DATA step:

 

Data have;
Input Cusip $ Trade_Date mmddyy10. Execution_Time time9. Execution_Price Quantity Buy_Sell $ Account_Number2 Source Trade_ID In14;
datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;

 

The objective is to match the first quantity bought with the first quantity sold in each CUSIP, disregarding any excess quantity on the buy or sell side that can't be matched. The output dataset should then have all of the data from each buy observation placed alongside all of the data from the corresponding sell observation, like this:

 

SAS Output

b_id  bdate      btime     bprice    quantity  s_id  sdate      stime     sprice    In14  b_account  b_source  b_cusip  s_account  s_cusip
135   06MAY2013  11:46:58  100.1000  100       65    07JUN2013  14:48:28  $99.8540  1     602        1         A62      176        A62
320   06MAY2013  12:23:10  100.1000  250       65    07JUN2013  14:48:28  $99.8540  1     352        3         A62      176        A62
616   06MAY2013  15:09:02  100.1000  150       65    07JUN2013  14:48:28  $99.8540  1     602        1         A62      176        A62
287   06MAY2013  11:47:53  100.1000  150       628   20MAY2013  12:18:25  100.7380  1     602        1         A70      176        A70
326   06MAY2013  12:23:50  100.1000  250       628   20MAY2013  12:18:25  100.7380  1     352        3         A70      176        A70
626   06MAY2013  15:09:58  100.1000  100       628   20MAY2013  12:18:25  100.7380  1     602        1         A70      176        A70
251   14MAY2013  11:58:52  100.0280  715       628   20MAY2013  12:18:25  100.7380  1     161        3         A70      176        A70
290   06MAY2013  11:48:44  100.1000  150       165   20MAY2013  12:54:53  100.6990  1     602        1         A88      176        A88
345   06MAY2013  12:24:29  100.1000  250       165   20MAY2013  12:54:53  100.6990  1     352        3         A88      176        A88
643   06MAY2013  15:10:38  100.1000  100       165   20MAY2013  12:54:53  100.6990  1     602        1         A88      176        A88
291   06MAY2013  11:49:21  100.1000  135       466   18JUN2013  16:04:55  $98.2570  1     602        1         A96      176        A96
348   06MAY2013  12:25:01  100.1000  140       466   18JUN2013  16:04:55  $98.2570  1     352        3         A96      176        A96
292   06MAY2013  11:51:16  100.1000  150       181   27JUN2013  9:48:19   $93.6000  1     602        1         B20      176        B20
268   06MAY2013  15:51:57  100.1000  80        181   27JUN2013  9:48:19   $93.6000  1     602        1         B20      176        B20
921   03MAY2013  15:50:05  100.1000  100       868   20MAY2013  16:55:36  100.3750  1     549        3         B38      176        B38
899   06MAY2013  11:52:02  100.1000  110       868   20MAY2013  16:55:36  100.3750  1     602        1         B38      176        B38
424   07MAR2014  11:24:32  100.0500  75        716   11MAR2014  12:07:02  100.2390  1     398        2         L55      176        L55

 

Let me know if there is any other information that would be useful. The basic logic is to create a hash of hashes that changes for each CUSIP and contains references to a Buys hash table, a Sells hash table, a Buy iterator, and a Sell iterator. The hash tables take in all of the buy and sell observations and output them together based on the order in which they were added and the quantities they carry. Once the hash tables run out of buy or sell observations for a CUSIP, the additional observations are discarded and the program moves on to the next CUSIP. This is largely working - just with a massive number of duplicates.

 

Ksharp
Super User

How about this one?

 


Data have;
Input Cusip $ Trade_Date mmddyy10. Execution_Time time9. Execution_Price Quantity Buy_Sell $ Account_Number2 Source Trade_ID In14;
format Trade_Date mmddyy10. Execution_Time time9.;
datalines;
A62 05/06/2013 11:46:58 100.1 100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1 250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1 150 B 602 1 616 1
A62 06/07/2013 14:48:28 99.854 500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1 150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1 250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1 100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028 715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1 150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1 250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1 100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699 500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1 135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1 140 B 352 3 348 1
A96 06/18/2013 16:04:55 98.257 275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1 150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1 80 B 602 1 268 1
B20 06/27/2013 09:48:19 93.6 230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1 100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1 110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375 210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05 500 B 398 2 424 1
L55 03/07/2014 13:00:45 100 500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239 75 S 176 3 716 0
;
run;
data buy;
 set have(keep=cusip quantity buy_sell trade_id where=(buy_sell='B'));
 token=1;
 do i=1 to quantity;
  output;
 end;
 drop i quantity;
run;

data sell;
 set have(keep=cusip quantity buy_sell trade_id where=(buy_sell='S'));
 token=1;
 do i=1 to quantity;
  output;
 end;
 drop i quantity;
run;

data temp;
ina=0;inb=0;
 merge buy(keep=cusip trade_id rename=(trade_id=trade_id_buy) in=ina)
       sell(keep=cusip trade_id rename=(trade_id=trade_id_sell) in=inb);
 by cusip;
 if ina and inb;
run;
proc summary data=temp nway;
class cusip trade_id_buy trade_id_sell;
output out=want(drop=_type_ rename=(_freq_=quantity));
run;
jsmith94
Calcite | Level 5

Thanks for your reply Ksharp!

 

It's clever, but it only works if the dataset is relatively small. The total quantity in my dataset is in the billions, so your algorithm would create billions of observations in two datasets and then merge them back together. This would take a long time and might not work at all because of the amount of space and memory required. My hash table is able to process and match my data in seconds - I'm really just trying to find the bug in that piece of code that's creating duplicates in the output. Thanks though!

Ksharp
Super User

Make a macro and process one CUSIP at a time. I think that could handle big data.
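
A rough sketch of that suggestion (the %fifo_one macro name, the WANT_ALL append target, and the CALL EXECUTE driver are illustrative assumptions, not code from this thread), reusing the per-CUSIP steps above:

%macro fifo_one(cusip);
   /* Explode buys and sells for one CUSIP into one row per unit of quantity */
   data buy;
      set have(keep=cusip quantity buy_sell trade_id
               where=(buy_sell='B' and cusip="&cusip"));
      do i=1 to quantity; output; end;
      drop i quantity;
   run;
   data sell;
      set have(keep=cusip quantity buy_sell trade_id
               where=(buy_sell='S' and cusip="&cusip"));
      do i=1 to quantity; output; end;
      drop i quantity;
   run;
   /* Pair the unit rows positionally, then roll them back up to quantities */
   data temp;
      ina=0; inb=0;   /* reset so the IN= flags mark only rows actually read this iteration */
      merge buy (keep=cusip trade_id rename=(trade_id=trade_id_buy)  in=ina)
            sell(keep=cusip trade_id rename=(trade_id=trade_id_sell) in=inb);
      by cusip;
      if ina and inb;
   run;
   proc summary data=temp nway;
      class cusip trade_id_buy trade_id_sell;
      output out=one_cusip(drop=_type_ rename=(_freq_=quantity));
   run;
   proc append base=want_all data=one_cusip force;
   run;
%mend fifo_one;

/* Drive the macro once per CUSIP (assumes HAVE is grouped by CUSIP;   */
/* delete WORK.WANT_ALL beforehand if re-running).                     */
data _null_;
   set have;
   by cusip;
   if first.cusip then call execute(cats('%nrstr(%fifo_one)(', cusip, ')'));
run;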

mkeintz
PROC Star

I now see that what you want is a dataset with (usually) one observation per purchased lot of shares. Each observation will have the CUSIP, then all the buy transaction variables (b_date=trade_date, b_time=execution_time, etc.). If that lot is sold, it will also have an analogous set of sell transaction variables (s_date, s_price, s_quantity, ...).

 

If all the shares from a lot are sold, it will have quantity_available=0. If no shares from a lot are sold, then all the s_ vars will be missing and that lot will have quantity_available=b_quantity. If a lot is partially sold, it will have two records: the first records the sold portion (with quantity_available=0), and the second has the same b_ variables, all missing s_ variables, and quantity_available equal to the unsold portion of the lot.

 

Even though I've written papers using hash-of-hashes for FIFO queues, I think you might be better off maintaining a single hash with both "b_" and "s_" vars for each stock lot, plus a second hash with one data item per stock holding its current status (e.g., total shares currently held, and the number of buys and sells to date).

 

In that case, each "buy" establishes a new stock LOT (i.e., a new data item in hash object LOTS). For example, 3 buys of stock AAA generate three lots, each identified by stock/date/price/quantity/source (the "b_" vars: b_date, b_price, b_quantity, b_source, ...).

 

But when a subsequent "sell" for the same stock is encountered, you want to match it with the oldest LOT not yet sold off; that's the FIFO part. So in addition to the buy-related vars noted above, the lot should also carry the same set of variables (date/price/...) from the sell transaction (the "s_" vars).

 

 

 

As a rule:

  1. When a "buy" is encountered, add a new lot with populated "b_" vars and missing "s_" vars.
  2. When a sell is encountered, find the oldest lot with available shares > 0.
    1. If the sell is for exactly the number of available shares, simply update the lot with the "s_" vars and set its available shares to 0.
    2. If the sell is for more than the available shares, do the same as above, but then proceed to subsequent lots to sell off the unsatisfied portion of the original sell transaction, updating each of those lots in turn.
    3. But if the sell is for less than the available shares, then you have to create a second "sub-lot" holding the unsold portion of the lot, while updating the sold portion and setting its available shares to 0.

You might have chosen hash-of-hashes so that you can easily find the oldest lot for a given stock: since each stock has its own hash and corresponding hash iterator, just using the FIRST method of the iterator gets you to the oldest lot for that stock. If instead you have a single hash LOTS ordered by cusip/b_num (b_num is the sequence number of the buy transaction), then rather than the FIRST method you can use the SETCUR method with arguments cusip=cusip and b_num=1, as in the condensed pattern below.
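
A condensed sketch of that SETCUR pattern (using the hash, iterator, and variable names from the full program below, so it is not a standalone step):

   /* Position the iterator on the oldest lot (b_num=1) for the incoming CUSIP, */
   /* then walk forward with NEXT until the CUSIP changes or the lots run out.  */
   rc = ilots.setcur(key: cusip, key: 1);
   do while (rc = 0 and lot_cusip = cusip);
      /* ... examine and update the current lot here ... */
      rc = ilots.next();
   end;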

 

Here is a working program:

 

 

 

 

 

proc datasets library=work noprint;
  delete _: ;
quit;

Data have;
  Input Cusip $ Trade_Date mmddyy10. Execution_Time time9. Execution_Price Quantity Buy_Sell $1. Account_Number2 Source Trade_ID In14;
  format trade_date yymmddn8.  execution_time time8.0;
datalines;
A62 05/06/2013 11:46:58 100.1    100 B 602 1 135 1
A62 05/06/2013 12:23:10 100.1    250 B 352 3 320 1
A62 05/06/2013 15:09:02 100.1    150 B 602 1 616 1
A62 06/07/2013 14:48:28  99.854  500 S 176 3 065 0
A70 05/06/2013 11:47:53 100.1    150 B 602 1 287 1
A70 05/06/2013 12:23:50 100.1    250 B 352 3 326 1
A70 05/06/2013 15:09:58 100.1    100 B 602 1 626 1
A70 05/14/2013 11:58:52 100.028  715 B 161 3 251 1
A70 05/20/2013 12:18:25 100.738 1215 S 176 3 628 0
A88 05/06/2013 11:48:44 100.1    150 B 602 1 290 1
A88 05/06/2013 12:24:29 100.1    250 B 352 3 345 1
A88 05/06/2013 15:10:38 100.1    100 B 602 1 643 1
A88 05/20/2013 12:54:53 100.699  500 S 176 3 165 0
A96 05/06/2013 11:49:21 100.1    135 B 602 1 291 1
A96 05/06/2013 12:25:01 100.1    140 B 352 3 348 1
A96 06/18/2013 16:04:55  98.257  275 S 176 3 466 0
B20 05/06/2013 11:51:16 100.1    150 B 602 1 292 1
B20 05/06/2013 15:51:57 100.1     80 B 602 1 268 1
B20 06/27/2013 09:48:19  93.6    230 S 176 3 181 0
B38 05/03/2013 15:50:05 100.1    100 B 549 3 921 1
B38 05/06/2013 11:52:02 100.1    110 B 602 1 899 1
B38 05/20/2013 16:55:36 100.375  210 S 176 3 868 0
L55 03/07/2014 11:24:32 100.05   500 B 398 2 424 1
L55 03/07/2014 13:00:45 100      500 B 793 3 953 1
L55 03/11/2014 12:07:02 100.239   75 S 176 3 716 0
run;

data _null_;

  set have  end=end_of_have;

  length lot_cusip $8                        /* The CUSIP var for use in the LOTS hash object */
         shrs_held N_buys N_sells            /* Vars for the stock_profile hash            */
         lot_sold                            /* Actual lot sales for this Sales transaction   */
         lot_avail                           /* Current shares available in a LOT after a sale */
         8;

  array trdata {*} trade_date  execution_time  quantity   execution_price  account_number2  source   trade_id  in14;
  array b_data {*} b_date      b_time          b_quantity b_price          b_acct           b_source b_tid     b_in14;
  array s_data {*} s_date      s_time          s_quantity s_price          s_acct           s_source s_tid     s_in14;
  format b_date: s_date: yymmddn8.   b_time s_time time8.0;

  if _n_=1 then do;
    declare hash stock_profile();
      stock_profile.definekey('cusip');
      stock_profile.definedata('cusip','shrs_held','N_buys','N_sells');   /*Number of Buys, Sells*/
      stock_profile.definedone();

    declare hash lots(ordered:'A');
      lots.definekey('lot_cusip','b_num'); 
      lots.definedata('lot_cusip','b_num','s_num');
      lots.definedata('b_date','b_time','b_quantity','b_price','b_acct','b_source','b_tid','b_in14');
      lots.definedata('s_date','s_time','s_quantity','s_price','s_acct','s_source','s_tid','s_in14');
      lots.definedata('lot_sold','lot_avail');       
      lots.definedone();
    declare hiter ilots('lots');
  end;

  rc=stock_profile.find();   /*Get current status of the incoming CUSIP */

  if (rc^=0 or shrs_held in (.,0)) and buy_sell='S' then do;
     if end_of_have=0 then delete;  /*Avoid premature sales*/
     else goto eoh;
  end;
  else if buy_sell='B' then do;
    N_buys=sum(N_buys,1);
    shrs_held=sum(shrs_held,quantity);
    rc=stock_profile.replace();

    lot_cusip=cusip;
    b_num=n_buys;
    lot_sold=0;
    lot_avail=quantity;
    do v=1 to dim(trdata);   b_data{v}=trdata{v};   end;
    rc=lots.add();
  end;

  else if buy_sell='S' then do;
    N_sells=sum(N_sells,1);
    total_shrs_to_sell=min(quantity,shrs_held);

    /* Now step through the lots looking for shares available to sell */
    do rc=ilots.setcur(key:cusip,key:1) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);
      if lot_avail>0 then do;
        orig_lot_avail=lot_avail;

       	s_num=n_sells;
        do v=1 to dim(trdata);   s_data{v}=trdata{v};   end;
        lot_sold=min(lot_avail,total_shrs_to_sell);
        lot_avail=0;
        rc=lots.replace();

        total_shrs_to_sell=total_shrs_to_sell-lot_sold;  /*Update the sales "order" */
        shrs_held=shrs_held-lot_sold;                    /*Update current holdings  */
	
        if lot_sold<orig_lot_avail then do;  /*If the LOT was not emptied by this sale ...*/
           b_num=b_num+.01;                  /*Create a "sublot" (=lot+.01) with remaining available shrs*/
           lot_avail=orig_lot_avail-lot_sold; 
           lot_sold=0;
           call missing(of s_:);
           rc=lots.add();
        end; 

      end;

      rc=ilots.next();
    end;
    rc=stock_profile.replace();   /*Store updated holdings*/
  end;

  if end_of_have ;
eoh:
  rc=stock_profile.output(dataset:'stock_profile');
  rc=lots.output(dataset:'want');
run;

 

 

 

Notes:

  1. Your data has to be sorted by TRADE_DATE, but it doesn't have to be grouped by CUSIP (a minimal sort step is sketched after these notes).
  2. Premature sales (i.e., the first transaction for a CUSIP is an "S", or SHRS_HELD is already at zero) are ignored entirely.
  3. If, say, lot 2 for cusip AAA starts with 200 shares, of which only 50 are sold, then there will be two records:
    1. A record with B_NUM=2.0, LOT_AVAIL=0, and LOT_SOLD=50
    2. A record with B_NUM=2.01, LOT_AVAIL=150, and LOT_SOLD=0
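
A minimal sort step for note 1 (the secondary EXECUTION_TIME key is an assumption on my part, to keep same-day trades in execution order):

proc sort data=have;
   by trade_date execution_time;
run;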

I suspect this will be just as flexible as the hash-of-hashes approach, and probably takes somewhat less memory.  Plus it allows you to easily do a bulk output of all the records for all the cusips.

jsmith94
Calcite | Level 5

Thank you so much for this solution! It works beautifully and is extremely efficient. 

J_R_Furman
Fluorite | Level 6

Shouldn't this line:

if lot_sold<orig_lot_avail then do;  /*If the LOT was not emptied by this sale ...*/

 

be 

 

if lot_sold<orig_lot_avail  and total_shrs_to_sell > 0 then do;  /*If the LOT was not emptied by this sale ...*/

 

Interesting and useful piece of code; I like the use of the hash tables.

mkeintz
PROC Star

No, because the transaction that results in TOTAL_SHRS_TO_SELL reaching zero also has to be used to update the individual lot.

 

As an example, consider a sell order of 100, which can be satisfied from a single lot with 200 shares. Adding the condition you propose would leave that lot at 200 shares, instead of updating it to 100.
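
Below is a small stand-alone trace of that example; the values are hypothetical and the statements paraphrase the relevant lines of the program rather than reproduce them:

data _null_;
   /* One lot of 200 shares, and a sell order for 100 shares */
   orig_lot_avail     = 200;
   total_shrs_to_sell = 100;

   /* The sale that drives total_shrs_to_sell to zero still updates the lot */
   lot_sold  = min(orig_lot_avail, total_shrs_to_sell);     /* 100 */
   lot_avail = 0;
   total_shrs_to_sell = total_shrs_to_sell - lot_sold;      /* 0   */

   /* The sub-lot step must still run even though nothing is left to sell; */
   /* adding "and total_shrs_to_sell > 0" would skip it and lose track of  */
   /* the 100 unsold shares.                                               */
   if lot_sold < orig_lot_avail then do;
      lot_avail = orig_lot_avail - lot_sold;                 /* 100 */
      put 'Sub-lot would be created with ' lot_avail= 'shares remaining';
   end;
run;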

J_R_Furman
Fluorite | Level 6

Just realized that my data had negative quantities for sales, which is why I was getting unwanted sub-lots.

Originally I thought it was due to buys and sells on the same day, which weren't represented in the test data, and adding that condition seemed to fix it.

Once I retested your code with those scenarios, I knew I had mangled your code or had other issues.

 

Thanks for responding so quickly, again nice piece of code. 

Ksharp
Super User

Yeah. Post the output and the logic used to calculate it.

