BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jsmith94
Calcite | Level 5

Hi,

 

I have a large dataset of trading data that looks like this:

 

data have;
infile datalines truncover dlm=',' dsd;
input cusip:$char9. trade_date:mmddyy10. execution_time:time8. buy_sell:$char1. (quantity execution_price trade_id) (:best32.);
format execution_time time8. trade_date mmddyy10.;
datalines;
010831BW4,09/12/2013,12:30:44,B,45,103.133,1325534595
010831BW4,09/12/2013,14:56:13,B,500,103.303,1325536968
010831BW4,09/17/2013,13:14:35,S,25,105.244,1326014939
010831BW4,09/17/2013,15:40:12,S,50,105.369,1326017794
010831BW4,09/17/2013,16:04:21,S,25,105.244,1326018222
010831BW4,09/18/2013,10:12:59,S,50,105.369,1326121154
010831BW4,09/18/2013,11:50:21,S,350,105.015,1326123005
010831BW4,09/20/2013,11:00:41,B,50,106.308,1326341521
010831BW4,09/20/2013,11:00:41,S,50,106.308,1326341522
010831BY0,09/12/2013,12:30:44,B,1075,99.592,1325534596
010831BY0,09/12/2013,16:51:56,B,350,99.777,1325547689
010831BY0,09/13/2013,11:01:07,S,20,100.25,1325642219
010831BY0,09/13/2013,11:03:05,S,330,100.44,1325642259
010831BY0,10/02/2013,11:07:37,B,50,101.117,1327522086
010831BY0,10/03/2013,11:54:05,S,50,104.218,1327632245
;
run;

 

I am trying to match the buy observations with the sell observations (distinguished by the Buy_Sell variable) within each CUSIP using a Last-in Last-out methodology. In other words, each time there is a sale in a CUSIP, I want the quantity most recently purchased in that CUSIP matched to the sale. If more is bought than sold (or sold than bought), then after the final buy or sell observation (whichever comes first) I want the extra quantity discarded and for the program to move on to the next CUSIP. I want my final table to look like this:

 

CUSIPQuantityb_idbdatebtimebprices_idsdatestimesprice
010831BW425132553696812-Sep-1314:56:13103.303132601493917-Sep-1313:14:35105.244
010831BW450132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601822217-Sep-1316:04:21105.244
010831BW450132553696812-Sep-1314:56:13103.303132612115418-Sep-1310:12:59105.369
010831BW4350132553696812-Sep-1314:56:13103.303132612300518-Sep-1311:50:21105.015
010831BW450132634152120-Sep-1311:00:41106.308132634152220-Sep-1311:00:41106.308
010831BY020132554768912-Sep-1316:51:56$99.78132564221913-Sep-1311:01:07100.25
010831BY0330132554768912-Sep-1316:51:56$99.78132564225913-Sep-1311:03:05100.44
010831BY15013275220862-Oct-1311:07:37101.11713276322453-Oct-1311:54:05104.218

 

 

In order to achieve this, I am using a hash of hashes that changes for each CUSIP and is made up of hash tables and hash iterators for the buy and sell observations. Each time there is a new CUSIP, the two hash tables take all of the buy and sell observations and output them together based on the order that they were brought into the hash table and the quantities that they have. Once the hash tables run out of buy or sell observations in a CUSIP, the additional observations are discarded and the the program moves on to the next CUSIP. Here is my current code:

 

data want;

      set have;

      if _n_=1 then do;

            declare hash hoh ();

            hoh.definekey('cusip');

            hoh.definedata('cusip','buys','sells','buy','sell');

            hoh.definedone();

            declare hash buys;

            declare hash sells;

            declare hiter buy;

            declare hiter sell;

      end;

            if hoh.find() ne 0 then do;

                  length tid 8 b_id bdate btime bprice bquantity quantity s_id sdate stime sprice squantity 8;

                  buys=_new_ hash(ordered:'a');

                  buys.defineKey('tid');

                  buys.defineData('bdate','btime','bprice','bquantity','b_id', 'b_account', 'b_source', 'b_cusip', 'in14');

                  buys.defineDone();

                  sells=_new_ hash(ordered:'a');

                  sells.defineKey('tid');

                  sells.defineData('sdate','stime','sprice','squantity','s_id','s_account', 's_cusip');

                  sells.defineDone();

                  buy=_new_ hiter('buys');

                  sell=_new_ hiter('sells');

                  inventory=0;

            end;

            tid+1;

            if buy_sell="B" then do;

                  bdate=trade_date;

                  btime=execution_time;

                  bprice=execution_price;

                  bquantity=quantity;

                  b_id=trade_id;

                  b_account=account_number2;

                  b_source=source;

                  b_cusip=cusip;

                  in14=in14;

                  inventory+quantity;

                  buys.add();

            end;

            else do;

                  if inventory=0 then delete;

                  if quantity>inventory then quantity=inventory;

                  sdate=trade_date;

                  stime=execution_time;

                  sprice=execution_price;

                  squantity=quantity;

                  s_id=trade_id;

                  s_account=account_number2;

                  s_cusip=cusip;

                  inventory+(-1*quantity);

                  sells.add();

            end;

      keep bdate b_account b_source b_cusip b_id bdate btime bprice quantity sdate stime sprice s_id s_account s_cusip in14;

      format bdate sdate date9. bprice sprice dollar8.4 btime stime time12.;

      brc=buy.last();

      src=sell.first();

      do while(brc=0 and src=0);

            select;

                  when (bquantity < squantity) do;

                        temp=squantity-bquantity;

                        quantity=bquantity;

                        output;

                        squantity=temp;

                        brc=buy.prev();

                  end;

                  when (bquantity=squantity) do;

                        quantity=bquantity;

                        output;

                        brc=buy.prev();

                        src=sell.next();

                  end;

                  when (bquantity > squantity) do;

                        temp=bquantity-squantity;

                        quantity=squantity;

                        output;

                        bquantity=temp;

                        src=sell.next();

                  end;

                  otherwise;

            end;

      end;

      rc=hoh.replace();

run;

 

When I run this code, the output is all over the place. Usually the output begins correct in each CUSIP, and gets further off as it goes along. It gets especially bad when more is bought in a CUSIP than sold and some quantity needs to be discarded. The output also contains many observations that are full duplicates of previous observations, and uses observations even after their quantity has been used up and the hash table was supposed to move on to the next observation. For the dataset above, this is the output that my code gives:

 

CUSIPQuantityb_idbdatebtimebprices_idsdatestimesprice
010831BW425132553696812-Sep-1314:56:13103.303132601493917-Sep-1313:14:35105.244
010831BW425132553696812-Sep-1314:56:13103.303132601493917-Sep-1313:14:35105.244
010831BW450132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601493917-Sep-1313:14:35105.244
010831BW450132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601822217-Sep-1316:04:21105.244
010831BW425132553696812-Sep-1314:56:13103.303132601493917-Sep-1313:14:35105.244
010831BW450132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601822217-Sep-1316:04:21105.244
010831BW450132553696812-Sep-1314:56:13103.303132612115418-Sep-1310:12:59105.369
010831BW425132553696812-Sep-1314:56:13103.303132601493917-Sep-1313:14:35105.244
010831BW450132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601822217-Sep-1316:04:21105.244
010831BW450132553696812-Sep-1314:56:13103.303132612115418-Sep-1310:12:59105.369
010831BW4350132553696812-Sep-1314:56:13103.303132612300518-Sep-1311:50:21105.015
010831BW425132634152120-Sep-1311:00:41106.308132601493917-Sep-1313:14:35105.244
010831BW425132634152120-Sep-1311:00:41106.308132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601822217-Sep-1316:04:21105.244
010831BW450132553696812-Sep-1314:56:13103.303132612115418-Sep-1310:12:59105.369
010831BW4350132553696812-Sep-1314:56:13103.303132612300518-Sep-1311:50:21105.015
010831BW425132634152120-Sep-1311:00:41106.308132601493917-Sep-1313:14:35105.244
010831BW425132634152120-Sep-1311:00:41106.308132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601779417-Sep-1315:40:12105.369
010831BW425132553696812-Sep-1314:56:13103.303132601822217-Sep-1316:04:21105.244
010831BW450132553696812-Sep-1314:56:13103.303132612115418-Sep-1310:12:59105.369
010831BW4350132553696812-Sep-1314:56:13103.303132612300518-Sep-1311:50:21105.015
010831BW450132553696812-Sep-1314:56:13103.303132634152220-Sep-1311:00:41106.308
010831BY020132554768912-Sep-1316:51:56$99.78132564221913-Sep-1311:01:07100.25
010831BY020132554768912-Sep-1316:51:56$99.78132564221913-Sep-1311:01:07100.25
010831BY0330132554768912-Sep-1316:51:56$99.78132564225913-Sep-1311:03:05100.44
010831BY02013275220862-Oct-1311:07:37101.117132564221913-Sep-1311:01:07100.25
010831BY03013275220862-Oct-1311:07:37101.117132564225913-Sep-1311:03:05100.44
010831BY0300132554768912-Sep-1316:51:56$99.78132564225913-Sep-1311:03:05100.44
010831BY02013275220862-Oct-1311:07:37101.117132564221913-Sep-1311:01:07100.25
010831BY03013275220862-Oct-1311:07:37101.117132564225913-Sep-1311:03:05100.44
010831BY0300132554768912-Sep-1316:51:56$99.78132564225913-Sep-1311:03:05100.44
010831BY050132554768912-Sep-1316:51:56$99.7813276322453-Oct-1311:54:05104.218

 

As you can see, it has many more observations than it should - some full duplicates of previous observations, some that are repeats of previous values on only the Buy or Sell side. I think the problem is somewhere in how I iterate through my hash table, but I have been unable to find the bug. Thank you for reading through and please let me know if you can figure out what I am doing wrong!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You can use the same program I provided in your Calculating FIFO Profit using Hash of Hashes.  And just like in that topic, I would suggest not using the hash-of-hashes approach.  Instead use a pair of hashes, one for a profile of each cusip, and the other to track stock lots.

 

I suspect these revisions would be sufficient for LIFO

 

Instead of

  do rc=ilots.setcur(key:cusip,key:1) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);

which finds the oldest lot of stocks, you can use

  do rc=ilots.setcur(key:cusip,key:n_buys) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);

which finds the newest.

 

At the bottom of the loop replace

  rc=ilots.next();

with

  rc=ilots.prev();

 

Also to accommodate partial lot sales, then instead of

  b_num=b_num+.01; /*Create a "sublot" (=lot+.01) with remaining avaiable shrs*/

you can use

   b_num=b_num-.01; /*Create a "sublot" (=lot-.01) with remaining avaiable shrs*/

 

In the LIFO case, it will have the counterintuitive property of, say, LOT 3 having sublots 2.99, 2.98, etc.  But that seems relatively acceptable to maintain the same program structure.

 

 

 

Editted additional note:  Of course, if your dataset is small enough you can use the other program without any changes by pre-sorting the input dataset by descending date (it doesn't need to also be sorted by cusip).

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

You can use the same program I provided in your Calculating FIFO Profit using Hash of Hashes.  And just like in that topic, I would suggest not using the hash-of-hashes approach.  Instead use a pair of hashes, one for a profile of each cusip, and the other to track stock lots.

 

I suspect these revisions would be sufficient for LIFO

 

Instead of

  do rc=ilots.setcur(key:cusip,key:1) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);

which finds the oldest lot of stocks, you can use

  do rc=ilots.setcur(key:cusip,key:n_buys) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);

which finds the newest.

 

At the bottom of the loop replace

  rc=ilots.next();

with

  rc=ilots.prev();

 

Also to accommodate partial lot sales, then instead of

  b_num=b_num+.01; /*Create a "sublot" (=lot+.01) with remaining avaiable shrs*/

you can use

   b_num=b_num-.01; /*Create a "sublot" (=lot-.01) with remaining avaiable shrs*/

 

In the LIFO case, it will have the counterintuitive property of, say, LOT 3 having sublots 2.99, 2.98, etc.  But that seems relatively acceptable to maintain the same program structure.

 

 

 

Editted additional note:  Of course, if your dataset is small enough you can use the other program without any changes by pre-sorting the input dataset by descending date (it doesn't need to also be sorted by cusip).

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jsmith94
Calcite | Level 5

Thank you so much! I had come up with solutions for FIFO (less efficient than yours and I wasn't confident about them) but I could not figure out LIFO. You've saved me a huge amount of time!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1808 views
  • 0 likes
  • 2 in conversation