Hi,
I have a large dataset of trading data that looks like this:
data have;
   infile datalines truncover dlm=',' dsd;
   input cusip:$char9. trade_date:mmddyy10. execution_time:time8. buy_sell:$char1. (quantity execution_price trade_id) (:best32.);
   format execution_time time8. trade_date mmddyy10.;
datalines;
010831BW4,09/12/2013,12:30:44,B,45,103.133,1325534595
010831BW4,09/12/2013,14:56:13,B,500,103.303,1325536968
010831BW4,09/17/2013,13:14:35,S,25,105.244,1326014939
010831BW4,09/17/2013,15:40:12,S,50,105.369,1326017794
010831BW4,09/17/2013,16:04:21,S,25,105.244,1326018222
010831BW4,09/18/2013,10:12:59,S,50,105.369,1326121154
010831BW4,09/18/2013,11:50:21,S,350,105.015,1326123005
010831BW4,09/20/2013,11:00:41,B,50,106.308,1326341521
010831BW4,09/20/2013,11:00:41,S,50,106.308,1326341522
010831BY0,09/12/2013,12:30:44,B,1075,99.592,1325534596
010831BY0,09/12/2013,16:51:56,B,350,99.777,1325547689
010831BY0,09/13/2013,11:01:07,S,20,100.25,1325642219
010831BY0,09/13/2013,11:03:05,S,330,100.44,1325642259
010831BY0,10/02/2013,11:07:37,B,50,101.117,1327522086
010831BY0,10/03/2013,11:54:05,S,50,104.218,1327632245
;
run;
I am trying to match the buy observations with the sell observations (distinguished by the buy_sell variable) within each CUSIP using a last-in, first-out (LIFO) methodology. In other words, each time there is a sale in a CUSIP, I want the quantity most recently purchased in that CUSIP matched to that sale. If more is bought than sold (or sold than bought) in a CUSIP, I want the extra quantity discarded once the other side runs out, and the program should then move on to the next CUSIP. I want my final table to look like this:
CUSIP | Quantity | b_id | bdate | btime | bprice | s_id | sdate | stime | sprice |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326018222 | 17-Sep-13 | 16:04:21 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326121154 | 18-Sep-13 | 10:12:59 | 105.369 |
010831BW4 | 350 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326123005 | 18-Sep-13 | 11:50:21 | 105.015 |
010831BW4 | 50 | 1326341521 | 20-Sep-13 | 11:00:41 | 106.308 | 1326341522 | 20-Sep-13 | 11:00:41 | 106.308 |
010831BY0 | 20 | 1325547689 | 12-Sep-13 | 16:51:56 | 99.777 | 1325642219 | 13-Sep-13 | 11:01:07 | 100.25 |
010831BY0 | 330 | 1325547689 | 12-Sep-13 | 16:51:56 | 99.777 | 1325642259 | 13-Sep-13 | 11:03:05 | 100.44 |
010831BY0 | 50 | 1327522086 | 2-Oct-13 | 11:07:37 | 101.117 | 1327632245 | 3-Oct-13 | 11:54:05 | 104.218 |
To achieve this, I am using a hash of hashes that changes for each CUSIP and is made up of hash tables and hash iterators for the buy and sell observations. For each new CUSIP, the two hash tables take in all of the buy and sell observations and output them matched together, based on the order in which they entered the hash tables and on their quantities. Once either the buy or the sell observations in a CUSIP run out, the remaining observations are discarded and the program moves on to the next CUSIP. Here is my current code:
data want;
   set have;
   if _n_=1 then do;
      declare hash hoh ();
      hoh.definekey('cusip');
      hoh.definedata('cusip','buys','sells','buy','sell');
      hoh.definedone();
      declare hash buys;
      declare hash sells;
      declare hiter buy;
      declare hiter sell;
   end;
   if hoh.find() ne 0 then do;
      length tid 8 b_id bdate btime bprice bquantity quantity s_id sdate stime sprice squantity 8;
      buys=_new_ hash(ordered:'a');
      buys.defineKey('tid');
      buys.defineData('bdate','btime','bprice','bquantity','b_id','b_account','b_source','b_cusip','in14');
      buys.defineDone();
      sells=_new_ hash(ordered:'a');
      sells.defineKey('tid');
      sells.defineData('sdate','stime','sprice','squantity','s_id','s_account','s_cusip');
      sells.defineDone();
      buy=_new_ hiter('buys');
      sell=_new_ hiter('sells');
      inventory=0;
   end;
   tid+1;
   if buy_sell="B" then do;
      bdate=trade_date;
      btime=execution_time;
      bprice=execution_price;
      bquantity=quantity;
      b_id=trade_id;
      b_account=account_number2;
      b_source=source;
      b_cusip=cusip;
      in14=in14;
      inventory+quantity;
      buys.add();
   end;
   else do;
      if inventory=0 then delete;
      if quantity>inventory then quantity=inventory;
      sdate=trade_date;
      stime=execution_time;
      sprice=execution_price;
      squantity=quantity;
      s_id=trade_id;
      s_account=account_number2;
      s_cusip=cusip;
      inventory+(-1*quantity);
      sells.add();
   end;
   keep bdate b_account b_source b_cusip b_id bdate btime bprice quantity sdate stime sprice s_id s_account s_cusip in14;
   format bdate sdate date9. bprice sprice dollar8.4 btime stime time12.;
   brc=buy.last();
   src=sell.first();
   do while(brc=0 and src=0);
      select;
         when (bquantity < squantity) do;
            temp=squantity-bquantity;
            quantity=bquantity;
            output;
            squantity=temp;
            brc=buy.prev();
         end;
         when (bquantity=squantity) do;
            quantity=bquantity;
            output;
            brc=buy.prev();
            src=sell.next();
         end;
         when (bquantity > squantity) do;
            temp=bquantity-squantity;
            quantity=squantity;
            output;
            bquantity=temp;
            src=sell.next();
         end;
         otherwise;
      end;
   end;
   rc=hoh.replace();
run;
When I run this code, the output is all over the place. It usually begins correctly in each CUSIP and gets further off as it goes along, and it gets especially bad when more is bought in a CUSIP than sold, so that some quantity needs to be discarded. The output also contains many observations that are full duplicates of previous observations, and it keeps using observations after their quantity has been used up, when the hash iterator was supposed to move on to the next observation. For the dataset above, this is the output my code gives:
CUSIP | Quantity | b_id | bdate | btime | bprice | s_id | sdate | stime | sprice |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326018222 | 17-Sep-13 | 16:04:21 | 105.244 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326018222 | 17-Sep-13 | 16:04:21 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326121154 | 18-Sep-13 | 10:12:59 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326018222 | 17-Sep-13 | 16:04:21 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326121154 | 18-Sep-13 | 10:12:59 | 105.369 |
010831BW4 | 350 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326123005 | 18-Sep-13 | 11:50:21 | 105.015 |
010831BW4 | 25 | 1326341521 | 20-Sep-13 | 11:00:41 | 106.308 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 25 | 1326341521 | 20-Sep-13 | 11:00:41 | 106.308 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326018222 | 17-Sep-13 | 16:04:21 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326121154 | 18-Sep-13 | 10:12:59 | 105.369 |
010831BW4 | 350 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326123005 | 18-Sep-13 | 11:50:21 | 105.015 |
010831BW4 | 25 | 1326341521 | 20-Sep-13 | 11:00:41 | 106.308 | 1326014939 | 17-Sep-13 | 13:14:35 | 105.244 |
010831BW4 | 25 | 1326341521 | 20-Sep-13 | 11:00:41 | 106.308 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326017794 | 17-Sep-13 | 15:40:12 | 105.369 |
010831BW4 | 25 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326018222 | 17-Sep-13 | 16:04:21 | 105.244 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326121154 | 18-Sep-13 | 10:12:59 | 105.369 |
010831BW4 | 350 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326123005 | 18-Sep-13 | 11:50:21 | 105.015 |
010831BW4 | 50 | 1325536968 | 12-Sep-13 | 14:56:13 | 103.303 | 1326341522 | 20-Sep-13 | 11:00:41 | 106.308 |
010831BY0 | 20 | 1325547689 | 12-Sep-13 | 16:51:56 | $99.78 | 1325642219 | 13-Sep-13 | 11:01:07 | 100.25 |
010831BY0 | 20 | 1325547689 | 12-Sep-13 | 16:51:56 | $99.78 | 1325642219 | 13-Sep-13 | 11:01:07 | 100.25 |
010831BY0 | 330 | 1325547689 | 12-Sep-13 | 16:51:56 | $99.78 | 1325642259 | 13-Sep-13 | 11:03:05 | 100.44 |
010831BY0 | 20 | 1327522086 | 2-Oct-13 | 11:07:37 | 101.117 | 1325642219 | 13-Sep-13 | 11:01:07 | 100.25 |
010831BY0 | 30 | 1327522086 | 2-Oct-13 | 11:07:37 | 101.117 | 1325642259 | 13-Sep-13 | 11:03:05 | 100.44 |
010831BY0 | 300 | 1325547689 | 12-Sep-13 | 16:51:56 | $99.78 | 1325642259 | 13-Sep-13 | 11:03:05 | 100.44 |
010831BY0 | 20 | 1327522086 | 2-Oct-13 | 11:07:37 | 101.117 | 1325642219 | 13-Sep-13 | 11:01:07 | 100.25 |
010831BY0 | 30 | 1327522086 | 2-Oct-13 | 11:07:37 | 101.117 | 1325642259 | 13-Sep-13 | 11:03:05 | 100.44 |
010831BY0 | 300 | 1325547689 | 12-Sep-13 | 16:51:56 | $99.78 | 1325642259 | 13-Sep-13 | 11:03:05 | 100.44 |
010831BY0 | 50 | 1325547689 | 12-Sep-13 | 16:51:56 | $99.78 | 1327632245 | 3-Oct-13 | 11:54:05 | 104.218 |
As you can see, it has many more observations than it should: some are full duplicates of previous observations, and others repeat previous values on only the buy or the sell side. I think the problem is somewhere in how I iterate through my hash tables, but I have been unable to find the bug. Thank you for reading, and please let me know if you can figure out what I am doing wrong!
You can use the same program I provided in your thread Calculating FIFO Profit using Hash of Hashes. And just like in that topic, I would suggest not using the hash-of-hashes approach. Instead, use a pair of hashes: one holding a profile of each CUSIP, and the other tracking the stock lots.
I suspect these revisions would be sufficient for LIFO.
Instead of
do rc=ilots.setcur(key:cusip,key:1) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);
which finds the oldest lot of stocks, you can use
do rc=ilots.setcur(key:cusip,key:n_buys) by 0 until(total_shrs_to_sell=0 or lot_cusip^=cusip or rc^=0);
which finds the newest.
At the bottom of the loop, replace
rc=ilots.next();
with
rc=ilots.prev();
Also, to accommodate partial lot sales, instead of
b_num=b_num+.01; /*Create a "sublot" (=lot+.01) with remaining available shrs*/
you can use
b_num=b_num-.01; /*Create a "sublot" (=lot-.01) with remaining available shrs*/
In the LIFO case, this has the counterintuitive property of, say, lot 3 having sublots 2.99, 2.98, etc. But that seems an acceptable price for keeping the same program structure.
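For reference, here is a minimal self-contained sketch of the same pair-of-hashes idea for LIFO. It is not the program from the FIFO thread, and the names (prof, lots, n_buys, b_num, bqty, to_sell) are just illustrative. Since LIFO always consumes the newest lot first, the lots hash can be treated as a per-CUSIP stack, which avoids the iterator and the sublot device entirely:
/* Minimal LIFO sketch (illustrative names, not the FIFO-thread program):
   prof holds one row per cusip with n_buys, the count of open buy lots;
   lots holds one row per open lot, keyed by (cusip, lot number).
   Sales pop shares off the newest lot first, so no iterator is needed. */
data lifo_matches;
   if _n_=1 then do;
      declare hash prof();
      prof.definekey('cusip');
      prof.definedata('n_buys');
      prof.definedone();
      declare hash lots();
      lots.definekey('lot_cusip','b_num');
      lots.definedata('b_id','bdate','btime','bprice','bqty');
      lots.definedone();
   end;
   set have;
   length lot_cusip $9 n_buys b_num b_id bdate btime bprice bqty
          s_id sdate stime sprice to_sell 8;
   if buy_sell='B' then do;                /* push a new lot on the stack */
      if prof.find() ne 0 then n_buys=0;   /* first trade in this cusip */
      n_buys=n_buys+1;
      prof.replace();
      lot_cusip=cusip; b_num=n_buys;
      b_id=trade_id; bdate=trade_date; btime=execution_time;
      bprice=execution_price; bqty=quantity;
      rc=lots.replace();
   end;
   else do;                                /* consume lots newest-first */
      if prof.find() ne 0 then delete;     /* sale with no prior buys */
      to_sell=quantity;
      s_id=trade_id; sdate=trade_date; stime=execution_time;
      sprice=execution_price;
      do while (to_sell>0 and n_buys>0);
         lot_cusip=cusip; b_num=n_buys;
         rc=lots.find();                   /* retrieve the newest open lot */
         quantity=min(to_sell,bqty);       /* shares matched to this lot */
         output;
         to_sell=to_sell-quantity;
         bqty=bqty-quantity;
         if bqty=0 then do;                /* lot exhausted: pop it */
            rc=lots.remove();
            n_buys=n_buys-1;
         end;
         else rc=lots.replace();           /* partial fill: update shares */
      end;
      prof.replace();                      /* leftover sale qty is discarded */
   end;
   keep cusip quantity b_id bdate btime bprice s_id sdate stime sprice;
   format bdate sdate date9. btime stime time8.;
run;
If I have traced it correctly, running this against the sample data above reproduces the desired table, with the leftover 45-share and 1075-share opening lots discarded.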
Edited additional note: Of course, if your dataset is small enough, you can use the other program without any changes by pre-sorting the input dataset by descending date (it doesn't need to also be sorted by CUSIP).
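For that pre-sort alternative, something like the following should do (the descending execution_time is my addition, on the assumption that same-day trades should also be in reverse time order):
/* Sketch of the pre-sort alternative: reverse the time order so the
   unchanged FIFO program consumes the newest lots first.
   descending execution_time is an assumption for same-day trades. */
proc sort data=have out=have_desc;
   by descending trade_date descending execution_time;
run;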
Thank you so much! I had come up with solutions for FIFO (less efficient than yours, and I wasn't confident about them), but I could not figure out LIFO. You've saved me a huge amount of time!