I have two Files: A and B. I want to merge/combine them in a particular way. They are in SAS format
File A
ID Type BorS ORNUM Time
A O B 20000196 9:30:00
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A O B 20000201 9:30:00
A O B 20000205 9:30:01
A O S 20000209 9:30:01
B O B 20000531 9:30:00
B O S 20000643 9:30:00
B O B 20000754 9:30:01
File B is as Follows
ID Type BUYORNUM SELLORNUM TNUM Time
A T 20000196 20000199 10001 9:30:00
A T 20000197 20000199 10002 9:30:00
A T 20000205 20000209 10004 9:30:01
B T 20000754 20000643 10010 9:30:01
The merged file should look like this. The logic is that if the ORNUM matches either the BUYORNUM or the SELLORNUM then type ‘T’ is inserted into the combined file
ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:30:00
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A T 9:30:00 20000196 20000199 10001
A T 9:30:00 20000197 20000199 10002
A O B 20000201 9:30:00
A O B 20000205 9:30:01
A O S 20000209 9:30:01
A T 9:30:01 20000205 20000209 10004
B O B 20000531 9:30:00
B O S 20000643 9:30:00
B O B 20000754 9:30:01
B T 9:30:01 20000754 20000643 10010
Thanks in Advance
Like this?
data O;
input (ID Type BorS ORNUM Time )(:$);
cards;
A O B 20000196 9:30:00
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A O B 20000201 9:30:00
A O B 20000205 9:30:01
A O S 20000209 9:30:01
B O B 20000531 9:30:00
B O S 20000643 9:30:00
B O B 20000754 9:30:01
run;
data T;
input (ID Type BUYORNUM SELLORNUM TNUM Time )(:$);
cards;
A T 20000196 20000199 10001 9:30:00
A T 20000197 20000199 10002 9:30:00
A T 20000205 20000209 10004 9:30:01
B T 20000754 20000643 10010 9:30:01
run;
data WANT;
set O;
if _N_=1 then do;
dcl hash BUY(dataset:'T');
BUY.definekey('BUYORNUM');
BUY.definedata(all:'Y');
BUY.definedone();
dcl hash SELL(dataset:'T');
SELL.definekey('SELLORNUM');
SELL.definedata(all:'Y');
SELL.definedone();
if 0 then set T;
end;
call missing( BUYORNUM, SELLORNUM, TNUM );
output;
RC=BUY.find(key:ORNUM);
if RC=0 then do;
output;
RC=BUY.remove();
RC=SELL.remove(key:SELLORNUM);
end;
else do;
RC=SELL.find(key:ORNUM);
if RC=0 then do;
output;
RC=SELL.remove();
RC=BUY.remove(key:BUYORNUM);
end;
end;
drop RC;
run;
ID | Type | BorS | ORNUM | Time | BUYORNUM | SELLORNUM | TNUM |
---|---|---|---|---|---|---|---|
A | O | B | 20000196 | 9:30:00 | |||
A | T | B | 20000196 | 9:30:00 | 20000196 | 20000199 | 10001 |
A | O | B | 20000197 | 9:30:00 | |||
A | T | B | 20000197 | 9:30:00 | 20000197 | 20000199 | 10002 |
A | O | S | 20000199 | 9:30:00 | |||
A | O | B | 20000201 | 9:30:00 | |||
A | O | B | 20000205 | 9:30:01 | |||
A | T | B | 20000205 | 9:30:01 | 20000205 | 20000209 | 10004 |
A | O | S | 20000209 | 9:30:01 | |||
B | O | B | 20000531 | 9:30:00 | |||
B | O | S | 20000643 | 9:30:00 | |||
B | T | S | 20000643 | 9:30:01 | 20000754 | 20000643 | 10010 |
B | O | B | 20000754 | 9:30:01 | |||
Thanks so much. But this does not work. If you notice for ID = A at 9:30:00 SELLORNUM 20000199 is paired with two BUYORNUM 20000196 and 200197. So this is not the desired output.
@ChrisNZ wrote:Like this?
data O; input (ID Type BorS ORNUM Time )(:$); cards; A O B 20000196 9:30:00 A O B 20000197 9:30:00 A O S 20000199 9:30:00 A O B 20000201 9:30:00 A O B 20000205 9:30:01 A O S 20000209 9:30:01 B O B 20000531 9:30:00 B O S 20000643 9:30:00 B O B 20000754 9:30:01 run; data T; input (ID Type BUYORNUM SELLORNUM TNUM Time )(:$); cards; A T 20000196 20000199 10001 9:30:00 A T 20000197 20000199 10002 9:30:00 A T 20000205 20000209 10004 9:30:01 B T 20000754 20000643 10010 9:30:01 run; data WANT; set O; if _N_=1 then do; dcl hash BUY(dataset:'T'); BUY.definekey('BUYORNUM'); BUY.definedata(all:'Y'); BUY.definedone(); dcl hash SELL(dataset:'T'); SELL.definekey('SELLORNUM'); SELL.definedata(all:'Y'); SELL.definedone(); if 0 then set T; end; call missing( BUYORNUM, SELLORNUM, TNUM ); output; RC=BUY.find(key:ORNUM); if RC=0 then do; output; RC=BUY.remove(); RC=SELL.remove(key:SELLORNUM); end; else do; RC=SELL.find(key:ORNUM); if RC=0 then do; output; RC=SELL.remove(); RC=BUY.remove(key:BUYORNUM); end; end; drop RC; run;
ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:30:00 A T B 20000196 9:30:00 20000196 20000199 10001 A O B 20000197 9:30:00 A T B 20000197 9:30:00 20000197 20000199 10002 A O S 20000199 9:30:00 A O B 20000201 9:30:00 A O B 20000205 9:30:01 A T B 20000205 9:30:01 20000205 20000209 10004 A O S 20000209 9:30:01 B O B 20000531 9:30:00 B O S 20000643 9:30:00 B T S 20000643 9:30:01 20000754 20000643 10010 B O B 20000754 9:30:01 B T B 20000754 9:30:01 20000754 20000643 10010
I don't get how that's wrong.
Hi ChrisNZ:
I get what your saying.
But what I posted is the sequential order of a trade book; and the sequencing is important. For example if ORNUM 20000196 came in at 9:29:59 and is filled by BUYORNUM 20000199 the sequencing may be broken by your solution.
Thank you so much. I hope you can tweak the code to get the sequencing.
Randy
data O;
input (ID Type BorS ORNUM Time )(:$);
cards;
A O B 20000196 9:29:59
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A O B 20000201 9:30:00
A O B 20000205 9:30:01
A O S 20000209 9:30:01
B O B 20000531 9:30:00
B O S 20000643 9:30:00
B O B 20000754 9:30:01
run;
data T;
input (ID Type BUYORNUM SELLORNUM TNUM Time )(:$);
cards;
A T 20000196 20000199 10001 9:29:59
A T 20000197 20000199 10002 9:30:00
A T 20000205 20000209 10004 9:30:01
B T 20000754 20000643 10010 9:30:01
run;
data WANT;
set O;
if _N_=1 then do;
dcl hash BUY(dataset:'T');
BUY.definekey('BUYORNUM');
BUY.definedata(all:'Y');
BUY.definedone();
dcl hash SELL(dataset:'T');
SELL.definekey('SELLORNUM');
SELL.definedata(all:'Y');
SELL.definedone();
if 0 then set T;
end;
call missing( BUYORNUM, SELLORNUM, TNUM );
output;
RC=BUY.find(key:ORNUM);
if RC=0 then do;
ORNUM='';
output;
RC=BUY.remove();
RC=SELL.remove();
end;
else do;
RC=SELL.find(key:ORNUM);
if RC=0 then do;
ORNUM='';
output;
RC=SELL.remove();
RC=BUY.remove();
end;
end;
drop RC;
run;
What sequencing do you expect for 9:29:59?
ID | Type | BorS | ORNUM | Time | BUYORNUM | SELLORNUM | TNUM |
---|---|---|---|---|---|---|---|
A | O | B | 20000196 | 9:29:59 | |||
A | T | B | 9:29:59 | 20000196 | 20000199 | 10001 | |
A | O | B | 20000197 | 9:30:00 | |||
A | T | B | 9:30:00 | 20000197 | 20000199 | 10002 | |
A | O | S | 20000199 | 9:30:00 | |||
A | O | B | 20000201 | 9:30:00 | |||
A | O | B | 20000205 | 9:30:01 | |||
A | T | B | 9:30:01 | 20000205 | 20000209 | 10004 | |
A | O | S | 20000209 | 9:30:01 | |||
B | O | B | 20000531 | 9:30:00 | |||
B | O | S | 20000643 | 9:30:00 | |||
B | T | S | 9:30:01 | 20000754 | 20000643 | 10010 | |
B | O | B | 20000754 | 9:30:01 |
If that order comes in at 9:29:59 it transacts against the order 20000199 that comes in at 9:30:00 so the transaction time is 9:30:00 for TNUM 10001 not 9:29:59. Moreover, since 20000196 cannot fill the entire order 20000199 it is also filled by ORNUM 2000197
So the trade book should look like:
ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:29:59
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A T 9:30:00 20000196 20000199 10001
A T 9:30:00 20000197 20000199 10002
and not
A | O | B | 20000196 | 9:29:59 | |||
A | T | B | 9:29:59 | 20000196 | 20000199 | 10001 |
Once again my gratitude for all your help.
Alright, it looks like we need the heavy guns.
Not efficient, but works.
data O;
input ( ID TYPE BORS ORNUM TIME )(:$);
cards;
A O B 20000196 9:29:59
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A O B 20000201 9:30:00
A O B 20000205 9:30:01
A O S 20000209 9:30:01
B O B 20000531 9:30:00
B O S 20000643 9:30:00
B O B 20000754 9:30:01
run;
data T (index=(BUYORNUM SELLORNUM READY4OUTPUT));
input ( ID TYPE BUYORNUM SELLORNUM TNUM TIME )(:$) READY4OUTPUT ;
cards;
A T 20000196 20000199 10001 9:30:00 0
A T 20000197 20000199 10002 9:30:00 0
A T 20000205 20000209 10004 9:30:01 0
B T 20000754 20000643 10010 9:30:01 0
run;
data WANT;
set O;
length BUYORNUM SELLORNUM TNUM TIME $8 READY4OUTPUT 8;
if BORS='B' then
CMD=catt('proc sql; update T set READY4OUTPUT=READY4OUTPUT+1 where BUYORNUM ="',ORNUM,'";quit;');
if BORS='S' then
CMD=catt('proc sql; update T set READY4OUTPUT=READY4OUTPUT+1 where SELLORNUM="',ORNUM,'";quit;');
RC=dosubl(CMD);
output;
ORNUM=' ';
DSID=open('T(where=(READY4OUTPUT=2))','is');
do until (RC ne 0);
call set(DSID);
RC=fetch(DSID);
if RC=0 then output;
end;
RC=close(DSID);
RC=dosubl('proc sql; delete from T where READY4OUTPUT=2;quit;');
drop READY4OUTPUT CMD RC DSID ;
run;
ID | TYPE | BORS | ORNUM | TIME | BUYORNUM | SELLORNUM | TNUM |
---|---|---|---|---|---|---|---|
A | O | B | 20000196 | 9:29:59 | |||
A | O | B | 20000197 | 9:30:00 | |||
A | O | S | 20000199 | 9:30:00 | |||
A | T | S | 9:30:00 | 20000196 | 20000199 | 10001 | |
A | T | S | 9:30:00 | 20000197 | 20000199 | 10002 | |
A | O | B | 20000201 | 9:30:00 | |||
A | O | B | 20000205 | 9:30:01 | |||
A | O | S | 20000209 | 9:30:01 | |||
A | T | S | 9:30:01 | 20000205 | 20000209 | 10004 | |
B | O | B | 20000531 | 9:30:00 | |||
B | O | S | 20000643 | 9:30:00 | |||
B | O | B | 20000754 | 9:30:01 | |||
B | T | B | 9:30:01 | 20000754 | 20000643 | 10010 |
You can also sort afterward to obtain the desired order.
I did think about that. But by what Variable?
It is really uneasy.
The difficulty is when to stop to check it from T.
data O;
input (ID Type BorS ORNUM Time )(:$);
cards;
A O B 20000196 9:30:00
A O B 20000197 9:30:00
A O S 20000199 9:30:00
A O B 20000201 9:30:00
A O B 20000205 9:30:01
A O S 20000209 9:30:01
B O B 20000531 9:30:00
B O S 20000643 9:30:00
B O B 20000754 9:30:01
run;
data T;
input (ID Type BUYORNUM SELLORNUM TNUM Time )(:$);
cards;
A T 20000196 20000199 10001 9:30:00
A T 20000197 20000199 10002 9:30:00
A T 20000205 20000209 10004 9:30:01
B T 20000754 20000643 10010 9:30:01
run;
data WANT;
if _N_=1 then do;
if 0 then set O;
if 0 then set T;
dcl hash h(dataset:'T');
h.definekey('BUYORNUM','SELLORNUM');
h.definedata(all:'Y');
h.definedone();
end;
set O;
by id;
array b{99} $ 32 _temporary_;
if first.id then call missing(of b{*},SELLORNUM);
output;
if BorS ='B' then do;n+1;b{n}=ORNUM;end;
if BorS ='S' then do;
SELLORNUM=ORNUM;
do i=1 to n;
BUYORNUM=b{i};
if h.find()=0 then do;call missing(BorS,ORNUM); output;h.remove(); n=0;end;
end;
end;
if last.id then do;
do i=1 to n;
BUYORNUM=b{i};
if h.find()=0 then do;call missing(BorS,ORNUM);output;h.remove(); n=0;end;
end;
end;
drop i n;
run;
proc print noobs;run;
Hi,
from @RandyStan original post it looks like a transaction cannot take place until there is both a buyer and seller ordernum present ,and it looks like the order numbers are cumulative. Assuming these to be true then should be able to match using the maximum order number in each row of the transaction data set.
Use the O & T datasets as starting points , but be sure that the ORNUM related variables are numeric.
Then
data WANT;
set O T ;
if TYPE="T" then ORNUM=max(BUYORNUM,SELLORNUM);
proc sort data=WANT;
by ORNUM ;
*clean up to remove the ordernum from transaction rows;
data WANT;
set WANT;
if Type="T" then ORNUM=.;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.