DATA Step, Macro, Functions and more

Combining Files

Reply
Contributor
Posts: 27

Combining Files

I have two Files: A and B.  I want to merge/combine  them in a particular way.  They are in SAS format

File A

ID        Type      BorS        ORNUM                 Time

A           O           B              20000196             9:30:00

A            O           B              20000197             9:30:00

A            O           S              20000199              9:30:00

A            O           B              20000201              9:30:00

A            O           B               20000205             9:30:01

A            O           S              20000209             9:30:01

B             O           B               20000531            9:30:00

B             O            S              20000643            9:30:00

B              O           B               20000754           9:30:01

 

File B is as Follows

 

ID         Type         BUYORNUM       SELLORNUM          TNUM         Time

A           T                20000196                 20000199         10001         9:30:00

A           T                 20000197                 20000199        10002         9:30:00

A          T                   20000205                20000209         10004        9:30:01

B          T                  20000754                  20000643         10010          9:30:01

 

 

The merged file should look like this.  The logic is that if the ORNUM matches either the BUYORNUM or the SELLORNUM then type ‘T’ is inserted into the combined file

ID        Type      BorS        ORNUM                 Time                  BUYORNUM            SELLORNUM             TNUM        

A           O           B              20000196             9:30:00

A            O           B              20000197             9:30:00

A            O           S              20000199              9:30:00

A           T                                                              9:30:00              20000196                 20000199         10001        

A           T                                                              9:30:00               20000197                 20000199        10002       

A            O           B              20000201              9:30:00

A            O           B               20000205             9:30:01

A            O           S              20000209             9:30:01

A           T                                                             9:30:01                  20000205              20000209         10004      

B             O           B               20000531            9:30:00

B             O            S              20000643            9:30:00

B              O           B               20000754           9:30:01

B               T                                                          9:30:01                    20000754                  20000643           10010     

 

Thanks in Advance    

Super User
Posts: 2,518

Re: Combining Files

[ Edited ]
Posted in reply to RandyStan

Like this?

data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:30:00
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:30:00
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  set O;
  if _N_=1 then do;  
    dcl hash BUY(dataset:'T');
    BUY.definekey('BUYORNUM');
    BUY.definedata(all:'Y');
    BUY.definedone();
    dcl hash SELL(dataset:'T');
    SELL.definekey('SELLORNUM');
    SELL.definedata(all:'Y');
    SELL.definedone();
    if 0 then set T;
  end;    
  call missing( BUYORNUM, SELLORNUM, TNUM );
  output;
  RC=BUY.find(key:ORNUM);
  if RC=0 then do;
    output;
    RC=BUY.remove();
    RC=SELL.remove(key:SELLORNUM);
  end;
  else do;
    RC=SELL.find(key:ORNUM);
    if RC=0 then do;
      output;
      RC=SELL.remove();
      RC=BUY.remove(key:BUYORNUM);
    end;
  end;
  drop RC; 
run;

 

ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:30:00      
A T B 20000196 9:30:00 20000196 20000199 10001
A O B 20000197 9:30:00      
A T B 20000197 9:30:00 20000197 20000199 10002
A O S 20000199 9:30:00      
A O B 20000201 9:30:00      
A O B 20000205 9:30:01      
A T B 20000205 9:30:01 20000205 20000209 10004
A O S 20000209 9:30:01      
B O B 20000531 9:30:00      
B O S 20000643 9:30:00      
B T S 20000643 9:30:01 20000754 20000643 10010
B O B 20000754 9:30:01      
               

 

Contributor
Posts: 27

Re: Combining Files

Thanks so much.  But this does not work.  If you notice for ID = A at 9:30:00 SELLORNUM 20000199 is paired with two BUYORNUM  20000196 and 200197.  So this is not the desired output.


ChrisNZ wrote:

Like this?

data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:30:00
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:30:00
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  set O;
  if _N_=1 then do;  
    dcl hash BUY(dataset:'T');
    BUY.definekey('BUYORNUM');
    BUY.definedata(all:'Y');
    BUY.definedone();
    dcl hash SELL(dataset:'T');
    SELL.definekey('SELLORNUM');
    SELL.definedata(all:'Y');
    SELL.definedone();
    if 0 then set T;
  end;    
  call missing( BUYORNUM, SELLORNUM, TNUM );
  output;
  RC=BUY.find(key:ORNUM);
  if RC=0 then do;
    output;
    RC=BUY.remove();
    RC=SELL.remove(key:SELLORNUM);
  end;
  else do;
    RC=SELL.find(key:ORNUM);
    if RC=0 then do;
      output;
      RC=SELL.remove();
      RC=BUY.remove(key:BUYORNUM);
    end;
  end;
  drop RC; 
run;

 

ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
AOB200001969:30:00   
ATB200001969:30:00200001962000019910001
AOB200001979:30:00   
ATB200001979:30:00200001972000019910002
AOS200001999:30:00   
AOB200002019:30:00   
AOB200002059:30:01   
ATB200002059:30:01200002052000020910004
AOS200002099:30:01   
BOB200005319:30:00   
BOS200006439:30:00   
BTS200006439:30:01200007542000064310010
BOB200007549:30:01   
BTB200007549:30:01200007542000064310010

 


 

Super User
Posts: 2,518

Re: Combining Files

Posted in reply to RandyStan

I don't get how that's wrong.

Contributor
Posts: 27

Re: Combining Files

Hi ChrisNZ:

   I get what your saying. 

  But what I posted is the sequential order of a trade book; and the sequencing is important.    For example if ORNUM 20000196 came in at 9:29:59 and is filled by BUYORNUM 20000199 the sequencing may be broken by your solution.

  Thank you so much.  I hope you can tweak the code to get the sequencing.

    Randy

Super User
Posts: 2,518

Re: Combining Files

[ Edited ]
Posted in reply to RandyStan
data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:29:59
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:29:59
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  set O;
  if _N_=1 then do;  
    dcl hash BUY(dataset:'T');
    BUY.definekey('BUYORNUM');
    BUY.definedata(all:'Y');
    BUY.definedone();
    dcl hash SELL(dataset:'T');
    SELL.definekey('SELLORNUM');
    SELL.definedata(all:'Y');
    SELL.definedone();
    if 0 then set T;
  end;    
  call missing( BUYORNUM, SELLORNUM, TNUM );
  output;
  RC=BUY.find(key:ORNUM);
  if RC=0 then do;
    ORNUM='';
    output;
    RC=BUY.remove();
    RC=SELL.remove();
  end;
  else do;
    RC=SELL.find(key:ORNUM);
    if RC=0 then do;
      ORNUM='';
      output;
      RC=SELL.remove();
      RC=BUY.remove();
    end;
  end;
  drop RC; 
run;

What sequencing do you expect for 9:29:59?

ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:29:59      
A T B   9:29:59 20000196 20000199 10001
A O B 20000197 9:30:00      
A T B   9:30:00 20000197 20000199 10002
A O S 20000199 9:30:00      
A O B 20000201 9:30:00      
A O B 20000205 9:30:01      
A T B   9:30:01 20000205 20000209 10004
A O S 20000209 9:30:01      
B O B 20000531 9:30:00      
B O S 20000643 9:30:00      
B T S   9:30:01 20000754 20000643 10010
B O B 20000754 9:30:01    
Contributor
Posts: 27

Re: Combining Files

If that order comes in at 9:29:59 it transacts against the order 20000199 that comes in at 9:30:00 so the transaction time is 9:30:00 for TNUM 10001 not 9:29:59.  Moreover, since 20000196 cannot fill the entire order 20000199 it is also filled by ORNUM 2000197

  So the trade book should look like:

ID        Type      BorS        ORNUM                 Time            BUYORNUM       SELLORNUM          TNUM     

A           O           B              20000196             9:29:59

A            O           B              20000197             9:30:00

A            O           S              20000199              9:30:00

A           T                                                            9:30:00              20000196                 20000199         10001        

A           T                                                              9:30:00             20000197                 20000199        10002 

 

and not      

AOB200001969:29:59   
ATB 9:29:59200001962000019910001

 

  Once again my gratitude for all your help.

Super User
Posts: 2,518

Re: Combining Files

Posted in reply to RandyStan

Alright, it looks like we need the heavy guns.

Not efficient, but works.

data O;
input ( ID  TYPE  BORS  ORNUM  TIME )(:$);
cards;
A  O  B  20000196  9:29:59
A  O  B  20000197  9:30:00
A  O  S  20000199  9:30:00
A  O  B  20000201  9:30:00
A  O  B  20000205  9:30:01
A  O  S  20000209  9:30:01
B  O  B  20000531  9:30:00
B  O  S  20000643  9:30:00
B  O  B  20000754  9:30:01
run;

data T (index=(BUYORNUM  SELLORNUM READY4OUTPUT));
input ( ID  TYPE  BUYORNUM  SELLORNUM  TNUM  TIME )(:$) READY4OUTPUT ;
cards;
A  T  20000196  20000199  10001   9:30:00 0
A  T  20000197  20000199  10002   9:30:00 0
A  T  20000205  20000209  10004   9:30:01 0
B  T  20000754  20000643  10010   9:30:01 0
run;
                  
data WANT;
  set O;
  length BUYORNUM  SELLORNUM  TNUM  TIME $8 READY4OUTPUT 8;
  if BORS='B' then 
    CMD=catt('proc sql; update T set READY4OUTPUT=READY4OUTPUT+1 where BUYORNUM ="',ORNUM,'";quit;');
  if BORS='S' then 
    CMD=catt('proc sql; update T set READY4OUTPUT=READY4OUTPUT+1 where SELLORNUM="',ORNUM,'";quit;');
  RC=dosubl(CMD);
  output;  
  ORNUM=' ';
  DSID=open('T(where=(READY4OUTPUT=2))','is');       
  do until (RC ne 0); 
    call set(DSID);
    RC=fetch(DSID);                                
    if RC=0 then output;
  end;
  RC=close(DSID);
  RC=dosubl('proc sql; delete from T where READY4OUTPUT=2;quit;');
  drop READY4OUTPUT CMD RC DSID ;
run;

 

ID TYPE BORS ORNUM TIME BUYORNUM SELLORNUM TNUM
A O B 20000196 9:29:59      
A O B 20000197 9:30:00      
A O S 20000199 9:30:00      
A T S   9:30:00 20000196 20000199 10001
A T S   9:30:00 20000197 20000199 10002
A O B 20000201 9:30:00      
A O B 20000205 9:30:01      
A O S 20000209 9:30:01      
A T S   9:30:01 20000205 20000209 10004
B O B 20000531 9:30:00      
B O S 20000643 9:30:00      
B O B 20000754 9:30:01      
B T B   9:30:01 20000754 20000643 10010

 

 

 

Super User
Posts: 2,518

Re: Combining Files

Posted in reply to RandyStan

You can also sort afterward to obtain the desired order.

Contributor
Posts: 27

Re: Combining Files

I did think about that.  But by what Variable?

Super User
Posts: 10,860

Re: Combining Files

Posted in reply to RandyStan

It is really uneasy.

The difficulty is when to stop to check it from T.

 

data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:30:00
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:30:00
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  if _N_=1 then do; 
    if 0 then set O;
    if 0 then set T;
    dcl hash h(dataset:'T');
    h.definekey('BUYORNUM','SELLORNUM');
    h.definedata(all:'Y');
    h.definedone();
  end;   
 set O;
 by id;
 array b{99} $ 32 _temporary_;
 if first.id then call missing(of b{*},SELLORNUM);
 output;
 if BorS ='B' then do;n+1;b{n}=ORNUM;end;
 if BorS ='S' then do;
  SELLORNUM=ORNUM;  
  do i=1 to n;  
    BUYORNUM=b{i};
    if h.find()=0 then do;call missing(BorS,ORNUM); output;h.remove();   n=0;end;
  end;
 end;
 if last.id then do;
  do i=1 to n;
    BUYORNUM=b{i};
    if h.find()=0 then do;call missing(BorS,ORNUM);output;h.remove();   n=0;end;
  end;
 end;
drop i n;
run;

proc print noobs;run;


Contributor
Posts: 62

Re: Combining Files

[ Edited ]
Posted in reply to RandyStan

Hi,
from @RandyStan original post it looks like a transaction cannot take place until there is both a buyer and seller ordernum present ,and it looks like the order numbers are cumulative. Assuming these to be true then should be able to match using the maximum order number in each row of the transaction data set.

 

Use the O & T datasets as starting points , but be sure that the ORNUM related variables are numeric.

 

Then

data WANT;
set O  T ;
if TYPE="T" then ORNUM=max(BUYORNUM,SELLORNUM);

proc sort data=WANT; by ORNUM ;
*clean up to remove the ordernum from transaction rows; data WANT; set WANT; if Type="T" then ORNUM=.; run;

 

Ask a Question
Discussion stats
  • 11 replies
  • 314 views
  • 0 likes
  • 4 in conversation