BookmarkSubscribeRSS Feed
RandyStan
Fluorite | Level 6

I have two Files: A and B.  I want to merge/combine  them in a particular way.  They are in SAS format

File A

ID        Type      BorS        ORNUM                 Time

A           O           B              20000196             9:30:00

A            O           B              20000197             9:30:00

A            O           S              20000199              9:30:00

A            O           B              20000201              9:30:00

A            O           B               20000205             9:30:01

A            O           S              20000209             9:30:01

B             O           B               20000531            9:30:00

B             O            S              20000643            9:30:00

B              O           B               20000754           9:30:01

 

File B is as Follows

 

ID         Type         BUYORNUM       SELLORNUM          TNUM         Time

A           T                20000196                 20000199         10001         9:30:00

A           T                 20000197                 20000199        10002         9:30:00

A          T                   20000205                20000209         10004        9:30:01

B          T                  20000754                  20000643         10010          9:30:01

 

 

The merged file should look like this.  The logic is that if the ORNUM matches either the BUYORNUM or the SELLORNUM then type ‘T’ is inserted into the combined file

ID        Type      BorS        ORNUM                 Time                  BUYORNUM            SELLORNUM             TNUM        

A           O           B              20000196             9:30:00

A            O           B              20000197             9:30:00

A            O           S              20000199              9:30:00

A           T                                                              9:30:00              20000196                 20000199         10001        

A           T                                                              9:30:00               20000197                 20000199        10002       

A            O           B              20000201              9:30:00

A            O           B               20000205             9:30:01

A            O           S              20000209             9:30:01

A           T                                                             9:30:01                  20000205              20000209         10004      

B             O           B               20000531            9:30:00

B             O            S              20000643            9:30:00

B              O           B               20000754           9:30:01

B               T                                                          9:30:01                    20000754                  20000643           10010     

 

Thanks in Advance    

11 REPLIES 11
ChrisNZ
Tourmaline | Level 20

Like this?

data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:30:00
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:30:00
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  set O;
  if _N_=1 then do;  
    dcl hash BUY(dataset:'T');
    BUY.definekey('BUYORNUM');
    BUY.definedata(all:'Y');
    BUY.definedone();
    dcl hash SELL(dataset:'T');
    SELL.definekey('SELLORNUM');
    SELL.definedata(all:'Y');
    SELL.definedone();
    if 0 then set T;
  end;    
  call missing( BUYORNUM, SELLORNUM, TNUM );
  output;
  RC=BUY.find(key:ORNUM);
  if RC=0 then do;
    output;
    RC=BUY.remove();
    RC=SELL.remove(key:SELLORNUM);
  end;
  else do;
    RC=SELL.find(key:ORNUM);
    if RC=0 then do;
      output;
      RC=SELL.remove();
      RC=BUY.remove(key:BUYORNUM);
    end;
  end;
  drop RC; 
run;

 

ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:30:00      
A T B 20000196 9:30:00 20000196 20000199 10001
A O B 20000197 9:30:00      
A T B 20000197 9:30:00 20000197 20000199 10002
A O S 20000199 9:30:00      
A O B 20000201 9:30:00      
A O B 20000205 9:30:01      
A T B 20000205 9:30:01 20000205 20000209 10004
A O S 20000209 9:30:01      
B O B 20000531 9:30:00      
B O S 20000643 9:30:00      
B T S 20000643 9:30:01 20000754 20000643 10010
B O B 20000754 9:30:01      
               

 

RandyStan
Fluorite | Level 6

Thanks so much.  But this does not work.  If you notice for ID = A at 9:30:00 SELLORNUM 20000199 is paired with two BUYORNUM  20000196 and 200197.  So this is not the desired output.


@ChrisNZ wrote:

Like this?

data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:30:00
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:30:00
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  set O;
  if _N_=1 then do;  
    dcl hash BUY(dataset:'T');
    BUY.definekey('BUYORNUM');
    BUY.definedata(all:'Y');
    BUY.definedone();
    dcl hash SELL(dataset:'T');
    SELL.definekey('SELLORNUM');
    SELL.definedata(all:'Y');
    SELL.definedone();
    if 0 then set T;
  end;    
  call missing( BUYORNUM, SELLORNUM, TNUM );
  output;
  RC=BUY.find(key:ORNUM);
  if RC=0 then do;
    output;
    RC=BUY.remove();
    RC=SELL.remove(key:SELLORNUM);
  end;
  else do;
    RC=SELL.find(key:ORNUM);
    if RC=0 then do;
      output;
      RC=SELL.remove();
      RC=BUY.remove(key:BUYORNUM);
    end;
  end;
  drop RC; 
run;

 

ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
AOB200001969:30:00   
ATB200001969:30:00200001962000019910001
AOB200001979:30:00   
ATB200001979:30:00200001972000019910002
AOS200001999:30:00   
AOB200002019:30:00   
AOB200002059:30:01   
ATB200002059:30:01200002052000020910004
AOS200002099:30:01   
BOB200005319:30:00   
BOS200006439:30:00   
BTS200006439:30:01200007542000064310010
BOB200007549:30:01   
BTB200007549:30:01200007542000064310010

 


 

ChrisNZ
Tourmaline | Level 20
RandyStan
Fluorite | Level 6

Hi ChrisNZ:

   I get what your saying. 

  But what I posted is the sequential order of a trade book; and the sequencing is important.    For example if ORNUM 20000196 came in at 9:29:59 and is filled by BUYORNUM 20000199 the sequencing may be broken by your solution.

  Thank you so much.  I hope you can tweak the code to get the sequencing.

    Randy

ChrisNZ
Tourmaline | Level 20
data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:29:59
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:29:59
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  set O;
  if _N_=1 then do;  
    dcl hash BUY(dataset:'T');
    BUY.definekey('BUYORNUM');
    BUY.definedata(all:'Y');
    BUY.definedone();
    dcl hash SELL(dataset:'T');
    SELL.definekey('SELLORNUM');
    SELL.definedata(all:'Y');
    SELL.definedone();
    if 0 then set T;
  end;    
  call missing( BUYORNUM, SELLORNUM, TNUM );
  output;
  RC=BUY.find(key:ORNUM);
  if RC=0 then do;
    ORNUM='';
    output;
    RC=BUY.remove();
    RC=SELL.remove();
  end;
  else do;
    RC=SELL.find(key:ORNUM);
    if RC=0 then do;
      ORNUM='';
      output;
      RC=SELL.remove();
      RC=BUY.remove();
    end;
  end;
  drop RC; 
run;

What sequencing do you expect for 9:29:59?

ID Type BorS ORNUM Time BUYORNUM SELLORNUM TNUM
A O B 20000196 9:29:59      
A T B   9:29:59 20000196 20000199 10001
A O B 20000197 9:30:00      
A T B   9:30:00 20000197 20000199 10002
A O S 20000199 9:30:00      
A O B 20000201 9:30:00      
A O B 20000205 9:30:01      
A T B   9:30:01 20000205 20000209 10004
A O S 20000209 9:30:01      
B O B 20000531 9:30:00      
B O S 20000643 9:30:00      
B T S   9:30:01 20000754 20000643 10010
B O B 20000754 9:30:01    
RandyStan
Fluorite | Level 6

If that order comes in at 9:29:59 it transacts against the order 20000199 that comes in at 9:30:00 so the transaction time is 9:30:00 for TNUM 10001 not 9:29:59.  Moreover, since 20000196 cannot fill the entire order 20000199 it is also filled by ORNUM 2000197

  So the trade book should look like:

ID        Type      BorS        ORNUM                 Time            BUYORNUM       SELLORNUM          TNUM     

A           O           B              20000196             9:29:59

A            O           B              20000197             9:30:00

A            O           S              20000199              9:30:00

A           T                                                            9:30:00              20000196                 20000199         10001        

A           T                                                              9:30:00             20000197                 20000199        10002 

 

and not      

AOB200001969:29:59   
ATB 9:29:59200001962000019910001

 

  Once again my gratitude for all your help.

ChrisNZ
Tourmaline | Level 20

Alright, it looks like we need the heavy guns.

Not efficient, but works.

data O;
input ( ID  TYPE  BORS  ORNUM  TIME )(:$);
cards;
A  O  B  20000196  9:29:59
A  O  B  20000197  9:30:00
A  O  S  20000199  9:30:00
A  O  B  20000201  9:30:00
A  O  B  20000205  9:30:01
A  O  S  20000209  9:30:01
B  O  B  20000531  9:30:00
B  O  S  20000643  9:30:00
B  O  B  20000754  9:30:01
run;

data T (index=(BUYORNUM  SELLORNUM READY4OUTPUT));
input ( ID  TYPE  BUYORNUM  SELLORNUM  TNUM  TIME )(:$) READY4OUTPUT ;
cards;
A  T  20000196  20000199  10001   9:30:00 0
A  T  20000197  20000199  10002   9:30:00 0
A  T  20000205  20000209  10004   9:30:01 0
B  T  20000754  20000643  10010   9:30:01 0
run;
                  
data WANT;
  set O;
  length BUYORNUM  SELLORNUM  TNUM  TIME $8 READY4OUTPUT 8;
  if BORS='B' then 
    CMD=catt('proc sql; update T set READY4OUTPUT=READY4OUTPUT+1 where BUYORNUM ="',ORNUM,'";quit;');
  if BORS='S' then 
    CMD=catt('proc sql; update T set READY4OUTPUT=READY4OUTPUT+1 where SELLORNUM="',ORNUM,'";quit;');
  RC=dosubl(CMD);
  output;  
  ORNUM=' ';
  DSID=open('T(where=(READY4OUTPUT=2))','is');       
  do until (RC ne 0); 
    call set(DSID);
    RC=fetch(DSID);                                
    if RC=0 then output;
  end;
  RC=close(DSID);
  RC=dosubl('proc sql; delete from T where READY4OUTPUT=2;quit;');
  drop READY4OUTPUT CMD RC DSID ;
run;

 

ID TYPE BORS ORNUM TIME BUYORNUM SELLORNUM TNUM
A O B 20000196 9:29:59      
A O B 20000197 9:30:00      
A O S 20000199 9:30:00      
A T S   9:30:00 20000196 20000199 10001
A T S   9:30:00 20000197 20000199 10002
A O B 20000201 9:30:00      
A O B 20000205 9:30:01      
A O S 20000209 9:30:01      
A T S   9:30:01 20000205 20000209 10004
B O B 20000531 9:30:00      
B O S 20000643 9:30:00      
B O B 20000754 9:30:01      
B T B   9:30:01 20000754 20000643 10010

 

 

 

ChrisNZ
Tourmaline | Level 20

You can also sort afterward to obtain the desired order.

RandyStan
Fluorite | Level 6

I did think about that.  But by what Variable?

Ksharp
Super User

It is really uneasy.

The difficulty is when to stop to check it from T.

 

data O;
input (ID        Type      BorS        ORNUM                 Time )(:$);
cards;
A           O           B              20000196             9:30:00
A            O           B              20000197             9:30:00
A            O           S              20000199              9:30:00
A            O           B              20000201              9:30:00
A            O           B               20000205             9:30:01
A            O           S              20000209             9:30:01
B             O           B               20000531            9:30:00
B             O            S              20000643            9:30:00
B              O           B               20000754           9:30:01
run;

data T;
input (ID         Type         BUYORNUM       SELLORNUM          TNUM         Time  )(:$);
cards;
A           T                20000196                 20000199         10001         9:30:00
A           T                 20000197                 20000199        10002         9:30:00
A          T                   20000205                20000209         10004        9:30:01
B          T                  20000754                  20000643         10010          9:30:01
run;

data WANT;
  if _N_=1 then do; 
    if 0 then set O;
    if 0 then set T;
    dcl hash h(dataset:'T');
    h.definekey('BUYORNUM','SELLORNUM');
    h.definedata(all:'Y');
    h.definedone();
  end;   
 set O;
 by id;
 array b{99} $ 32 _temporary_;
 if first.id then call missing(of b{*},SELLORNUM);
 output;
 if BorS ='B' then do;n+1;b{n}=ORNUM;end;
 if BorS ='S' then do;
  SELLORNUM=ORNUM;  
  do i=1 to n;  
    BUYORNUM=b{i};
    if h.find()=0 then do;call missing(BorS,ORNUM); output;h.remove();   n=0;end;
  end;
 end;
 if last.id then do;
  do i=1 to n;
    BUYORNUM=b{i};
    if h.find()=0 then do;call missing(BorS,ORNUM);output;h.remove();   n=0;end;
  end;
 end;
drop i n;
run;

proc print noobs;run;


JohnHoughton
Quartz | Level 8

Hi,
from @RandyStan original post it looks like a transaction cannot take place until there is both a buyer and seller ordernum present ,and it looks like the order numbers are cumulative. Assuming these to be true then should be able to match using the maximum order number in each row of the transaction data set.

 

Use the O & T datasets as starting points , but be sure that the ORNUM related variables are numeric.

 

Then

data WANT;
set O  T ;
if TYPE="T" then ORNUM=max(BUYORNUM,SELLORNUM);

proc sort data=WANT; by ORNUM ;
*clean up to remove the ordernum from transaction rows; data WANT; set WANT; if Type="T" then ORNUM=.; run;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1043 views
  • 0 likes
  • 4 in conversation