DATA Step, Macro, Functions and more

[PROBLEM]Merging with Indexed Dataset .

Accepted Solution Solved
Reply
Contributor
Posts: 46
Accepted Solution

[PROBLEM]Merging with Indexed Dataset .

I have two tables MASTER (30 million rows), TRANSACTION (300 rows).Before applying index, I was using the following code.

                data MASTER;

                     merge MASTER TRANSACTION;

                     by ID;

                run;

The code resulted in data from both the data set merged. (Thus containing in total 30 million rows).

 

 

 After Applying index to the MASTER on (ID and LOCATION) using

                      proc sql;

                                create index inde on MASTER(ID, LOCATION);

                      quit;

 

Now here lies the problem on how to merge. 

[PROBLEM] Now Merging them using index.

 

[I got this code.]

 

                               

                          data MASTER ;

                                    set TRANSACTION;

                                    set MASTER key = inde;

                                    if _IORC_ = 0 THEN MASTER;
                                    ELSE MASTER;

                         run;

 

But it is not resulting in the same merge as mentioned earlier.

 

Also if there is possibility for a solution on the above using simple index ?

 


Accepted Solutions
Solution
‎01-25-2017 12:17 AM
Trusted Advisor
Posts: 1,022

Re: [PROBLEM]Merging with Indexed Dataset .

[ Edited ]
Posted in reply to ankit___gupta

You want to replicate the results of your MERGE program with a modify program, thereby savings lots of input/output.

 

If there are variables in the transaction dataset not in the master dataset, they will be added to the vars in the master dataset. This cannot be replicated using modify.

 

But assuming there are no new vars in the transaction data set, then you could do something like this to replicate the merge:

 

data master;
  set transaction;
  modify master key=inde;
  set transaction (drop=recordcreation);
  if _iorc_=0 then replace;
  else output;
  call missing(of _all_);
run;

 

Notes:

  1. The reason for the second "set transaction" statement is because a successful "modify" statement would overwrite values from the first "set transaction" - the opposite of your merge program. Actually the first "set transaction" could be revised to
       set transaction (keep=id location)
    to make clear the purpose of each set statement .

  2. If there is a new record (i.e. new id/location) in dataset transaction and if a variable (say var SUPPLIER) is in master but not transaction, then supplier would inherit its value from the most recent successful modify statement, when it should have a missing value.  That's why there is a "call missing" statement.

  3. I should  add that the replication is accurate if there is only one record per index value in each dataset.

  4. The "drop=recordcreation" parameter was added to the 2nd "set transaction" to accomodate a revision to the requirement - namely to keep the value from master whenever possible, and take the value from transaction only when there is a new record.  Of course, this also breaks the property of exactly replicating the  results of the merge program.

View solution in original post


All Replies
Super User
Posts: 7,863

Re: [PROBLEM]Merging with Indexed Dataset .

Posted in reply to ankit___gupta

What is your intended result? 300 million observations or 300 observations?

 

Where did you get that code

data MASTER ;
set TRANSACTION;
set MASTER key = inde;
if _IORC_ = 0 THEN MASTER;
ELSE MASTER;
run;

from?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 46

Re: [PROBLEM]Merging with Indexed Dataset .

Posted in reply to KurtBremser

My Intended result is 300 million rows. The rows which are present in both should be overwritten in MASTER table. This should be accomplished using index.

Super Contributor
Posts: 474

Re: [PROBLEM]Merging with Indexed Dataset .

Posted in reply to ankit___gupta

Hi.

 

Not sure of what you are trying to do, but normally it should be something like this:

 

proc sql;
create index ID on TRANSACTION(ID); * simple index name should be the column name;
quit;

data MASTER ;
set MASTER ;
set TRANSACTION key = ID;
if _IORC_ = 0; * output when match;
run;

Index should be created for the smaller table, which I assume here is TRANSACTION.

 

Then you can use _IORC_ to do whatever you want (output or not), being 0 a match 1 a mismatch.

 

Of course if you do the merge via SQL, SAS will handle the index automatically for you (if possible).

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

Solution
‎01-25-2017 12:17 AM
Trusted Advisor
Posts: 1,022

Re: [PROBLEM]Merging with Indexed Dataset .

[ Edited ]
Posted in reply to ankit___gupta

You want to replicate the results of your MERGE program with a modify program, thereby savings lots of input/output.

 

If there are variables in the transaction dataset not in the master dataset, they will be added to the vars in the master dataset. This cannot be replicated using modify.

 

But assuming there are no new vars in the transaction data set, then you could do something like this to replicate the merge:

 

data master;
  set transaction;
  modify master key=inde;
  set transaction (drop=recordcreation);
  if _iorc_=0 then replace;
  else output;
  call missing(of _all_);
run;

 

Notes:

  1. The reason for the second "set transaction" statement is because a successful "modify" statement would overwrite values from the first "set transaction" - the opposite of your merge program. Actually the first "set transaction" could be revised to
       set transaction (keep=id location)
    to make clear the purpose of each set statement .

  2. If there is a new record (i.e. new id/location) in dataset transaction and if a variable (say var SUPPLIER) is in master but not transaction, then supplier would inherit its value from the most recent successful modify statement, when it should have a missing value.  That's why there is a "call missing" statement.

  3. I should  add that the replication is accurate if there is only one record per index value in each dataset.

  4. The "drop=recordcreation" parameter was added to the 2nd "set transaction" to accomodate a revision to the requirement - namely to keep the value from master whenever possible, and take the value from transaction only when there is a new record.  Of course, this also breaks the property of exactly replicating the  results of the merge program.
Contributor
Posts: 46

Re: [PROBLEM]Merging with Indexed Dataset .

Hi mkeintz

 

 

 

 

Trusted Advisor
Posts: 1,022

Re: [PROBLEM]Merging with Indexed Dataset .

Posted in reply to ankit___gupta

Then, in the second SET TRANSACTION, modify it to

    set transaction (drop=recordcreation)

 See my editted program in the prior topiic message.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 198 views
  • 1 like
  • 4 in conversation