I have two tables MASTER (30 million rows), TRANSACTION (300 rows).Before applying index, I was using the following code.
data MASTER;
merge MASTER TRANSACTION;
by ID;
run;
The code resulted in data from both the data set merged. (Thus containing in total 30 million rows).
After Applying index to the MASTER on (ID and LOCATION) using
proc sql;
create index inde on MASTER(ID, LOCATION);
quit;
Now here lies the problem on how to merge.
[PROBLEM] Now Merging them using index.
[I got this code.]
data MASTER ;
set TRANSACTION;
set MASTER key = inde;
if _IORC_ = 0 THEN MASTER;
ELSE MASTER;
run;
But it is not resulting in the same merge as mentioned earlier.
Also if there is possibility for a solution on the above using simple index ?
You want to replicate the results of your MERGE program with a modify program, thereby savings lots of input/output.
If there are variables in the transaction dataset not in the master dataset, they will be added to the vars in the master dataset. This cannot be replicated using modify.
But assuming there are no new vars in the transaction data set, then you could do something like this to replicate the merge:
data master;
set transaction;
modify master key=inde;
set transaction (drop=recordcreation);
if _iorc_=0 then replace;
else output;
call missing(of _all_);
run;
Notes:
What is your intended result? 300 million observations or 300 observations?
Where did you get that code
data MASTER ;
set TRANSACTION;
set MASTER key = inde;
if _IORC_ = 0 THEN MASTER;
ELSE MASTER;
run;
from?
My Intended result is 300 million rows. The rows which are present in both should be overwritten in MASTER table. This should be accomplished using index.
Hi.
Not sure of what you are trying to do, but normally it should be something like this:
proc sql;
create index ID on TRANSACTION(ID); * simple index name should be the column name;
quit;
data MASTER ;
set MASTER ;
set TRANSACTION key = ID;
if _IORC_ = 0; * output when match;
run;
Index should be created for the smaller table, which I assume here is TRANSACTION.
Then you can use _IORC_ to do whatever you want (output or not), being 0 a match 1 a mismatch.
Of course if you do the merge via SQL, SAS will handle the index automatically for you (if possible).
Hope it helps.
Daniel Santos @ www.cgd.pt
You want to replicate the results of your MERGE program with a modify program, thereby savings lots of input/output.
If there are variables in the transaction dataset not in the master dataset, they will be added to the vars in the master dataset. This cannot be replicated using modify.
But assuming there are no new vars in the transaction data set, then you could do something like this to replicate the merge:
data master;
set transaction;
modify master key=inde;
set transaction (drop=recordcreation);
if _iorc_=0 then replace;
else output;
call missing(of _all_);
run;
Notes:
Hi mkeintz,
your reply was totally accurate. I have just one other question to the query.
Currently it is replacing all the column in MASTER from TRANSACTION.
But i have one particular column (column name : RecordCreation) in Master and TRANSACTION that needs to be retained in MASTER and replace the other column.Also if the data was present only in transaction then put the entire to MASTER.
Please help with the above.
Thanks in Advance.
Then, in the second SET TRANSACTION, modify it to
set transaction (drop=recordcreation)
See my editted program in the prior topiic message.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.