need to improve performance.
table1 = 700million obs
proc sort data=table1;
by id
run;
table2=100k obs
proc sort data=table2;
by id;
run;
data table1;
update table1 table2;
by id;
run;
q. how to improve performance?
Assuming these are SAS tables and your 100k table fits into memory below code would be an option:
data m700;
retain var1 var2 'm700';
do keyvar=1 to 1000;
output;
end;
run;
data k100;
retain var1 var2 'k100';
do keyvar=2,5,10,100,999;
output;
end;
run;
data m700;
if _n_=1 then
do;
dcl hash h1(dataset:'k100');
h1.defineKey('keyvar');
h1.defineData(all:'y');
h1.defineDone();
end;
modify m700;
_rc=h1.find();
run;
Hash Table or proc format
A more old school approach could be using MODIFY with the KEY= option (if feasible).
This eliminates the need to sort the master table, and the master table is not rewritten during update.
I don't think it is a fast solution. Modify is only suited for the small table .
And the most important thing is that it is dangerous when your sas session stop accidently, the whole dataset will be collapsed totally.
Ksharp
It's usually the fastest way to update a table, as long the master table is large, and the transaction table is relative small.
The risk for abnormal interruption has to weighted with work to restore a back-up (file system, or SAS PROC COPY style).
Hi, LinusH
The way you proposed need to make a index firstly, that means you need to spend lots of time to make such an index, it is almost like re-order a large table , which will waste lots of time.
And this way will be very fast if the result of query is less than twenty percent of a table . but we can't sure it too.
Ksharp
The OP asked for better performing code to update a 700M master table with 100k "transactions". My assumption would be that this is not a one-off task. Creating an index is a one-off task so it might be well worth it to create this index and avoid writting 700M records every single time you run an update.
Nothing is certain...
But i this example the ratio of the master and transaction table is 0,014%.
And the index creation is a one time job, not a daily as sort/recreation of the master table would be.
And if they are lucky, the index might be helpful for query as well.
Hi. LinusH
20% I mean is not the ratio of master and tran .
I mean the number of updated observations in master table.
From the documentaion , if the result of query is less than 20% , using index is less efficient than consequent access method ( like SET ) .
Anyway , Anyhow , We can talk to each other right now, which lead me to wonder where are you (Patrick and LinusH) come from ?
I guess Australia ? Right ?
Nice to meet you and Patrick.
Ksharp
I'm a swede
Yes, I live in Australia since a couple of years.
https://communities.sas.com/thread/34628
This above link is the scenario. Out of 100k records 10 k ids are common in 700M and 100K. These 10K records from 100 (Table2) should be updated in table1(700M) and we need to append the remaining 90K records from table2 should be appended to table1.
The result should contain 700M and 90K.
Thanks
I would like to use Hash Table + remove() .
Ksharp
Again a strong reason IMHO to use Modify as Linus suggested as it allows you to combine insert/update in one go without re-writting the whole master table.
data m700;
retain var1 var2 'm700';
do keyvar=2 to 4,6 to 10;
output;
end;
run;
data k100;
retain var1 var2 'k100';
do keyvar=1,5,9;
output;
end;
run;
proc sql;
create index keyvar on m700 (keyvar);
quit;
data m700;
set k100(rename=(var1=kvar1 var2=kvar2));
modify m700 key=keyvar;
if _iorc_ = 0 then
do;
/* updates */
var1=kvar1;
var2=kvar2;
replace;
end;
else
if _iorc_>0 then
do;
/* inserts */
var1=kvar1;
var2=kvar2;
output;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.