Re: performance question?

sassharp · Posted 04-24-2012 11:49 PM

need to improve performance.

table1 = 700million obs

proc sort data=table1;

by id

run;

table2=100k obs

proc sort data=table2;

by id;

run;

data table1;

update table1 table2;

by id;

run;

q. how to improve performance?

Patrick · Posted 04-25-2012 12:47 AM

Assuming these are SAS tables and your 100k table fits into memory below code would be an option:

data m700;
retain var1 var2 'm700';
do keyvar=1 to 1000;
output;
end;
run;

data k100;
retain var1 var2 'k100';
do keyvar=2,5,10,100,999;
output;
end;
run;

data m700;
if _n_=1 then
do;
    dcl hash h1(dataset:'k100');
    h1.defineKey('keyvar');
    h1.defineData(all:'y');
    h1.defineDone();
end;
modify m700;
_rc=h1.find();
run;

Ksharp · Posted 04-25-2012 01:37 AM

Hash Table or proc format

LinusH · Posted 04-25-2012 03:34 AM

A more old school approach could be using MODIFY with the KEY= option (if feasible).

This eliminates the need to sort the master table, and the master table is not rewritten during update.

Data never sleeps

Ksharp · Posted 04-25-2012 03:52 AM

I don't think it is a fast solution. Modify is only suited for the small table .

And the most important thing is that it is dangerous when your sas session stop accidently, the whole dataset will be collapsed totally.

Ksharp

LinusH · Posted 04-25-2012 04:01 AM

It's usually the fastest way to update a table, as long the master table is large, and the transaction table is relative small.

The risk for abnormal interruption has to weighted with work to restore a back-up (file system, or SAS PROC COPY style).

Data never sleeps

Ksharp · Posted 04-25-2012 04:27 AM

Hi, LinusH

The way you proposed need to make a index firstly, that means you need to spend lots of time to make such an index, it is almost like re-order a large table , which will waste lots of time.

And this way will be very fast if the result of query is less than twenty percent of a table . but we can't sure it too.

Ksharp

Patrick · Posted 04-25-2012 04:35 AM

The OP asked for better performing code to update a 700M master table with 100k "transactions". My assumption would be that this is not a one-off task. Creating an index is a one-off task so it might be well worth it to create this index and avoid writting 700M records every single time you run an update.

LinusH · Posted 04-25-2012 04:36 AM

Nothing is certain...

But i this example the ratio of the master and transaction table is 0,014%.

And the index creation is a one time job, not a daily as sort/recreation of the master table would be.

And if they are lucky, the index might be helpful for query as well.

Data never sleeps

Ksharp · Posted 04-25-2012 04:50 AM

Hi. LinusH

20% I mean is not the ratio of master and tran .

I mean the number of updated observations in master table.

From the documentaion , if the result of query is less than 20% , using index is less efficient than consequent access method ( like SET ) .

Anyway , Anyhow , We can talk to each other right now, which lead me to wonder where are you (Patrick and LinusH) come from ?

I guess Australia ? Right ?

Nice to meet you and Patrick.

Ksharp

LinusH · Posted 04-25-2012 05:04 AM

I'm a swede

Data never sleeps

Patrick · Posted 04-25-2012 05:38 AM

Yes, I live in Australia since a couple of years.

sassharp · Posted 04-25-2012 08:30 AM

https://communities.sas.com/thread/34628

This above link is the scenario. Out of 100k records 10 k ids are common in 700M and 100K. These 10K records from 100 (Table2) should be updated in table1(700M) and we need to append the remaining 90K records from table2 should be appended to table1.

The result should contain 700M and 90K.

Thanks

Ksharp · Posted 04-25-2012 10:14 PM

I would like to use Hash Table + remove() .

Ksharp

Patrick · Posted 04-26-2012 12:42 AM

Again a strong reason IMHO to use Modify as Linus suggested as it allows you to combine insert/update in one go without re-writting the whole master table.

data m700;
retain var1 var2 'm700';
do keyvar=2 to 4,6 to 10;
output;
end;
run;

data k100;
retain var1 var2 'k100';
do keyvar=1,5,9;
output;
end;
run;

proc sql;
create index keyvar on m700 (keyvar);
quit;

data m700;
set k100(rename=(var1=kvar1 var2=kvar2));
modify m700 key=keyvar;
if _iorc_ = 0 then
do;
    /* updates */
    var1=kvar1;
    var2=kvar2;
    replace;
end;
else
if _iorc_>0 then
do;
    /* inserts */
    var1=kvar1;
    var2=kvar2;
    output;
end;
run;

Re: Stack database columns

Registration is open

SAS Training: Just a Click Away