Help using Base SAS procedures

performance question?

Reply
Frequent Contributor
Posts: 110

performance question?

need to improve performance.

table1 = 700million obs

proc sort data=table1;

by id

run;

table2=100k obs

proc sort data=table2;

by id;

run;

data table1;

update table1 table2;

by id;

run;

q. how to improve performance?

Respected Advisor
Posts: 3,886

Re: performance question?

Assuming these are SAS tables and your 100k table fits into memory below code would be an option:

data m700;
  retain var1 var2 'm700';
  do keyvar=1 to 1000;
    output;
  end;
run;

data k100;
  retain var1 var2 'k100';
  do keyvar=2,5,10,100,999;
    output;
  end;
run;

data m700;
  if _n_=1 then
  do;
    dcl hash h1(dataset:'k100');
    h1.defineKey('keyvar');
    h1.defineData(all:'y');
    h1.defineDone();
  end;
  modify m700;
  _rc=h1.find();
run;

Super User
Posts: 9,662

Re: performance question?

Hash Table or proc format

Super User
Posts: 5,254

Re: performance question?

A more old school approach could be using MODIFY with the KEY= option (if feasible).

This eliminates the need to sort the master table, and the master table is not rewritten during update.

Data never sleeps
Super User
Posts: 9,662

Re: performance question?

I don't think it is a fast solution. Modify is only suited for the small table .

And the most important thing is that it is dangerous when your sas session stop accidently, the whole dataset will be collapsed totally.

Ksharp

Super User
Posts: 5,254

Re: performance question?

It's usually the fastest way to update a table, as long the master table is large, and the transaction table is relative small.

The risk for abnormal interruption has to weighted with work to restore a back-up (file system, or SAS PROC COPY style).

Data never sleeps
Super User
Posts: 9,662

Re: performance question?

Hi, LinusH

The way you proposed need to make a index firstly, that means you need to spend lots of time to make such an index, it is almost like re-order a large table , which will waste lots of time.

And this way will be very fast if the result of query is less than twenty percent of a table . but we can't sure it too.

Ksharp

Respected Advisor
Posts: 3,886

Re: performance question?

The OP asked for better performing code to update a 700M master table with 100k "transactions". My assumption would be that this is not a one-off task. Creating an index is a one-off task so it might be well worth it to create this index and avoid writting 700M records every single time you run an update.

Super User
Posts: 5,254

Re: performance question?

Nothing is certain...Smiley Happy

But i this example the ratio of the master and transaction table is 0,014%.

And the index creation is a one time job, not a daily as sort/recreation of the master table would be.

And if they are lucky, the index might be helpful for query as well.

Data never sleeps
Super User
Posts: 9,662

Re: performance question?

Hi. LinusH

20% I mean is not the ratio of master and tran .

I mean the number of updated observations in master table.

From the documentaion , if the result of query is less than 20% , using index is less efficient than consequent access method ( like SET ) .

Anyway , Anyhow , We can talk to each other right now, which lead me to wonder where are you (Patrick and LinusH) come from ?

I guess Australia ? Right ?

Nice to meet you and Patrick.

Ksharp

Super User
Posts: 5,254

Re: performance question?

I'm a swedeSmiley Wink

Data never sleeps
Respected Advisor
Posts: 3,886

Re: performance question?

Yes, I live in Australia since a couple of years.

Frequent Contributor
Posts: 110

Re: performance question?

https://communities.sas.com/thread/34628

This above link is the scenario. Out of 100k records 10 k ids are common in 700M and 100K. These 10K records from 100 (Table2) should be updated in table1(700M) and we need to append the remaining 90K records from table2 should be appended to table1.

The result should contain 700M and 90K.

Thanks

Super User
Posts: 9,662

Re: performance question?

I would like to use Hash Table + remove() .

Ksharp

Respected Advisor
Posts: 3,886

Re: performance question?

Again a strong reason IMHO to use Modify as Linus suggested as it allows you to combine insert/update in one go without re-writting the whole master table.

data m700;
  retain var1 var2 'm700';
  do keyvar=2 to 4,6 to 10;
    output;
  end;
run;

data k100;
  retain var1 var2 'k100';
  do keyvar=1,5,9;
    output;
  end;
run;

proc sql;
  create index keyvar on m700 (keyvar);
quit;

data m700;
  set k100(rename=(var1=kvar1 var2=kvar2));
  modify m700 key=keyvar;
  if _iorc_ = 0 then
  do;
    /* updates */
    var1=kvar1;
    var2=kvar2;
    replace;
  end;
  else
  if _iorc_>0 then
  do;
    /* inserts */
    var1=kvar1;
    var2=kvar2;
    output;
  end;
run;

Re: Stack database columns

Ask a Question
Discussion stats
  • 19 replies
  • 508 views
  • 0 likes
  • 4 in conversation