BookmarkSubscribeRSS Feed
sassharp
Calcite | Level 5

need to improve performance.

table1 = 700million obs

proc sort data=table1;

by id

run;

table2=100k obs

proc sort data=table2;

by id;

run;

data table1;

update table1 table2;

by id;

run;

q. how to improve performance?

19 REPLIES 19
Patrick
Opal | Level 21

Assuming these are SAS tables and your 100k table fits into memory below code would be an option:

data m700;
  retain var1 var2 'm700';
  do keyvar=1 to 1000;
    output;
  end;
run;

data k100;
  retain var1 var2 'k100';
  do keyvar=2,5,10,100,999;
    output;
  end;
run;

data m700;
  if _n_=1 then
  do;
    dcl hash h1(dataset:'k100');
    h1.defineKey('keyvar');
    h1.defineData(all:'y');
    h1.defineDone();
  end;
  modify m700;
  _rc=h1.find();
run;

Ksharp
Super User

Hash Table or proc format

LinusH
Tourmaline | Level 20

A more old school approach could be using MODIFY with the KEY= option (if feasible).

This eliminates the need to sort the master table, and the master table is not rewritten during update.

Data never sleeps
Ksharp
Super User

I don't think it is a fast solution. Modify is only suited for the small table .

And the most important thing is that it is dangerous when your sas session stop accidently, the whole dataset will be collapsed totally.

Ksharp

LinusH
Tourmaline | Level 20

It's usually the fastest way to update a table, as long the master table is large, and the transaction table is relative small.

The risk for abnormal interruption has to weighted with work to restore a back-up (file system, or SAS PROC COPY style).

Data never sleeps
Ksharp
Super User

Hi, LinusH

The way you proposed need to make a index firstly, that means you need to spend lots of time to make such an index, it is almost like re-order a large table , which will waste lots of time.

And this way will be very fast if the result of query is less than twenty percent of a table . but we can't sure it too.

Ksharp

Patrick
Opal | Level 21

The OP asked for better performing code to update a 700M master table with 100k "transactions". My assumption would be that this is not a one-off task. Creating an index is a one-off task so it might be well worth it to create this index and avoid writting 700M records every single time you run an update.

LinusH
Tourmaline | Level 20

Nothing is certain...Smiley Happy

But i this example the ratio of the master and transaction table is 0,014%.

And the index creation is a one time job, not a daily as sort/recreation of the master table would be.

And if they are lucky, the index might be helpful for query as well.

Data never sleeps
Ksharp
Super User

Hi. LinusH

20% I mean is not the ratio of master and tran .

I mean the number of updated observations in master table.

From the documentaion , if the result of query is less than 20% , using index is less efficient than consequent access method ( like SET ) .

Anyway , Anyhow , We can talk to each other right now, which lead me to wonder where are you (Patrick and LinusH) come from ?

I guess Australia ? Right ?

Nice to meet you and Patrick.

Ksharp

LinusH
Tourmaline | Level 20

I'm a swedeSmiley Wink

Data never sleeps
Patrick
Opal | Level 21

Yes, I live in Australia since a couple of years.

sassharp
Calcite | Level 5

https://communities.sas.com/thread/34628

This above link is the scenario. Out of 100k records 10 k ids are common in 700M and 100K. These 10K records from 100 (Table2) should be updated in table1(700M) and we need to append the remaining 90K records from table2 should be appended to table1.

The result should contain 700M and 90K.

Thanks

Ksharp
Super User

I would like to use Hash Table + remove() .

Ksharp

Patrick
Opal | Level 21

Again a strong reason IMHO to use Modify as Linus suggested as it allows you to combine insert/update in one go without re-writting the whole master table.

data m700;
  retain var1 var2 'm700';
  do keyvar=2 to 4,6 to 10;
    output;
  end;
run;

data k100;
  retain var1 var2 'k100';
  do keyvar=1,5,9;
    output;
  end;
run;

proc sql;
  create index keyvar on m700 (keyvar);
quit;

data m700;
  set k100(rename=(var1=kvar1 var2=kvar2));
  modify m700 key=keyvar;
  if _iorc_ = 0 then
  do;
    /* updates */
    var1=kvar1;
    var2=kvar2;
    replace;
  end;
  else
  if _iorc_>0 then
  do;
    /* inserts */
    var1=kvar1;
    var2=kvar2;
    output;
  end;
run;

Re: Stack database columns

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 1447 views
  • 0 likes
  • 4 in conversation