Re: update SAS tables Faster!

Amahareek · Posted 08-06-2017 11:41 AM

Hi All,

as update is too slow in SAS Proc SQL

we found a faster way to update big tables in a very fast way using hash and database views

it's known that:

1- if you update in database view you will update the physical table

2- hash update uses the memory (RAM) which might be slower if tables are very big.

our solution is based on creating a view for all condition you want in the update statement in the view like

simple solution:

PROC SQL;

UPDATE TABLE_A SET COLUMN_1 = {VALUE} WHERE condition_1 , condition_2

;QUIT;

our solution

create view as select column_1 from table_a where condition_1, condition_2

then use the simple hash update which uses less memory ,faster update

happy faster hash update

Abdelrahman Mahareek

AML Technical Consultant

ChrisNZ · Posted 08-06-2017 10:12 PM

Very interestiung. Can you tell us more? For example:

Which database and version?

Where are the view and hash table created?

How many records does the view point to?

High-Performance SAS Coding - Third Edition

Amahareek · Posted 08-08-2017 02:16 AM

Which database and version?

I tried this solution on Oracle and SQL server Databases

Where are the view and hash table created?

view is created on the database engine like

create view xx as select * from table_1 ; in Oracle

then i read this view in sas using libname oracle ...

you can update in this view and your update will reflect on the physical tables

How many records does the view point to?

i tried it on large table about 3 Milion

Kurt_Bremser · Posted 08-07-2017 05:21 AM

You should show your code so we can try it.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Amahareek · Posted 08-08-2017 02:30 AM

It has different techniques based on what you need to do

suppose you want to update join from table_1 column party_name from table_2 column customer_name

where party_number = customer_number

and party_date_of_birth is missing

First create view on table_1 say table_1_view on db say SQL server or Oracle

create view table_1_view as

SELECT party_number, party_name

From Table_1

where party_date_of_birth is null;

in SAS

Proc SQL;

create table work.table_1_view_update as

SELECT party_number , customer_name as party_name /* this alias for the hash */

FROM <DB_libname>.table_1_view a inner join <DB_libname>.table_2 b

on b.customer_number = a.party_number

;Quit;

now work.table_1_view has the update we need

let's use the hash to update

data <DB_libname>.table_1_view;
if _n_=1 then do;
declare hash ud(dataset:'work.table_1_view_update');
ud.defineKey('party_number');
ud.defineData('party_name');
ud.defineDone();
end;
modify <DB_libname>.table_1_view;
rcUpdate = ud.find();
if rcUpdate=0 then replace;
run;

i hope this helps

Tom · Posted 08-08-2017 11:14 AM

Personally I never find it that useful to update a table in place. SAS is not really designed to manage transactional databases.

How is this different than just creating an INDEX on the transaction table and using that instead of the HASH?

Also what about creating an INDEX on the master table instead of the transactions?

Does the HASH method run faster?

What are the limitations? It would seem to me that the HASH would limit updates to number that could fit in memory. That might not be much of a real issue. If you are applying more updates than you can loading into a HASH object then you are probably using the wrong method to maintain the data.

ChrisNZ · Posted 08-08-2017 07:04 PM

Ok I created a quick benchmark than anyone can run.

Your method is indeed faster than a non-indexed update. That's quite a process to have combined the modify statment and a hash table on RDBMS data. Kudos!

Below I test:

1-Table update in Oracle

2- View update in Oracle

3- View update with hash table in SAS

4- Table update in Oracle w/index

The index wins hands down, but when that's not an option, the hash table saves a lot of time.

*0- Sample data;
data ORALIB.TESTBASE
     ORALIB.TESTUPDATE;
  do I=1 to 1e5;
    J=I*(ranuni(0) > 0.1); output ORALIB.TESTBASE;
    J=I;                   output ORALIB.TESTUPDATE;
  end;
run;
  

*Test 1 - Table update in Oracle - 60s; 
proc sql; 
  connect using ORALIB;
  execute by ORALIB(
    update TESTBASE
    set J = (select J from TESTUPDATE where TESTBASE.I=TESTUPDATE.I)
    where TESTBASE.J=0
  ); 
quit;


*Test 2 - View update in Oracle - 60s; 
proc sql; 
  connect using ORALIB;
  execute by ORALIB(
    create view TESTVIEW as 
    select I,J from TESTBASE 
    where TESTBASE.J=0
  ); 
  execute by ORALIB(
    update TESTVIEW
    set J = (select J from TESTUPDATE where TESTVIEW.I=TESTUPDATE.I)
  ); 
quit;


*Test 3 - View update with hash table in SAS - 18s; 
proc sql;
  create table UPD as 
  select u.* 
  from ORALIB.TESTVIEW   v
         inner join
       ORALIB.TESTUPDATE u
         on v.I=u.I;
quit;

data ORALIB.TESTVIEW;
  if _n_=1 then do;
    declare hash UPD(dataset:'UPD');
    UPD.defineKey('I');
    UPD.defineData('J');
    UPD.defineDone();
  end;
  modify ORALIB.TESTVIEW;
  RC = UPD.find();
  if RC=0 then replace;
run;


*Test 4 - Table update in Oracle w/index - 1s; 
proc sql; 
  connect using ORALIB;
  execute by ORALIB(
    create index I on TESTUPDATE(I)
  );
  execute by ORALIB(
    update TESTBASE
    set J = (select J from TESTUPDATE where TESTBASE.I=TESTUPDATE.I)
    where TESTBASE.J=0
  ); 
quit;

High-Performance SAS Coding - Third Edition