Hi All,
as update is too slow in SAS Proc SQL
we found a faster way to update big tables in a very fast way using hash and database views
it's known that:
1- if you update in database view you will update the physical table
2- hash update uses the memory (RAM) which might be slower if tables are very big.
our solution is based on creating a view for all condition you want in the update statement in the view like
simple solution:
PROC SQL;
UPDATE TABLE_A SET COLUMN_1 = {VALUE} WHERE condition_1 , condition_2
;QUIT;
our solution
create view as select column_1 from table_a where condition_1, condition_2
then use the simple hash update which uses less memory ,faster update
happy faster hash update
Abdelrahman Mahareek
AML Technical Consultant
Very interestiung. Can you tell us more? For example:
Which database and version?
Where are the view and hash table created?
How many records does the view point to?
Which database and version?
I tried this solution on Oracle and SQL server Databases
Where are the view and hash table created?
view is created on the database engine like
create view xx as select * from table_1 ; in Oracle
then i read this view in sas using libname oracle ...
you can update in this view and your update will reflect on the physical tables
How many records does the view point to?
i tried it on large table about 3 Milion
You should show your code so we can try it.
It has different techniques based on what you need to do
suppose you want to update join from table_1 column party_name from table_2 column customer_name
where party_number = customer_number
and party_date_of_birth is missing
First create view on table_1 say table_1_view on db say SQL server or Oracle
create view table_1_view as
SELECT party_number, party_name
From Table_1
where party_date_of_birth is null;
in SAS
Proc SQL;
create table work.table_1_view_update as
SELECT party_number , customer_name as party_name /* this alias for the hash */
FROM <DB_libname>.table_1_view a inner join <DB_libname>.table_2 b
on b.customer_number = a.party_number
;Quit;
now work.table_1_view has the update we need
let's use the hash to update
data <DB_libname>.table_1_view;
if _n_=1 then do;
declare hash ud(dataset:'work.table_1_view_update');
ud.defineKey('party_number');
ud.defineData('party_name');
ud.defineDone();
end;
modify <DB_libname>.table_1_view;
rcUpdate = ud.find();
if rcUpdate=0 then replace;
run;
i hope this helps
Personally I never find it that useful to update a table in place. SAS is not really designed to manage transactional databases.
How is this different than just creating an INDEX on the transaction table and using that instead of the HASH?
Also what about creating an INDEX on the master table instead of the transactions?
Does the HASH method run faster?
What are the limitations? It would seem to me that the HASH would limit updates to number that could fit in memory. That might not be much of a real issue. If you are applying more updates than you can loading into a HASH object then you are probably using the wrong method to maintain the data.
Ok I created a quick benchmark than anyone can run.
Your method is indeed faster than a non-indexed update. That's quite a process to have combined the modify statment and a hash table on RDBMS data. Kudos!
Below I test:
1-Table update in Oracle
2- View update in Oracle
3- View update with hash table in SAS
4- Table update in Oracle w/index
The index wins hands down, but when that's not an option, the hash table saves a lot of time.
*0- Sample data;
data ORALIB.TESTBASE
ORALIB.TESTUPDATE;
do I=1 to 1e5;
J=I*(ranuni(0) > 0.1); output ORALIB.TESTBASE;
J=I; output ORALIB.TESTUPDATE;
end;
run;
*Test 1 - Table update in Oracle - 60s;
proc sql;
connect using ORALIB;
execute by ORALIB(
update TESTBASE
set J = (select J from TESTUPDATE where TESTBASE.I=TESTUPDATE.I)
where TESTBASE.J=0
);
quit;
*Test 2 - View update in Oracle - 60s;
proc sql;
connect using ORALIB;
execute by ORALIB(
create view TESTVIEW as
select I,J from TESTBASE
where TESTBASE.J=0
);
execute by ORALIB(
update TESTVIEW
set J = (select J from TESTUPDATE where TESTVIEW.I=TESTUPDATE.I)
);
quit;
*Test 3 - View update with hash table in SAS - 18s;
proc sql;
create table UPD as
select u.*
from ORALIB.TESTVIEW v
inner join
ORALIB.TESTUPDATE u
on v.I=u.I;
quit;
data ORALIB.TESTVIEW;
if _n_=1 then do;
declare hash UPD(dataset:'UPD');
UPD.defineKey('I');
UPD.defineData('J');
UPD.defineDone();
end;
modify ORALIB.TESTVIEW;
RC = UPD.find();
if RC=0 then replace;
run;
*Test 4 - Table update in Oracle w/index - 1s;
proc sql;
connect using ORALIB;
execute by ORALIB(
create index I on TESTUPDATE(I)
);
execute by ORALIB(
update TESTBASE
set J = (select J from TESTUPDATE where TESTBASE.I=TESTUPDATE.I)
where TESTBASE.J=0
);
quit;
<Erroneous reply removed> Thank you. This is an interesting idea. Will look at it more closely tomorrow.
You should expand your subject line; it should read:
Update DBMS tables faster from SAS
as you are not updating native SAS tables.
thanks Kurt i've already updated topic name
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.