Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

Contributor
Posts: 53

Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

Hello,

I have a requirement to update certain fields in multiple tables in an Oracle DWH. These tables hold billions of records, and the operation is an update.

 

There are two methods we are looking at (a sketch of method 2 follows below):

1. Load the data into Oracle via SAS, then perform the update within Oracle using Oracle updates and/or triggers.

2. Perform all necessary updates within SAS dataset(s) using SAS programming, then bulk-load the final dataset into Oracle.

Versions being used: Oracle 11g, SAS 9.2
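
For concreteness, here is a minimal sketch of what method 2 might look like. All libnames, dataset names, and key variables are hypothetical, and the bulk-load options would depend on the site setup:

/* Method 2 sketch (names hypothetical): apply the updates on the
   SAS side, then bulk-load the finished dataset into Oracle. */
libname ora oracle user=dwh_user password=xxxx path=dwh2;

/* Apply transactions to the master in SAS; both datasets must be
   sorted by the composite key, and the master must not contain
   duplicate keys. */
data work.final;
   update work.master work.transactions;
   by key1 key2;
run;

/* Bulk-load the result into Oracle (SQL*Loader under the covers). */
data ora.fact_table(bulkload=yes);
   set work.final;
run;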

 

These are my major considerations:

-> Data volume (billions of records, multiple tables)

-> Search and update duration, since updates are based on composite keys and sometimes on whole records (300+ fields)

-> I/O overhead (especially for method 2)

-> Match-merge techniques (I understand Oracle updates/joins work differently from the way SAS does its match-merge)

-> Table indexes in Oracle/SAS (which one would be faster)

 

I am looking for recommendations, pros, cons, etc. for either of the methods from experienced users in the community. I am leaning toward approach 1.

 

Thanks!!



All Replies
Super User
Posts: 3,113

Re: Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

Where is your source data coming from originally? Oracle, just SAS, or somewhere else?

If it's coming from Oracle originally, then it probably makes sense to do your updates in Oracle, using data loaded from SAS into Oracle temporary tables.
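
A rough sketch of that pattern, with all table names, keys, and connection options assumed for illustration: push the transactions into an Oracle temporary table, then run the update in-database via explicit pass-through.

/* DBMSTEMP=YES maps the libname to Oracle temporary tables;
   CONNECTION=GLOBAL lets the libname and the pass-through block
   below share one session (and therefore the temp table). */
libname oratmp oracle user=dwh_user password=xxxx path=dwh2
        connection=global dbmstemp=yes;

data oratmp.txn;            /* load transactions into the temp table */
   set work.transactions;
run;

proc sql;
   connect to oracle (user=dwh_user password=xxxx path=dwh2
                      connection=global);
   execute (
      update fact_table f
         set f.some_col = (select t.some_col from txn t
                            where t.key1 = f.key1 and t.key2 = f.key2)
       where exists (select 1 from txn t
                      where t.key1 = f.key1 and t.key2 = f.key2)
   ) by oracle;
   execute (commit) by oracle;
   disconnect from oracle;
quit;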

Contributor
Posts: 53

Re: Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

Thanks, it is coming from Oracle, but from another DB. So:

Oracle DB1 ------> SAS Extract, Transform and Load to --------> Oracle DB2

Oracle DB2 will use sequence numbers from a master table as surrogate keys, and data in the other tables needs to be matched against the master and assigned these sequence numbers.
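
On the SAS side, that matching step might look something like this hash lookup - purely illustrative, with hypothetical names, and the master table has to fit in memory for this to work:

/* Assign the master's sequence numbers (surrogate keys) to detail
   rows via a hash lookup. All names are hypothetical. */
data work.detail_keyed;
   if 0 then set work.master(keep=natural_key master_sk); /* set up PDV */
   if _n_ = 1 then do;
      declare hash m(dataset: "work.master");
      m.defineKey("natural_key");
      m.defineData("master_sk");
      m.defineDone();
   end;
   set work.detail;
   if m.find() ne 0 then call missing(master_sk);  /* no match found */
run;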

 

And like I said, the volume is in the billions. :)

Super User
Posts: 5,260

Re: Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

Decisions like this need to be made at your site - there are so many things to consider: data, competence, current IT strategies, and of course politics.
Generally speaking, I think SAS has better update mechanisms, such as the data step with MODIFY (sketched below). But on the other hand, if you end up with a solution that copies the data back and forth between SAS and Oracle, one can wonder why you should use Oracle at all.
So the most logical approach would be to try to update the Oracle table in place. If you are using DI Studio, play around with its default code generation and see how that translates to Oracle SQL. And do test whether moving the transaction table to an Oracle temporary table prior to the update helps.
What kind of update are we talking about? What is the update ratio?
In a DW I would look for a way to do inserts only - this table sounds so large that it really shouldn't be a type 2 dimension...?
A way to avoid explicit updates (in a type 2 scenario) is to use the bi-temporal functionality in Oracle.
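
For what it's worth, the simplest form of the MODIFY pattern updates the master in place from a sorted transaction dataset - a sketch only, with hypothetical names; whether it performs acceptably at your volume is exactly what needs testing:

/* Update big.master in place from sorted transactions. Matching is
   by the composite key; nonmissing transaction values replace the
   master's values. All names are hypothetical. */
data big.master;
   modify big.master work.transactions;
   by key1 key2;
run;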
Data never sleeps
Contributor
Posts: 53

Re: Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

[ Edited ]

Do you think a data step with MODIFY is better at handling a billion records than an update in Oracle SQL? I am asking because I might just do it in SAS then.

 

Right now, my tools are fixed: Oracle DWH and SAS V9. The tables I am looking to update are actually fact tables. The update will happen on the surrogate key column. Updates will number in the few millions, and the same goes for inserts. The update step will still be reading a billion records. I don't think there will be any indexes built within the DB.

 

I know this is a very high-level description of my particular case. But I am trying to see if anyone has encountered a similar environment in the past.

 

Thanks for all your pointers, I will look into them.

Super User
Posts: 5,260

Re: Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

I think your best way forward is to do a proof of concept where you test 2-3 different approaches and compare the results (and not only from a performance point of view - consider also flexibility, maintenance, and local IT strategies).
Data never sleeps
Solution
‎08-10-2016 12:21 PM
Contributor
Posts: 53

Re: Recommendation needed: Oracle Trigger/Updates vs SAS update methods (Data in billions)

There is no one solution to this question. But I went with doing all the transformation in SAS, doing a bulk load, and then doing updates in place on the DB, sometimes using the SAS data step MODIFY and sometimes using PROC SQL.

 

I used MODIFY when I could afford (in terms of performance) to carry the same number of variables in both the master and transaction datasets, and PROC SQL when I had to update only 2 fields out of, say, 300.
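
The PROC SQL variant for that two-field case looked roughly like the following - all names are made up for illustration:

/* Update two fields on the master from the transaction table via
   correlated subqueries; only rows with a matching key are touched.
   Note: without an index on the transaction keys, each master row
   triggers a rescan of the transaction table. */
proc sql;
   update big.master as m
      set field_a = (select t.field_a from work.trans as t
                      where t.key1 = m.key1 and t.key2 = m.key2),
          field_b = (select t.field_b from work.trans as t
                      where t.key1 = m.key1 and t.key2 = m.key2)
      where exists (select 1 from work.trans as t
                     where t.key1 = m.key1 and t.key2 = m.key2);
quit;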

 

Bulk loading presented its own problems - all constraints are dropped before the bulk load, including the primary key - so we would find out we had messed up only while trying to rebuild the indexes or partitions.

 

I am not sure if this is going to work well, but we'll find out a couple of months down the line. I might update this post then.

 

I am marking this post as the solution, just because this thread needs to be closed.
