Solved: SAS DI Studio and delta load

PhilipH · Posted 05-31-2016 10:00 AM

Hi SAS DI Studio experts,

I have a operational/transactional database that increases by about 10m rows a year
but changes of a few hundret thousand rows could happen on a single day.
Some rows inserted years ago could also be corrected. I need to find the DELTA using
SAS DI Studio to update my DWH (sas datasets). I cant load every everning the whole
Oracle database. Oracle has a feature called Materialized Views, those keeo
up-to-date by themselves if correctly configured but in SAS, I dont see how its done.

Any ideas?

SteveM_UK · Posted 05-31-2016 10:55 AM

Hi Philip

You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:

1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection

2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself

3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal

4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison

Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.

Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.

You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?

I hope this helps

Regards

Steve M

View solution in original post

SteveM_UK · Posted 05-31-2016 10:55 AM