BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PhilipH
Quartz | Level 8

Hi SAS DI Studio experts,

I have a operational/transactional database that increases by about 10m rows a year
but changes of a few hundret thousand rows could happen on a single day.
Some rows inserted years ago could also be corrected. I need to find the DELTA using
SAS DI Studio to update my DWH (sas datasets). I cant load every everning the whole
Oracle database. Oracle has a feature called Materialized Views, those keeo
up-to-date by themselves if correctly configured but in SAS, I dont see how its done.


Any ideas?

1 ACCEPTED SOLUTION

Accepted Solutions
SteveM_UK
Obsidian | Level 7

Hi Philip

 

You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:

1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection

2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself

3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal

4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison

 

Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.

Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.

 

You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?

 

I hope this helps

 

Regards

Steve M

View solution in original post

1 REPLY 1
SteveM_UK
Obsidian | Level 7

Hi Philip

 

You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:

1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection

2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself

3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal

4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison

 

Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.

Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.

 

You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?

 

I hope this helps

 

Regards

Steve M

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 2225 views
  • 4 likes
  • 2 in conversation