BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
PhilipH
Quartz | Level 8

Hi SAS DI Studio experts,

I have a operational/transactional database that increases by about 10m rows a year
but changes of a few hundret thousand rows could happen on a single day.
Some rows inserted years ago could also be corrected. I need to find the DELTA using
SAS DI Studio to update my DWH (sas datasets). I cant load every everning the whole
Oracle database. Oracle has a feature called Materialized Views, those keeo
up-to-date by themselves if correctly configured but in SAS, I dont see how its done.


Any ideas?

1 ACCEPTED SOLUTION

Accepted Solutions
SteveM_UK
Obsidian | Level 7

Hi Philip

 

You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:

1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection

2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself

3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal

4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison

 

Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.

Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.

 

You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?

 

I hope this helps

 

Regards

Steve M

View solution in original post

1 REPLY 1
SteveM_UK
Obsidian | Level 7

Hi Philip

 

You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:

1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection

2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself

3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal

4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison

 

Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.

Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.

 

You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?

 

I hope this helps

 

Regards

Steve M

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 1847 views
  • 4 likes
  • 2 in conversation