Hi SAS DI Studio experts,
I have a operational/transactional database that increases by about 10m rows a year
but changes of a few hundret thousand rows could happen on a single day.
Some rows inserted years ago could also be corrected. I need to find the DELTA using
SAS DI Studio to update my DWH (sas datasets). I cant load every everning the whole
Oracle database. Oracle has a feature called Materialized Views, those keeo
up-to-date by themselves if correctly configured but in SAS, I dont see how its done.
Any ideas?
Hi Philip
You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:
1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection
2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself
3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal
4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison
Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.
Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.
You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?
I hope this helps
Regards
Steve M
Hi Philip
You need a method to identify changed rows in the source table(s); this can be one of several - I won't go into all the complexities, but broadly:
1. use change/update timestamps on the source rows (if your application provides them) to select - you'll also need to maintain or derive a "last DW extract" timestamp as your lower bound value for selection
2. use Oracle's CDC feature, which may need extra licensing and requires some setup by the DBA (see https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm#i1025409 for details); if you use this you should also be able to use DI Studio CDC transform, although I have to say I haven't used that myself
3. if your transactional system keeps a journal of transacted changes you could use that to build a 'delta for extract' set of keys to select against; again you'll need to time-boundary limit what you use from the journal
4. brute force comparison - extract the whole table and compare [which may be inefficient, as you can imagine]; either 'proc compare' or use MD5 checksum comparison
Effort for 1 or 2 is moderate and both can be reasonably efficient (although 1 is dependent on how well the selection is processed); effort for 3 is a bit more, but is also fairly efficient - but you'll need to avoid heterogeneous queries by either preparing your extract keyset in Oracle, or loading the keys to a temporary table otherwise you risk having SAS try to do all the work itself.
Effort for 4 is also a bit more and it can consume more resources, both run-time and disk storage - but remember that full-table extract can be less demand on the DBMS than a complex query so your full-table extract *might* place less load on the DBMS.
You should also think about if and how the changes will be recorded in the DW table(s) - do you need to preserve past history? ('type 2' change recording) or is your intention just to reduce data extract volume?
I hope this helps
Regards
Steve M
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.