Hi,
I would like to know which is the best strategy in a ETL incremental load.
I have a CDC system that capture changes that I read, validate, transform and load in the target DB.
These changes belong to a period, all the rows that I treat correspond to a month (for example in Dec I treat November rows, and in January I will treat Decemeber rows).
For example in December, I am treating November rows and for the loading I have to optines:
1) In every ETL incremental load delete all rows of Nov in the target DB and insert the new rows.
- delete from targerdwb where sales_date >= 01/11/2015 and sales_date <= 30/11/2015
- insert CDC rows
2) In every incremental load I detect the rows that are news and the rows have to update.
In order to detect inserts or updates I have to make a join between rows to load and rows in the target DB.
Once detected inserts or updates (flag) I wil execute the SQL statements
I suppose the strategy depends in the volumen of the target DB and the volumen of rows in the CDC capture.
More or less the rows in the CDC are 20% at rhe beggining of the period and 5% at the end.
I will do performance tests to know the faster way of do the load, but I would like to know if there is any other thing
that will be usefull to take in consideration or if there are any other way to do the incremenrtal load.
Any advices about ETL incremental load will be usefull
Thanks in advance
... View more