In most data warehouses it is a common practice to include 2 datetime columns in every table which are Sys_create_date and sys_update_date. When a record is inserted for the first time its Sys_create_date and sys_update_date will be the same and it will be sysdatetime() of when the record is inserted. When a record gets updated, its Sys_create_date will not change but its sys_update_date will be updated with sysdatetime() of when the record is updated.
To achieve this function we need to know , in the source file , which records are inserts and which are updates. For this we need to do a lookup on the target table. In DI 3.4 since one object cannot be put twice on the canvas, we had to come up with a workaround of creating a temp_target table , just for the lookup.
In DI 4.2 , I believe, we can achieve this. But do you think there will be some performance in doing a lookup(or some sql join) of the target(esp when target is huge)?
This is a common practice in all DWH and hence I’m thinking does SAS give an easy solution to do this? Table Loader anyway identifies what to insert and what to update. So is there a way we can do this here?
Is I understand it, you store data with no aspect of keeping historical values, but wish to keep track on when the record was originally created, and last update time? In this, I think you have to extract the Sys_create_date from the existing target table. This could be done in a previous job, if don't wish have duplicate metadata objects for the same physical table. If your target table is indexed properly, this should not be a big performance issue, unless if you update a considerable part of the data.
A different approach is to keep track of the historical values in your DW, which use to be considered as best practice. By doing so, the SCD Type 2 Loader will do "everything" for you... The result is a target table with multiple entries for your business key (optional the surrogate key as well) with a time interval that specifies when in time this record is considered as valid. The current valid record has a high-date, the default for DI Studio is 01Jan5999.