<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: ETL incremental load in SAS Data Management</title>
    <link>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241030#M6263</link>
    <description>&lt;P&gt;Sounds good.&lt;/P&gt;
&lt;P&gt;And I would&amp;nbsp;suggest that you keep the old versions of the updated records, since I guess that you do&amp;nbsp;some reporting of the November records in all through December?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;CDC would make this easy for you. Records marked as updates should first update the current record in the target table as "closed",&amp;nbsp;preferably with the date/datetime -1 for the update record (this columns is usually named valid_to_dt/dttm). What update method to chose depends on your target RDBMS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then just insert both&amp;nbsp;inserts and update records. Use the CDC timestamp as valid_from_dt/dttm. For valid_to_dttm use a high date/datetime.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you map your CDC source to the target kinda 1-1, there should be no need to use the SCD Type Loader, CDC have already&amp;nbsp;figured out what and how to update.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 28 Dec 2015 17:19:57 GMT</pubDate>
    <dc:creator>LinusH</dc:creator>
    <dc:date>2015-12-28T17:19:57Z</dc:date>
    <item>
      <title>ETL incremental load</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241008#M6260</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would like to know which is the best strategy in a ETL incremental load.&lt;/P&gt;
&lt;P&gt;I have a CDC system that capture changes that I read, validate, transform and load in the&amp;nbsp; target DB.&lt;/P&gt;
&lt;P&gt;These changes belong to a period, all the rows that I treat correspond to a month (for example in Dec I treat November rows, and in January I will treat Decemeber rows).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For example in December, I am treating November rows and&amp;nbsp; for the loading I have to optines:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1) In every ETL incremental load delete all rows of Nov in the target DB and insert the new rows.&lt;/P&gt;
&lt;P&gt;- delete from targerdwb where sales_date &amp;gt;= 01/11/2015 and sales_date &amp;lt;= 30/11/2015&lt;/P&gt;
&lt;P&gt;- insert CDC rows&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2) In every incremental load I detect the rows that are news and the rows have to update.&lt;/P&gt;
&lt;P&gt;In order to detect inserts or updates I have to make a join between rows to load and rows in the target DB.&lt;/P&gt;
&lt;P&gt;Once detected inserts or updates (flag)&amp;nbsp; I wil execute the SQL statements&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suppose the strategy depends in the volumen of the target DB and the volumen of rows in the CDC capture.&lt;/P&gt;
&lt;P&gt;More or less the rows in the CDC are 20% at rhe beggining of the period and 5% at the end.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will do performance tests to know the faster way of do the load, but I would like to know if there is any other thing&lt;/P&gt;
&lt;P&gt;that will be usefull to take in consideration or if there are any other way to do the incremenrtal load.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any advices about ETL incremental load will be usefull&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance&lt;/P&gt;</description>
      <pubDate>Mon, 28 Dec 2015 12:31:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241008#M6260</guid>
      <dc:creator>juanvg1972</dc:creator>
      <dc:date>2015-12-28T12:31:53Z</dc:date>
    </item>
    <item>
      <title>Re: ETL incremental load</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241024#M6261</link>
      <description>&lt;P&gt;Nice design issue!&lt;/P&gt;
&lt;P&gt;But before giving any advice, just a few classification questions.&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;What is the nature of your CDC records? Do they represent tranasactions, or more presistent type of data (customer, account etc)?&lt;/LI&gt;
&lt;LI&gt;What is the nature of your target DB? Is it a detail DW layer? What RDBMS are you using?&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;Then I'm not really following you on the 20 to 5%. Period means month, right? Do you mean that you load November multiple times in December?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From a strict&amp;nbsp;performance perspective, I would&amp;nbsp;partition the target&amp;nbsp;table by month so you can drop it with almost no performance cost.&lt;/P&gt;
&lt;P&gt;On the other hand, if data for a business key changes throughout the month, you perhaps should store all versions for an business key. If so, your are looking at a SCD Type 2 load style.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Dec 2015 16:51:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241024#M6261</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2015-12-28T16:51:26Z</dc:date>
    </item>
    <item>
      <title>Re: ETL incremental load</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241027#M6262</link>
      <description>&lt;P&gt;Thanks Linush from your answer&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;In the CDC records I have transactions, in my target table is a detail DW &amp;nbsp;layer partitioned by month&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My CDC records are 20% of the target table in the beggigng of december (my firsts etl process of november data)&lt;/P&gt;
&lt;P&gt;and 5% at the end of december ( my lasts etl process of november data)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think the best strategy to load the DW layer is insert/update, previsoly &amp;nbsp;have to mark 'updates records' and 'insert records'&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I have otther DW table that is aggregated from DW detailed layer. I think I will load it from the DW detailed layer, no from CDC records..., becuase to load aggegrated table from CDC records can be dificult...&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any advice will be greatly appreciated&lt;/P&gt;</description>
      <pubDate>Mon, 28 Dec 2015 17:04:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241027#M6262</guid>
      <dc:creator>juanvg1972</dc:creator>
      <dc:date>2015-12-28T17:04:04Z</dc:date>
    </item>
    <item>
      <title>Re: ETL incremental load</title>
      <link>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241030#M6263</link>
      <description>&lt;P&gt;Sounds good.&lt;/P&gt;
&lt;P&gt;And I would&amp;nbsp;suggest that you keep the old versions of the updated records, since I guess that you do&amp;nbsp;some reporting of the November records in all through December?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;CDC would make this easy for you. Records marked as updates should first update the current record in the target table as "closed",&amp;nbsp;preferably with the date/datetime -1 for the update record (this columns is usually named valid_to_dt/dttm). What update method to chose depends on your target RDBMS.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then just insert both&amp;nbsp;inserts and update records. Use the CDC timestamp as valid_from_dt/dttm. For valid_to_dttm use a high date/datetime.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you map your CDC source to the target kinda 1-1, there should be no need to use the SCD Type Loader, CDC have already&amp;nbsp;figured out what and how to update.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 28 Dec 2015 17:19:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Data-Management/ETL-incremental-load/m-p/241030#M6263</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2015-12-28T17:19:57Z</dc:date>
    </item>
  </channel>
</rss>

