<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta load in Data Integration Studio in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967483#M376370</link>
    <description>&lt;P&gt;I guess there are two questions here:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1: Which approach will give you the best performance.&lt;/P&gt;
&lt;P&gt;2: How to set up DIS to generate the desired code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Whether SPDE is the right choice depends on how you intend to use this big table later on. SPDE is certainly preferable if you will later run queries with WHERE clauses.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also consider SPDE a good choice for really big tables because you avoid the huge single files that can create challenges for backup &amp;amp; recovery.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming your delta load is update/insert only without delete:&lt;/P&gt;
&lt;P&gt;Ideally have both your master and transaction table physically sorted by primary key.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure exactly which options you need to select in the DIS transformation, but the code it generates should look similar to the sample below. When I used DIS in the past, I normally just tried the options that appeared right, checked whether the generated code did what I wanted, and otherwise changed "something" and checked again until it did.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname spdewrk spde "%sysfunc(pathname(work))";

data spdewrk.master;
  set sashelp.class;
run;

data spdewrk.transactions;
  set sashelp.class;
  if name='Alfred' then age=15;
  if name in ('Henry','James','Jane','Janet','Mary','Philip','Robert','Ronald','Thomas') then delete;
  output;
  if name='Barbara' then 
    do;
      name='New';
      output;
    end;
run;

/* sorts not needed if you can be sure your tables are already sorted */
proc sort data=spdewrk.master presorted;
  by name;
run;

proc sort data=spdewrk.transactions;
  by name;
run;

/* upsert master with transaction table */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then replace;
  else
    do;
      _error_=0; /* reset the error flag SAS sets for unmatched keys */
      output;
    end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If there are also deletes:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That's something SAS is not that great at. In-place deletes in SAS tables are only logical, so the table carries more and more "dead weight". You only get rid of the logically deleted records when you re-create the table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And last but not least: define a partition size (PARTSIZE= LIBNAME option) for your master SPDE table that is greater than the default.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 26 May 2025 08:26:57 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2025-05-26T08:26:57Z</dc:date>
    <item>
      <title>Delta load in Data Integration Studio</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967476#M376369</link>
      <description>&lt;P&gt;We are using SAS DIS (Data Integration Studio) on a 9.4 M8 platform running on Linux.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have problems handling delta updates in SAS. The data is stored in SPD. For example, I need to delta-update a data set of 525.877.099 rows (base) with 5.881.292 rows (delta).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried update/insert via “Modify by index”, but it only worked for the first delta load; the second delta load failed with: ERROR: No key variables have been defined for file XXX.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I change the load to “Modify by columns” with the index key and index set ‘as is’, it runs forever or is stopped by a broken pipe/network error. I stopped it after 15 hours of run time. See the log below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What do you recommend for handling this case? Is SPD useless here, or should I choose different options in the load transformation? Or should I avoid using SPD altogether?&lt;/P&gt;&lt;P&gt;ERROR: Error on server LIBNAME socket.&lt;BR /&gt;NOTE: Due to ERROR(s) above, SAS set option OBS=0, enabling syntax check mode.&lt;BR /&gt;This prevents execution of subsequent data modification statements.&lt;BR /&gt;ERROR: User asked for termination&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;NOTE: There were 1 observations read from the data set XXX.&lt;BR /&gt;NOTE: The data set xxx has been updated. There were 769 observations rewritten, 0 observations added and 0 observations deleted.&lt;BR /&gt;ERROR: Client Generic Error - SocketWrite 569&lt;BR /&gt;ERROR: Connection refused.&lt;BR /&gt;NOTE: There were 770 observations read from the data set WORK.WG52IT.&lt;BR /&gt;NOTE: There were 770 observations read from the data set WORK.WG52J3.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 15:47:23.00&lt;BR /&gt;cpu time 9:47:43.36&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 06:38:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967476#M376369</guid>
      <dc:creator>AnnLyn</dc:creator>
      <dc:date>2025-05-26T06:38:44Z</dc:date>
    </item>
    <item>
      <title>Re: Delta load in Data Integration Studio</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967483#M376370</link>
      <description>&lt;P&gt;I guess there are two questions here:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1: Which approach will give you the best performance.&lt;/P&gt;
&lt;P&gt;2: How to set up DIS to generate the desired code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Whether SPDE is the right choice depends on how you intend to use this big table later on. SPDE is certainly preferable if you will later run queries with WHERE clauses.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also consider SPDE a good choice for really big tables because you avoid the huge single files that can create challenges for backup &amp;amp; recovery.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming your delta load is update/insert only without delete:&lt;/P&gt;
&lt;P&gt;Ideally have both your master and transaction table physically sorted by primary key.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure exactly which options you need to select in the DIS transformation, but the code it generates should look similar to the sample below. When I used DIS in the past, I normally just tried the options that appeared right, checked whether the generated code did what I wanted, and otherwise changed "something" and checked again until it did.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname spdewrk spde "%sysfunc(pathname(work))";

data spdewrk.master;
  set sashelp.class;
run;

data spdewrk.transactions;
  set sashelp.class;
  if name='Alfred' then age=15;
  if name in ('Henry','James','Jane','Janet','Mary','Philip','Robert','Ronald','Thomas') then delete;
  output;
  if name='Barbara' then 
    do;
      name='New';
      output;
    end;
run;

/* sorts not needed if you can be sure your tables are already sorted */
proc sort data=spdewrk.master presorted;
  by name;
run;

proc sort data=spdewrk.transactions;
  by name;
run;

/* upsert master with transaction table */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then replace;
  else
    do;
      _error_=0; /* reset the error flag SAS sets for unmatched keys */
      output;
    end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If there are also deletes:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That's something SAS is not that great at. In-place deletes in SAS tables are only logical, so the table carries more and more "dead weight". You only get rid of the logically deleted records when you re-create the table.&lt;/P&gt;
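&lt;P&gt;If your delta feed flags deletes, MODIFY can at least apply them in place via the REMOVE statement. A minimal sketch, assuming a hypothetical change-flag column CHG in the transaction table ('D' = delete, anything else = upsert):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* CHG is a hypothetical flag column: 'D' = delete, else upsert */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then
    do;
      if chg='D' then remove;   /* logical delete in place */
      else replace;
    end;
  else
    do;
      _error_=0;                /* reset error flag for unmatched keys */
      if chg ne 'D' then output;
    end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Remember these deletes are still only logical until you re-create the table.&lt;/P&gt;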
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And last but not least: define a partition size (PARTSIZE= LIBNAME option) for your master SPDE table that is greater than the default.&amp;nbsp;&lt;/P&gt;
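&lt;P&gt;For example (the path and size are placeholders only; tune PARTSIZE= to your data volume):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* placeholder path and size - tune PARTSIZE= for your data volume */
libname bigspde spde '/data/spde' partsize=1g;&lt;/CODE&gt;&lt;/PRE&gt;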
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 08:26:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967483#M376370</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2025-05-26T08:26:57Z</dc:date>
    </item>
    <item>
      <title>Re: Delta load in Data Integration Studio</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967502#M376371</link>
      <description>&lt;P&gt;Can you verify that you are using SPDE (you write "SPD" only), or perhaps SPDS?&lt;/P&gt;
&lt;P&gt;Generally speaking, those engines rock when it comes to inserts, modify/update not as much.&lt;/P&gt;
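&lt;P&gt;For very large transaction tables, rebuilding the whole target can beat an in-place MODIFY. A minimal sketch of such a full-rebuild upsert, assuming both tables are sorted by the key (NAME as a stand-in) and a transaction row should overwrite the matching master row:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* full-rebuild upsert: transactions overwrite matching master rows */
data spdewrk.master_new;
  merge spdewrk.master spdewrk.transactions;
  by name;
run;

/* swap the rebuilt table in place of the old master */
proc datasets lib=spdewrk nolist;
  delete master;
  change master_new=master;
quit;&lt;/CODE&gt;&lt;/PRE&gt;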
&lt;P&gt;When you have that many observations in your transaction data set, sometimes a brute force MERGE or JOIN could be quicker.&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 10:54:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967502#M376371</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2025-05-26T10:54:24Z</dc:date>
    </item>
  </channel>
</rss>

