<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Delta load in Data Integration Studio in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967483#M376370</link>
    <description>&lt;P&gt;I guess there are two questions here:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1: Which approach will give you the best performance.&lt;/P&gt;
&lt;P&gt;2: How to set up DIS to generate the desired code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Whether SPDE is the right choice depends on how you intend to use this big table later on. SPDE is certainly preferable if you will later run queries with WHERE clauses.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also consider SPDE a good choice for really big tables because you avoid the huge single files that can create challenges for backup &amp;amp; recovery.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming your delta load is update/insert only without delete:&lt;/P&gt;
&lt;P&gt;Ideally have both your master and transaction table physically sorted by primary key.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure exactly which options you need to select in the DIS transformation, but the code it generates should look similar to the sample below. When I used DIS in the past, I normally just tried the options that appeared right, checked whether the generated code did what I wanted, and otherwise changed "something" and checked again until it did.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname spdewrk spde "%sysfunc(pathname(work))";

data spdewrk.master;
  set sashelp.class;
run;

data spdewrk.transactions;
  set sashelp.class;
  if name='Alfred' then age=15;
  if name in ('Henry','James','Jane','Janet','Mary','Philip','Robert','Ronald','Thomas') then delete;
  output;
  if name='Barbara' then 
    do;
      name='New';
      output;
    end;
run;

/* sorts not needed if you can be sure your tables are already sorted */
proc sort data=spdewrk.master presorted;
  by name;
run;

proc sort data=spdewrk.transactions;
  by name;
run;

/* upsert master with transaction table */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then replace;
  else
    do;
      _error_=0; /* reset the error flag SAS sets for unmatched keys */
      output;
    end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If there are also deletes:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That's something SAS is not that great at. In-place deletes in SAS tables are only logical, so the table carries more and more "dead weight". You only get rid of the logically deleted records when you re-create the table.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And last but not least: define a partition size (PARTSIZE= LIBNAME option) for your master SPDE table that is greater than the default.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 26 May 2025 08:26:57 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2025-05-26T08:26:57Z</dc:date>
    <item>
      <title>Delta load in Data Integration Studio</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967476#M376369</link>
      <description>&lt;P&gt;We are using SAS DIS (Data Integration Studio) on a 9.4 M8 platform running on Linux.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;We have problems handling delta updates in SAS. The data is stored in SPD. For example, I need to delta-update a data set of 525.877.099 rows (base) with 5.881.292 rows (delta).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried update/insert via “Modify by index”, but it only worked for the first delta load; the second delta load failed with: ERROR: No key variables have been defined for file XXX.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If I change the load to “Modify by columns” with the index key and index set ‘as is’, it runs forever or is stopped by a broken pipe/network error. I stopped it after 15 hours of run time. See the log below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What do you recommend for handling this case? Is SPD useless here, or should I choose different options in the load transformation? Or should I avoid using SPD altogether?&lt;/P&gt;&lt;P&gt;ERROR: Error on server LIBNAME socket.&lt;BR /&gt;NOTE: Due to ERROR(s) above, SAS set option OBS=0, enabling syntax check mode.&lt;BR /&gt;This prevents execution of subsequent data modification statements.&lt;BR /&gt;ERROR: User asked for termination&lt;BR /&gt;NOTE: The SAS System stopped processing this step because of errors.&lt;BR /&gt;NOTE: There were 1 observations read from the data set XXX.&lt;BR /&gt;NOTE: The data set xxx has been updated. There were 769 observations rewritten, 0 observations added and 0 observations deleted.&lt;BR /&gt;ERROR: Client Generic Error - SocketWrite 569&lt;BR /&gt;ERROR: Connection refused.&lt;BR /&gt;NOTE: There were 770 observations read from the data set WORK.WG52IT.&lt;BR /&gt;NOTE: There were 770 observations read from the data set WORK.WG52J3.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 15:47:23.00&lt;BR /&gt;cpu time 9:47:43.36&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 06:38:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967476#M376369</guid>
      <dc:creator>AnnLyn</dc:creator>
      <dc:date>2025-05-26T06:38:44Z</dc:date>
    </item>
    <item>
      <title>Re: Delta load in Data Integration Studio</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967483#M376370</link>
      <description>&lt;P&gt;I guess there are two questions here:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1: Which approach will give you the best performance.&lt;/P&gt;
&lt;P&gt;2: How to set up DIS to generate the desired code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Whether SPDE is the right choice depends on how you intend to use this big table later on. SPDE is certainly preferable if you will later run queries with WHERE clauses.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I also consider SPDE a good choice for really big tables because you avoid the huge single files that can create challenges for backup &amp;amp; recovery.&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Assuming your delta load is update/insert only without delete:&lt;/P&gt;
&lt;P&gt;Ideally have both your master and transaction table physically sorted by primary key.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm not sure exactly which options you need to select in the DIS transformation, but the code it generates should look similar to the sample below. When I used DIS in the past, I normally just tried the options that appeared right, checked whether the generated code did what I wanted, and otherwise changed "something" and checked again until it did.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname spdewrk spde "%sysfunc(pathname(work))";

data spdewrk.master;
  set sashelp.class;
run;

data spdewrk.transactions;
  set sashelp.class;
  if name='Alfred' then age=15;
  if name in ('Henry','James','Jane','Janet','Mary','Philip','Robert','Ronald','Thomas') then delete;
  output;
  if name='Barbara' then 
    do;
      name='New';
      output;
    end;
run;

/* sorts not needed if you can be sure your tables are already sorted */
proc sort data=spdewrk.master presorted;
  by name;
run;

proc sort data=spdewrk.transactions;
  by name;
run;

/* upsert master with transaction table */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then replace;
  else
    do;
      _error_=0; /* reset the error flag SAS sets for unmatched keys */
      output;
    end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If there are also deletes:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That's something SAS is not that great at. In-place deletes in SAS tables are only logical, so the table carries more and more "dead weight". You only get rid of the logically deleted records when you re-create the table.&lt;/P&gt;
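&lt;P&gt;If your delta feed flags deletes, MODIFY can at least apply them in place via the REMOVE statement. A minimal sketch, assuming a hypothetical change-flag column CHG in the transaction table ('D' = delete, anything else = upsert):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* CHG is a hypothetical flag column: 'D' = delete, else upsert */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then
    do;
      if chg='D' then remove;   /* logical delete in place */
      else replace;
    end;
  else
    do;
      _error_=0;                /* reset error flag for unmatched keys */
      if chg ne 'D' then output;
    end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Remember these deletes are still only logical until you re-create the table.&lt;/P&gt;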
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And last but not least: define a partition size (PARTSIZE= LIBNAME option) for your master SPDE table that is greater than the default.&amp;nbsp;&lt;/P&gt;
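&lt;P&gt;For example (the path and size are placeholders only; tune PARTSIZE= to your data volume):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* placeholder path and size - tune PARTSIZE= for your data volume */
libname bigspde spde '/data/spde' partsize=1g;&lt;/CODE&gt;&lt;/PRE&gt;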
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 08:26:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967483#M376370</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2025-05-26T08:26:57Z</dc:date>
    </item>
    <item>
      <title>Re: Delta load in Data Integration Studio</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967502#M376371</link>
      <description>&lt;P&gt;Can you verify that you are using SPDE (you write "SPD" only), or perhaps SPDS?&lt;/P&gt;
&lt;P&gt;Generally speaking, those engines rock when it comes to inserts, modify/update not as much.&lt;/P&gt;
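&lt;P&gt;For very large transaction tables, rebuilding the whole target can beat an in-place MODIFY. A minimal sketch of such a full-rebuild upsert, assuming both tables are sorted by the key (NAME as a stand-in) and a transaction row should overwrite the matching master row:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* full-rebuild upsert: transactions overwrite matching master rows */
data spdewrk.master_new;
  merge spdewrk.master spdewrk.transactions;
  by name;
run;

/* swap the rebuilt table in place of the old master */
proc datasets lib=spdewrk nolist;
  delete master;
  change master_new=master;
quit;&lt;/CODE&gt;&lt;/PRE&gt;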
&lt;P&gt;When you have that many observations in your transaction data set, sometimes a brute force MERGE or JOIN could be quicker.&lt;/P&gt;</description>
      <pubDate>Mon, 26 May 2025 10:54:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Delta-load-in-Data-Integration-Studio/m-p/967502#M376371</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2025-05-26T10:54:24Z</dc:date>
    </item>
  </channel>
</rss>

