We are using SAS DIS (Data Integration Studio) on a 9.4 M8 platform running on Linux.
We are having problems handling delta updates in SAS. The data is stored in SPD. For example, I need to delta-update a data set with 525,877,099 rows (base) using 5,881,292 rows (delta).
I tried update/insert with "Modify by index", but it only worked the first time; on the second delta load it failed with: ERROR: No key variables have been defined for file XXX.
If I change the load to "Modify by columns" and leave the index key and index set as-is, it runs forever or is stopped by a broken pipe/network error. I stopped it after 15 hours of run time. See the log below.
What do you recommend for handling this case? Is SPD useless here, should I choose different options in the load transformation, or should I avoid using SPD altogether?
ERROR: Error on server LIBNAME socket.
MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2025-05-26T07:46:02,814+02:00| _DISARM| BatchServer| _DISARM| SAS| _DISARM| | _DISARM| 5881292| _DISARM| 16158720| _DISARM| 10| _DISARM| 10| _DISARM| 0| _DISARM| 0| _DISARM| 35263.340000| _DISARM| 56842.870510| _DISARM| 2063800719.944049| _DISARM| 2063857562.814559| _DISARM| 35257.960000| _DISARM| | _ENDDISARM
ERROR: .
NOTE: Due to ERROR(s) above, SAS set option OBS=0, enabling syntax check mode.
This prevents execution of subsequent data modification statements.
ERROR: User asked for termination
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set XXX.
NOTE: The data set xxx has been updated. There were 769 observations rewritten, 0 observations
added and 0 observations deleted.
ERROR: Client Generic Error - SocketWrite 569
MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2025-05-26T07:46:02,824+02:00| _DISARM| BatchServer| _DISARM| SAS| _DISARM| | _DISARM| 525877099| _DISARM| 16158720| _DISARM| 10| _DISARM| 10| _DISARM| 0| _DISARM| 0| _DISARM| 35263.360000| _DISARM| 56842.886354| _DISARM| 2063800719.938458| _DISARM| 2063857562.824812| _DISARM| 35257.980000| _DISARM| | _ENDDISARM
ERROR: Connection refused.
NOTE: There were 770 observations read from the data set WORK.WG52IT.
MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2025-05-26T07:46:02,825+02:00| _DISARM| BatchServer| _DISARM| SAS| _DISARM| | _DISARM| -1| _DISARM| 16158720| _DISARM| 10| _DISARM| 10| _DISARM| 0| _DISARM| 0| _DISARM| 35263.350000| _DISARM| 56842.879640| _DISARM| 2063800719.945420| _DISARM| 2063857562.825060| _DISARM| 35257.970000| _DISARM| | _ENDDISARM
NOTE: There were 770 observations read from the data set WORK.WG52J3.
MVA_DSIO.OPEN_CLOSE| _DISARM| STOP| _DISARM| 2025-05-26T07:46:02,825+02:00| _DISARM| BatchServer| _DISARM| SAS| _DISARM| | _DISARM| -1| _DISARM| 16158720| _DISARM| 10| _DISARM| 10| _DISARM| 0| _DISARM| 0| _DISARM| 35263.340000| _DISARM| 56842.877933| _DISARM| 2063800719.947373| _DISARM| 2063857562.825306| _DISARM| 35257.970000| _DISARM| | _ENDDISARM
PROCEDURE| _DISARM| STOP| _DISARM| 2025-05-26T07:46:02,825+02:00| _DISARM| BatchServer| _DISARM| SAS| _DISARM| | _DISARM| 20881408| _DISARM| 16158720| _DISARM| 10| _DISARM| 10| _DISARM| 0| _DISARM| 0| _DISARM| 35263.360000| _DISARM| 56843.000551| _DISARM| 2063800719.824903| _DISARM| 2063857562.825454| _DISARM| 35257.980000| _DISARM| | _ENDDISARM
NOTE: DATA statement used (Total process time):
real time 15:47:23.00
cpu time 9:47:43.36
I guess there are two questions here:
1: Which approach will give you the best performance.
2: How to set up DIS to generate the desired code.
Whether SPDE is the right choice depends on how you intend to use this big table later on. SPDE is certainly preferable if you will later run queries with WHERE clauses.
I also consider SPDE a good choice for really big tables because it avoids the single huge files that can create challenges for backup and recovery.
Assuming your delta load is update/insert only without delete:
Ideally have both your master and transaction table physically sorted by primary key.
I'm not sure exactly which options you need to select in the DIS transformation, but the code it generates should look similar to the sample below. When I used DIS in the past, I normally just tried the options that appeared right and then checked in the generated code whether it created what I wanted - or else changed something and checked again until I got the result I was after.
libname spdewrk spde "%sysfunc(pathname(work))";

data spdewrk.master;
  set sashelp.class;
run;

data spdewrk.transactions;
  set sashelp.class;
  if name='Alfred' then age=15;
  if name in ('Henry','James','Jane','Janet','Mary','Philip','Robert','Ronald','Thomas') then delete;
  output;
  if name='Barbara' then
    do;
      name='New';
      output;
    end;
run;
/* sorts not needed if you can be sure your tables are already sorted */
proc sort data=spdewrk.master presorted;
  by name;
run;

proc sort data=spdewrk.transactions;
  by name;
run;
/* upsert master with transaction table */
data spdewrk.master;
  modify spdewrk.master spdewrk.transactions updatemode=nomissingcheck;
  by name;
  if _iorc_=0 then replace;  /* key found in master: update in place */
  else
    do;
      _error_=0;  /* reset the error flag set by the failed key lookup */
      output;     /* key not found: append as a new row */
    end;
run;
If there are also deletes:
That's something SAS is not that great at. In-place deletes in SAS tables are only logical, which leads to the table carrying more and more "dead weight". You only get rid of the logically deleted records by re-creating the table.
And last but not least: define partitions for your master SPDE table that are larger than the default.
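For illustration, larger partitions are set via the PARTSIZE= option on the SPDE LIBNAME (or data set) statement; the path and the 10g value below are placeholders you would tune to your storage layout:

```sas
/* Sketch: raise the SPDE partition size above the default (128 MB in 9.4)   */
/* so a 500M-row master table is split into fewer, larger partition files.   */
libname bigspde spde '/data/spde/master' partsize=10g;
```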
Can you verify that you are using SPDE (you write "SPD" only), or perhaps SPDS?
Generally speaking, those engines shine when it comes to inserts; MODIFY/UPDATE not so much.
With that many observations in your transaction data set, a brute-force MERGE or join that rebuilds the table can sometimes be quicker.
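A minimal sketch of such a brute-force rebuild, assuming both tables share the same structure, are sorted by the key, and the delta should win on matching keys (the library, table, and key names are placeholders):

```sas
/* Rebuild the master instead of modifying it in place: on matching BY groups
   the delta row (listed last) overwrites the master row; all other rows pass
   through unchanged. Swap master_new in for master afterwards. */
data spdelib.master_new;
  merge spdelib.master
        spdelib.delta;
  by key;
run;
```

Rebuilding also gets rid of any logically deleted rows the old master was carrying, at the cost of rewriting the whole table.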