This got me thinking, so I ran a few more tests.
SAS gets very inefficient when processing hundreds of tables concurrently with a BY statement: the elapsed (and CPU) time grows much faster than linearly as more tables are added.
It is much better to split the processing as suggested in the first post.
This is even truer for the SET statement than for the MERGE statement.
%*********** C R E A T E S A M P L E **********;
%let nbobs=2000;
%let nbtab=1000;
%* Macro generating the list of output tables HAVE1-HAVE&nbtab.,
   each with renamed variables and a WHERE= filter so the tables differ in size;
%macro loop;
  %local i;
  %do i=1 %to &nbtab.;
    HAVE&i.(rename=(J=J&i. K=K&i. L=L&i. M=M&i.) compress=no where=(I<3 or I>&i.))
  %end;
%mend loop;

%* Create sample tables;
data %loop;
  do I=1 to &nbobs.;
    J=I; K=I; L=I; M=I;
    output;
  end;
run;
%************ T E S T M E R G E B Y *************;
%* merge 100 tables ==> 0.7 seconds;
data WANT;
merge HAVE1-HAVE100;
by I;
run;
%* merge 200 tables ==> 2 seconds;
data WANT;
merge HAVE1-HAVE200;
by I;
run;
%* merge 500 tables ==> 15 seconds;
data WANT;
merge HAVE1-HAVE500;
by I;
run;
%* merge 1000 tables ==> 65 seconds;
data WANT;
merge HAVE1-HAVE1000;
by I;
run;
%* Break the merge into 200-table blocks ==> total time 10 seconds;
data T1; merge HAVE1 - HAVE200; by I; run;
data T2; merge HAVE201 - HAVE400; by I; run;
data T3; merge HAVE401 - HAVE600; by I; run;
data T4; merge HAVE601 - HAVE800; by I; run;
data T5; merge HAVE801 - HAVE1000; by I; run;
data T6; merge T1-T5; by I; run;
%************** T E S T S E T B Y **************;
%* set 50 tables ==> 4 seconds;
data WANT;
set HAVE1-HAVE50;
by I;
run;
%* set 100 tables ==> 26 seconds;
data WANT;
set HAVE1-HAVE100;
by I;
run;
%* set 200 tables ==> 180 seconds;
data WANT;
set HAVE1-HAVE200;
by I;
run;
%* Break the set into 40-table blocks ==> total time 9 seconds;
data T1; set HAVE1 - HAVE40; by I; run;
data T2; set HAVE41 - HAVE80; by I; run;
data T3; set HAVE81 - HAVE120; by I; run;
data T4; set HAVE121 - HAVE160; by I; run;
data T5; set HAVE161 - HAVE200; by I; run;
data T6; set T1-T5; by I; run;
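To avoid hard-coding the block boundaries, the splitting can be wrapped in a small macro. This is only a sketch under the same assumptions as the tests above (tables named HAVE1-HAVEn, all sorted by I); the macro name and parameters are mine, and the same pattern works for MERGE by swapping the SET statements.
%macro split_set(ntab=200, blocksize=40);
  %local b lo hi nblocks;
  %* Number of blocks, rounding up;
  %let nblocks=%sysevalf(&ntab./&blocksize., ceil);
  %do b=1 %to &nblocks.;
    %let lo=%eval((&b.-1)*&blocksize.+1);
    %let hi=%eval(&b.*&blocksize.);
    %if &hi. > &ntab. %then %let hi=&ntab.;
    %* Interleave one block of tables;
    data T&b.; set HAVE&lo.-HAVE&hi.; by I; run;
  %end;
  %* Interleave the intermediate block tables;
  data WANT; set T1-T&nblocks.; by I; run;
%mend split_set;
%split_set(ntab=200, blocksize=40);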
@lingcx You have to test with your own data in your own environment, but you should find something similar.
If you go the UPDATE route, splitting should speed up the prior SET step. Multiple MERGEs may still be faster though.
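For reference, a minimal sketch of what I mean by that; the data set names MASTER and TRANS1-TRANS50 are hypothetical (they are not the HAVE tables above), and everything is assumed sorted by I.
%* Stack the transaction tables: this is the prior SET step that splitting speeds up;
data TRANS;
  set TRANS1-TRANS50;
  by I;
run;
%* Apply the stacked transactions to the master (I must be unique in MASTER);
data WANT;
  update MASTER TRANS;
  by I;
run;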
You could also run the tasks in parallel and see whether that improves performance. Here is general-purpose code for launching parallel batch jobs; try it in your application. The workload may turn out to be I/O bound, in which case this won't help much, but it is worth a shot.
options noxwait noxsync;
%let sasexe_options = -nosplash ;

%* Write a .cmd file for each batch SAS job to be launched;
data _null_ ;
  file "c:\temp\test1.cmd" ;
  put "sas -sysin 'c:\temp\test1.sas' &sasexe_options. -log 'c:\temp\test1.log'";
  file "c:\temp\test2.cmd" ;
  put "sas -sysin 'c:\temp\test2.sas' &sasexe_options. -log 'c:\temp\test2.log'";
run;

%* Clear any leftover tasks with these names, then launch both jobs asynchronously;
systask kill test1 test2;
systask command "c:\temp\test1.cmd" taskname=test1;
systask command "c:\temp\test2.cmd" taskname=test2;

%* Wait until both background jobs have finished;
waitfor _all_ test1 test2;
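For this thread's example, each child program would build part of the result. Note that the child sessions cannot see this session's WORK library, so the HAVE tables would have to sit in a permanent library both sessions can reach. A sketch of what c:\temp\test1.sas could contain (the SHARED libref, its path, and the split point are all assumptions):
%* c:\temp\test1.sas (hypothetical content): merge the first half of the tables;
libname shared "c:\temp\shared";   %* assumed shared location of the HAVE tables;
data shared.T1;
  merge shared.HAVE1-shared.HAVE500;
  by I;
run;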