BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ChrisNZ
Tourmaline | Level 20

This got me thinking and I did a few more tests.

 

SAS gets very inefficient when processing hundreds of tables concurrently with a BY statement, and the elapse (and CPU) time grows exponentially as more tables are added.

Much better to split the processing as suggested in the first post.

 

This is even truer for the SET statement than for the merge statement. 

 

 

%*********** C R E A T E   S A M P L E **********;
%let nbobs=2000; %let nbtab=1000; %* Create sample; data %macro loop; %local i; %do i=1 %to &nbtab; HAVE&i.(rename=(J=J&i. K=K&i. L=L&i. M=M&i.) compress=no where=(I<3 or I>&i.)) %end; %mend; %loop; do I=1 to &nbobs; J=I;K=I;L=I;M=I; output; end; run;
%************ T E S T M E R G E B Y *************; %* merge 100 tables ==> 0.7 seconds; data WANT; merge HAVE1-HAVE100; by I; run; %* merge 200 tables ==> 2 seconds; data WANT; merge HAVE1-HAVE200; by I; run; %* merge 500 tables ==> 15 seconds; data WANT; merge HAVE1-HAVE500; by I; run; %* merge 1000 tables ==> 65 seconds; data WANT; merge HAVE1-HAVE1000; by I; run; %* Break merge in 200-table blocks ==> total time 10 seconds; data T1; merge HAVE1 - HAVE200; by I; run; data T2; merge HAVE201 - HAVE400; by I; run; data T3; merge HAVE401 - HAVE800; by I; run; data T4; merge HAVE601 - HAVE800; by I; run; data T5; merge HAVE801 -HAVE1000; by I; run; data T6; merge T1-T5; by I; run; %************** T E S T S E T B Y **************; %* set 50 tables ==> 4 seconds; data WANT; set HAVE1-HAVE50; by I; run; %* set 100 tables ==> 26 seconds; data WANT; set HAVE1-HAVE100; by I; run; %* set 200 tables ==> 180 seconds; data WANT; set HAVE1-HAVE200; by I; run; %* Break set in 40-table blocks ==> total time 9 seconds; data T1; set HAVE1 - HAVE40; by I; run; data T2; set HAVE49 - HAVE80; by I; run; data T3; set HAVE89 - HAVE120; by I; run; data T4; set HAVE129 - HAVE160; by I; run; data T5; set HAVE159 - HAVE200; by I; run; data T6; set T1-T5; by I; run;

 

@lingcx  You have to test with your own data in your own environment,but you should find something similar.

 

If you go the UPDATE route, splitting should speed up the prior SET step. Multiple MERGEs may still be faster though.

 

AlanC
Barite | Level 11

You could also parallel run the tasks and see if that improves performance. Here is general code to deal with it. try it in your application. The issue with this may be an I/O bound problem but worth a shot.

 

options noxwait noxsync;

%let sasexe_options = -nosplash ;

 

data _null_ ;

   file "c:\temp\test1.cmd" ;

   put "sas -sysin 'c:\temp\test1.sas' -nosplash -log 'c:\temp\test1.log'";

 

   file "c:\temp\test2.cmd" ;

   put "sas -sysin 'c:\temp\test2.sas' -nosplash -log 'c:\temp\test2.log'";

 

run;

 

 

systask kill test1 test2;

systask command "c:\temp\test1.cmd" taskname=test1;

systask command "c:\temp\test2.cmd" taskname=test2;

waitfor _all_ test1 test2;

https://github.com/savian-net

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 16 replies
  • 5102 views
  • 7 likes
  • 9 in conversation