BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ChrisNZ
Tourmaline | Level 20

This got me thinking and I did a few more tests.

 

SAS gets very inefficient when processing hundreds of tables concurrently with a BY statement, and the elapse (and CPU) time grows exponentially as more tables are added.

Much better to split the processing as suggested in the first post.

 

This is even truer for the SET statement than for the merge statement. 

 

 

%*********** C R E A T E   S A M P L E **********;
%let nbobs=2000; %let nbtab=1000; %* Create sample; data %macro loop; %local i; %do i=1 %to &nbtab; HAVE&i.(rename=(J=J&i. K=K&i. L=L&i. M=M&i.) compress=no where=(I<3 or I>&i.)) %end; %mend; %loop; do I=1 to &nbobs; J=I;K=I;L=I;M=I; output; end; run;
%************ T E S T M E R G E B Y *************; %* merge 100 tables ==> 0.7 seconds; data WANT; merge HAVE1-HAVE100; by I; run; %* merge 200 tables ==> 2 seconds; data WANT; merge HAVE1-HAVE200; by I; run; %* merge 500 tables ==> 15 seconds; data WANT; merge HAVE1-HAVE500; by I; run; %* merge 1000 tables ==> 65 seconds; data WANT; merge HAVE1-HAVE1000; by I; run; %* Break merge in 200-table blocks ==> total time 10 seconds; data T1; merge HAVE1 - HAVE200; by I; run; data T2; merge HAVE201 - HAVE400; by I; run; data T3; merge HAVE401 - HAVE800; by I; run; data T4; merge HAVE601 - HAVE800; by I; run; data T5; merge HAVE801 -HAVE1000; by I; run; data T6; merge T1-T5; by I; run; %************** T E S T S E T B Y **************; %* set 50 tables ==> 4 seconds; data WANT; set HAVE1-HAVE50; by I; run; %* set 100 tables ==> 26 seconds; data WANT; set HAVE1-HAVE100; by I; run; %* set 200 tables ==> 180 seconds; data WANT; set HAVE1-HAVE200; by I; run; %* Break set in 40-table blocks ==> total time 9 seconds; data T1; set HAVE1 - HAVE40; by I; run; data T2; set HAVE49 - HAVE80; by I; run; data T3; set HAVE89 - HAVE120; by I; run; data T4; set HAVE129 - HAVE160; by I; run; data T5; set HAVE159 - HAVE200; by I; run; data T6; set T1-T5; by I; run;

 

@lingcx  You have to test with your own data in your own environment,but you should find something similar.

 

If you go the UPDATE route, splitting should speed up the prior SET step. Multiple MERGEs may still be faster though.

 

AlanC
Barite | Level 11

You could also parallel run the tasks and see if that improves performance. Here is general code to deal with it. try it in your application. The issue with this may be an I/O bound problem but worth a shot.

 

options noxwait noxsync;

%let sasexe_options = -nosplash ;

 

data _null_ ;

   file "c:\temp\test1.cmd" ;

   put "sas -sysin 'c:\temp\test1.sas' -nosplash -log 'c:\temp\test1.log'";

 

   file "c:\temp\test2.cmd" ;

   put "sas -sysin 'c:\temp\test2.sas' -nosplash -log 'c:\temp\test2.log'";

 

run;

 

 

systask kill test1 test2;

systask command "c:\temp\test1.cmd" taskname=test1;

systask command "c:\temp\test2.cmd" taskname=test2;

waitfor _all_ test1 test2;

https://github.com/savian-net

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 4161 views
  • 7 likes
  • 9 in conversation