SASAna
Quartz | Level 8

Hi SAS users,

 

Due to long run times, I was thinking of trying parallel processing. My code is structured as shown below, and I am not sure whether it can be parallelized.

 

 

do loop i  

do loop j  

do loop k

 

macro test1     (macros test1 to test10 are the same except for one parameter that differs, so the same logic runs 10 times)

macro  test2

macro test3

..

..

macro test10

 

do loop ends   k

do loop ends   j

do loop ends   i

 

Thanks,

Ana

8 REPLIES
Kurt_Bremser
Super User

Before embarking on parallel processing, take a good look at why your current process performs badly. If your process is already running at saturation level of your I/O subsystem, parallelization will only make it worse.

So you have to check if you have the computing infrastructure upon which you can spread your parallel processes

- multiple CPU cores that are not yet used

- multiple I/O paths that are not yet used, or I/O capability in reserve

- memory that is not yet used

and so on.

 

SASAna
Quartz | Level 8

Hi Kurt,

 

The UNIX system is very good, and the database, where I connect and do 50% of the work in my code, is also fine.

 

Since the data size is 75K to 1 million records per month, I thought parallel processing might help with running macros test1 to test10.

Kurt_Bremser
Super User

Define "very good". I could have a system with 16 POWER cores, 512 GB of RAM and a nominal I/O throughput of 1GB/sec, and 2 silly SQLs could bring it to a standstill by riding one disk to death.

You need to identify what parts of your process take so long, and how your server(s) perform during those steps.

 

One approach that you could take would be this:

 

Suppose your outer loop performs 10 iterations.

Parameterize those loops (i.e., retrieve loop_start and loop_end from command-line parameters).

Run the program as two parallel batch jobs with suitable command-line options (1 to 5 and 6 to 10) and measure the performance.

If performance improves, increase the parallelization until performance stops getting better.
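A minimal sketch of that parameterization, assuming the bounds are passed through the -sysparm option and read from the automatic macro variable SYSPARM. The macro name run_range, the program name loops.sas, and the default range of 1 to 10 are hypothetical placeholders, not the original code.

```sas
/* Sketch only: run_range and the inner macro calls are hypothetical. */
%macro run_range;
  %local loop_start loop_end i;

  /* -sysparm "1 5" arrives in the automatic macro variable SYSPARM */
  %let loop_start = %scan(&sysparm, 1, %str( ));
  %let loop_end   = %scan(&sysparm, 2, %str( ));

  /* Fall back to the full range when no bounds are supplied */
  %if %length(&loop_start) = 0 %then %let loop_start = 1;
  %if %length(&loop_end)   = 0 %then %let loop_end   = 10;

  %do i = &loop_start %to &loop_end;
    %put NOTE: processing outer iteration &i;
    /* the inner loops and the calls to test1 ... test10 would go here */
  %end;
%mend run_range;

%run_range
```

On UNIX, two batch jobs could then be started in the background with something like `sas loops.sas -sysparm "1 5" -log job1.log &` and `sas loops.sas -sysparm "6 10" -log job2.log &`, giving each job its own log so the run times can be compared.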

rogerjdeangelis
Barite | Level 11

Multitask 8 batch jobs.

 

My inexpensive workstation (SAS calls it PC-SAS) has 64 GB of RAM, dual 3 GHz Xeons and two RAID 0 arrays (an off-lease Dell T7400, about $600).

 

Because the macro cascon was compute-bound, the 8 processes below cut the elapsed time by about a factor of eight.

Eight small datasets (4 million obs) were created, which you will need to either append or keep as a view after the tasks complete.

 

If the largest temp or perm table is less than 1 TB (Big Data), you may be able to run on a workstation; otherwise I suggest EG GRID on a SAS server. If you are doing a large simulation that requires more than 16 cores, you should also consider the EG GRID.

 

You need to think about multiple SPDE libnames, each with partitioned data, if you have more I/O-intensive work.

SPDE does not support multiple tasks? Only the server edition does?


/* Base command line used to start each child SAS session */
%let _s=%sysfunc(compbl(C:\Progra~1\SASHome\SASFoundation\9.4\sas.exe -sysin c:\nul -sasautos c:\oto -autoexec c:\oto\Tut_Oto.sas
-work d:\wrk));

/* Do not block the parent session while external commands run */
options noxwait noxsync;
%let tym=%sysfunc(time());
/* Clear any tasks left over from a previous run */
systask kill sys1 sys2 sys3 sys4 sys5 sys6 sys7 sys8;
/* Each task processes one eighth of the 1,000,000 observations */
systask command "&_s -termstmt %nrstr(%cascon(beg=0000001,end=0125000);) -log d:\log\a1.log" taskname=sys1;
systask command "&_s -termstmt %nrstr(%cascon(beg=0125001,end=0250000);) -log d:\log\a2.log" taskname=sys2;
systask command "&_s -termstmt %nrstr(%cascon(beg=0250001,end=0375000);) -log d:\log\a3.log" taskname=sys3;
systask command "&_s -termstmt %nrstr(%cascon(beg=0375001,end=0500000);) -log d:\log\a4.log" taskname=sys4;
systask command "&_s -termstmt %nrstr(%cascon(beg=0500001,end=0625000);) -log d:\log\a5.log" taskname=sys5;
systask command "&_s -termstmt %nrstr(%cascon(beg=0625001,end=0750000);) -log d:\log\a6.log" taskname=sys6;
systask command "&_s -termstmt %nrstr(%cascon(beg=0750001,end=0875000);) -log d:\log\a7.log" taskname=sys7;
systask command "&_s -termstmt %nrstr(%cascon(beg=0875001,end=1000000);) -log d:\log\a8.log" taskname=sys8;
/* _all_ is required: by default WAITFOR returns as soon as any one task finishes */
waitfor _all_ sys1 sys2 sys3 sys4 sys5 sys6 sys7 sys8;
/* Elapsed time in seconds */
%put %sysevalf( %sysfunc(time()) - &tym);

 

 

RW9
Diamond | Level 26

What do the macros resolve to? That is the key question here. The code they generate is run once per inner loop * middle loop * outer loop iteration, so it could run many times. I find it highly unlikely that this is a good way of working, but without seeing it all I can't really say. Moving to parallel processing *may* help, but it won't change the macro part, as macro is just a text replacement facility, and if there is heavy read/write then it won't help. It is hard to advise without seeing some test data (in the form of a datastep) and the code.

 

Also, just re-reading your post: if you're running the same macro but with a different parameter, then a simple change to your data structure, so that each parameter is a row rather than a column, can sometimes a) reduce your coding effort and b) be far more efficient than coding each item (due to by-group processing).
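As a rough illustration of the row-per-parameter idea, the sketch below uses a made-up table work.long with hypothetical columns param and value. One grouped pass over the data stands in for what would otherwise be one macro call per parameter; the real macros may of course do more than a summary.

```sas
/* Hypothetical data: one row per (parameter, record)
   instead of one column or one macro call per parameter */
data work.long;
  input param $ value;
  datalines;
A 10
A 20
B 5
B 15
C 7
;
run;

/* A single grouped pass handles every parameter at once */
proc sql;
  create table work.summary as
  select param,
         count(*)   as n,
         sum(value) as total
  from work.long
  group by param;
quit;
```

With this layout, adding an eleventh parameter means adding rows, not writing another macro.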

Astounding
PROC Star

The answer is short and easy:  No.

 

There is nothing about macro loops that creates parallel processing.  SAS programs (whether macro language is involved or not) process one DATA or PROC step at a time, sequentially.  There are SAS language techniques that can parallelize a single step in some cases, but introducing macro language does not bring any of those SAS language techniques into play.

 

If you want to post whichever macro is taking longest to run, you could probably get some suggestions on how to speed up the SAS steps within.
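One example of a step that SAS can parallelize internally is a threaded sort. The sketch below is only an illustration with an invented dataset (work.big); it would not speed up SQL that runs inside the database, and whether threading helps at all depends on the available CPUs and I/O.

```sas
/* Allow threaded procedures to use all CPUs SAS detects */
options threads cpucount=actual;

/* Hypothetical test data: one million random values */
data work.big;
  do id = 1 to 1000000;
    x = ranuni(1);
    output;
  end;
run;

/* PROC SORT is one of the procedures that can run multi-threaded */
proc sort data=work.big out=work.big_sorted threads;
  by x;
run;
```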

SASAna
Quartz | Level 8
Thanks Asto. The macro is pretty simple, with DB connectivity and insert, update & delete SQL statements only. But the data is huge, and running 7 loops is taking extra hours.
Astounding
PROC Star

In that case, it becomes a matter of strategy.  As others have mentioned above:

 

  • Consider whether several steps could logically be combined into one.  (How to adjust the program to combine those steps is a secondary issue.)
  • Consider splitting up the job into several jobs.  SAS will run several jobs in parallel, although there may be contention for either processing power or for access to the database that is being updated.

 
