12-15-2016 08:21 AM
Hi SAS users,
Due to long run times, I was thinking of trying parallel processing. My code is structured as shown below, and I am not sure whether it can be parallelized.
do loop i
  do loop j
    do loop k
      macro test1 (macros test1 to test10 are the same except for one parameter, so the macro runs 10 times)
    do loop ends k
  do loop ends j
do loop ends i
12-15-2016 08:45 AM
Before embarking on parallel processing, take a good look at why your current process performs badly. If your process is already running at saturation level of your I/O subsystem, parallelization will only make it worse.
So you have to check whether you have the computing infrastructure across which you can spread your parallel processes:
- multiple CPU cores that are not yet used
- multiple I/O paths that are not yet used, or I/O capability in reserve
- memory that is not yet used
and so on.
12-15-2016 08:56 AM
The UNIX system is very good, and so is the database that I connect to for about 50% of the work in my code.
Since the data size is 75K to 1 million records per month, I thought parallel processing might help with running macro test1 through macro test10.
12-15-2016 09:06 AM - edited 12-15-2016 09:09 AM
Define "very good". I could have a system with 16 POWER cores, 512 GB of RAM and a nominal I/O throughput of 1GB/sec, and 2 silly SQLs could bring it to a standstill by riding one disk to death.
You need to identify what parts of your process take so long, and how your server(s) perform during those steps.
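A minimal sketch of one way to find the slow steps, assuming they are ordinary DATA or PROC steps: switch on FULLSTIMER and compare real time with CPU time in the log. The dataset work.have and its variables are placeholders, not part of the original program.

```sas
/* Write detailed resource statistics for every step to the log. */
options fullstimer;

/* Placeholder data standing in for one of the real inputs. */
data work.have;
  do id = 1000 to 1 by -1;
    x = ranuni(1);
    output;
  end;
run;

/* Placeholder step standing in for one of the slow steps. */
proc sort data=work.have out=work.sorted;
  by id;
run;

/* In the log, compare "real time" with the sum of "user cpu time"
   and "system cpu time" for each step. Real time far above CPU time
   usually means the step is waiting, often on I/O; real time close
   to CPU time points to a compute-bound step. */
```

Steps that turn out to be compute bound are the better candidates for running in parallel.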
One approach that you could take would be this:
Suppose your outer loop performs 10 iterations.
Parameterize those loops (i.e. retrieve loop_start and loop_end from command-line parameters).
Run the program as two parallel batch jobs with suitable command-line options (1 to 5 and 6 to 10) and measure the performance.
If performance improves, increase the degree of parallelization until performance stops getting better.
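The approach above could be sketched as follows, assuming the range is passed through the SYSPARM system option. The file name loops.sas, the macro name run_range and the log names are made up for illustration; the inner loops and test macros from the original program are only indicated by a comment.

```sas
/* loops.sas: started as two background batch jobs on UNIX, e.g.
     sas loops.sas -sysparm "1 5"  -log loops_1_5.log  &
     sas loops.sas -sysparm "6 10" -log loops_6_10.log &            */

/* SYSPARM holds the string passed on the command line. */
%let loop_start = %scan(&sysparm, 1, %str( ));
%let loop_end   = %scan(&sysparm, 2, %str( ));

%macro run_range(first, last);
  %local i;
  %do i = &first %to &last;
    %put NOTE: processing outer iteration &i;
    /* the existing j and k loops and the test macros would go here */
  %end;
%mend run_range;

%run_range(&loop_start, &loop_end)
```

Each job needs its own log and its own output datasets, so that the parallel runs do not overwrite each other.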
12-15-2016 09:04 AM
Multitask 8 batch jobs.
My inexpensive workstation (SAS calls it PC-SAS) has 64 GB of RAM, dual 3 GHz Xeons and two RAID 0 arrays (an off-lease Dell T7400, about $600).
Because the macro cascon was compute bound, the 8 processes below cut the elapsed time by about a factor of eight.
Eight small datasets (4 million obs) were created, which you will need to either append or keep as a view after the tasks complete.
If the largest temp or perm table is less than 1 TB (Big Data), you may be able to run on a workstation; otherwise I suggest EG GRID on a SAS server. If you are doing a large simulation that requires more than 16 cores, you should also consider the EG GRID.
You need to think about multiple SPDE libnames, each with partitioned data, if you have more I/O-intensive work.
Does SPDE not support multiple tasks? Does only the server edition?
%let _s=%sysfunc(compbl(C:\Progra~1\SASHome\SASFoundation\9.4\sas.exe -sysin c:\nul -sasautos c:\oto -autoexec c:\oto\Tut_Oto.sas));
options noxwait noxsync;
* record the start time so the elapsed seconds can be reported at the end;
%let tym=%sysfunc(time());
systask kill sys1 sys2 sys3 sys4 sys5 sys6 sys7 sys8;
systask command "&_s -termstmt %nrstr(%cascon(beg=0000001,end=0125000)) -log d:\log\a1.log" taskname=sys1;
systask command "&_s -termstmt %nrstr(%cascon(beg=0125001,end=0250000)) -log d:\log\a2.log" taskname=sys2;
systask command "&_s -termstmt %nrstr(%cascon(beg=0250001,end=0375000)) -log d:\log\a3.log" taskname=sys3;
systask command "&_s -termstmt %nrstr(%cascon(beg=0375001,end=0500000)) -log d:\log\a4.log" taskname=sys4;
systask command "&_s -termstmt %nrstr(%cascon(beg=0500001,end=0625000)) -log d:\log\a5.log" taskname=sys5;
systask command "&_s -termstmt %nrstr(%cascon(beg=0625001,end=0750000)) -log d:\log\a6.log" taskname=sys6;
systask command "&_s -termstmt %nrstr(%cascon(beg=0750001,end=0875000)) -log d:\log\a7.log" taskname=sys7;
systask command "&_s -termstmt %nrstr(%cascon(beg=0875001,end=1000000)) -log d:\log\a8.log" taskname=sys8;
* _all_ waits for every task to finish (the default is to wait for any one task);
waitfor _all_ sys1 sys2 sys3 sys4 sys5 sys6 sys7 sys8;
%put %sysevalf( %sysfunc(time()) - &tym);
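Once the tasks have finished, the eight result datasets can be combined either as a view or by appending. A rough sketch, assuming each task wrote its result to a dataset named part1 to part8 in a shared library; the libref out, its path and the dataset names are hypothetical, since the post does not show what cascon writes.

```sas
/* Hypothetical library holding the eight result datasets. */
libname out "d:\out";

/* Option 1: a DATA step view. Nothing is copied until the view is read. */
data work.all_parts / view=work.all_parts;
  set out.part1-out.part8;
run;

/* Option 2: physically append the parts into one dataset. */
%macro append_parts(n=8);
  %local i;
  %do i = 1 %to &n;
    proc append base=work.combined data=out.part&i;
    run;
  %end;
%mend append_parts;

%append_parts(n=8)
```

The view avoids a second copy of the data; the appended dataset is the better choice if the result is read many times.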
12-15-2016 08:53 AM - edited 12-15-2016 08:55 AM
What do the macros resolve to? That is the key question here. The code they resolve to is run once per inner loop * middle loop * outer loop, so it could be run many times. I find it highly unlikely that this is a good way of working, but without seeing it all I can't really say. Moving to parallel processing *may* help, but it won't change the macro part, as macro is just a text replacement facility, and if there is heavy read/write it won't help. It is hard to advise without seeing some test data (in the form of a data step) and the code.
Also, just re-reading your post: if you're running the same macro with a different parameter each time, then a simple change to your data structure, so that each parameter is a row rather than a column, can sometimes a) reduce your coding effort and b) be far more efficient than coding each item separately (due to BY-group processing).
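To illustrate the row-per-parameter idea, here is a small sketch with made-up data. The table work.wide, the columns test1 to test3 and the choice of PROC MEANS are assumptions for the example only; the real macro may do something quite different.

```sas
/* Made-up wide table: one column per parameter. */
data work.wide;
  input id test1-test3;
  datalines;
1 10 20 30
2 11 21 31
3 12 22 32
;
run;

/* Reshape to one row per id and parameter. */
proc transpose data=work.wide
               out=work.long(rename=(col1=value))
               name=param;
  by id;
  var test1-test3;
run;

proc sort data=work.long;
  by param;
run;

/* One step with BY-group processing replaces one macro call per parameter. */
proc means data=work.long noprint;
  by param;
  var value;
  output out=work.stats mean=mean_value;
run;
```

The data is read once for all parameters instead of once per macro call.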
12-15-2016 09:13 AM
The answer is short and easy: No.
There is nothing about macro loops that creates parallel processing. SAS programs (whether macro language is involved or not) process one DATA or PROC step at a time, sequentially. There are SAS language techniques that can parallelize a single step in some cases, but introducing macro language does not bring any of those SAS language techniques into play.
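As one example of such a technique, some procedures are thread-enabled and can use several CPUs within a single step. A minimal sketch, assuming a server with spare cores; the dataset work.big is generated here only so that the example is self-contained.

```sas
/* THREADS allows thread-enabled procedures to use multiple CPUs.
   CPUCOUNT=ACTUAL uses the number of CPUs reported by the operating system. */
options threads cpucount=actual;

/* Generated test data, not part of the original program. */
data work.big;
  do id = 1 to 1000000;
    key = ranuni(42);
    output;
  end;
run;

/* PROC SORT is one of the thread-enabled procedures. */
proc sort data=work.big out=work.big_sorted threads;
  by key;
run;
```

Whether this helps depends on the step being CPU bound; an I/O-bound sort gains little from extra threads.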
If you want to post whichever macro is taking longest to run, you could probably get some suggestions on how to speed up the SAS steps within.
12-15-2016 10:18 AM
12-15-2016 10:28 AM
In that case, it becomes a matter of strategy. As others have mentioned above: