Dear Experts,
I wonder why I can't get a time gain by using threaded processing in DS2. Below I create a data set with a group variable, then count the number of observations in each group using threaded processing in DS2. It takes far longer than doing the same with an ordinary DATA step. I had expected threaded processing to give far better performance, so I wonder why this does not happen.
By the way, this example is just to illustrate the problem. If counting observations were the real problem, there are better methods to do that.
*the test dataset:;
data test;
do group=1 to 10;
do i=1 to 1000000;
output;
end;
end;
run;
*Count observations with DS2:;
proc ds2 bypartition=no stimer;
/* Thread program: counts the rows in each BY group it reads */
thread read/overwrite=yes;
declare double count;
method init();
count=0;
end;
method run();
set test;
by group i;
if first.group then count=0;
count+1;
if last.group then output;
end;
endthread;
/* Data program: runs 4 instances of the thread and collects their output */
data abc/overwrite=yes;
keep group count;
declare thread read instance;
method run();
set from instance threads=4;
output;
end;
run;
quit;
NOTE: DS2 query used (Total process time):
real time 11.23 seconds
cpu time 26.59 seconds
*In comparison, an ordinary datastep:;
data abc;
set test;
by group i;
if first.group then count=0;
count+1;
run;
NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: The data set WORK.ABC has 10000000 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 6.89 seconds
cpu time 6.59 seconds
It can vary a bit from one run to another, but basically the same result came out each time. Also, changing bypartition to "yes" does not make any big difference. And yes, I do have multiple processors on my server.
Parallel processing of I/O-intensive tasks only makes sense if the I/O can be split onto physically separate devices.
As long as the data set in question is on one device, the threads will cause colliding requests on that device and ultimately slow the process down compared to a single, often sequential, scan through the data set.
That's why the SPDE engine works best when its data is spread over a group of disks whose count matches the number of processors.
It would become more interesting if you loaded the data set into memory using SASFILE.
That would eliminate the I/O constraint, which is usually the slowest part of any processing.
You need enough memory to hold the data set, but that should not be an issue these days.
The next factor is the overhead of starting and maintaining threads. When that overhead is high compared to the processing itself, you have another reason why overall speed will not improve.
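The points about I/O and thread overhead can be put into a rough timing model. This is a simple Amdahl's-law style sketch, assuming the I/O stays in a single thread, the computation splits evenly over n threads, and all thread start-up and coordination cost can be lumped into one term t_overhead(n); none of these terms is measured separately in the logs in this thread.

```latex
t_{\text{real}}(1) \approx t_{\text{I/O}} + t_{\text{compute}},
\qquad
t_{\text{real}}(n) \approx t_{\text{I/O}} + \frac{t_{\text{compute}}}{n} + t_{\text{overhead}}(n)

S(n) = \frac{t_{\text{real}}(1)}{t_{\text{real}}(n)}
     = \frac{t_{\text{I/O}} + t_{\text{compute}}}
            {t_{\text{I/O}} + t_{\text{compute}}/n + t_{\text{overhead}}(n)}
     \le \frac{t_{\text{I/O}} + t_{\text{compute}}}{t_{\text{I/O}}}
```

For a simple count, t_compute is tiny, so S(n) stays near 1 or drops below it once t_overhead(n) grows. For a heavy computation, the t_compute/n term dominates, and S(n) can approach n when t_I/O is also small.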
I have tried that as well, but it doesn't help. With "SASFILE test load" before PROC DS2 I get almost the same result:
NOTE: DS2 query used (Total process time):
real time 7.79 seconds
cpu time 24.11 seconds
You are probably hitting the overhead of starting and maintaining all the threads.
Replacing the counting with a more computationally complex function should prove that.
It is another dimension of load.
Your result now shows that the total response time is almost equal, but threading adds a lot of overhead: the job finishes in about 8 seconds of real time while about 24 seconds of CPU time are used.
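Comparing the SASFILE run with the ordinary DATA step puts numbers on that overhead. Only the logged times are used; reading CPU/real as the average number of busy cores is an approximation, not something the log states.

```latex
\left.\frac{t_{\text{cpu}}}{t_{\text{real}}}\right|_{\text{DS2}} = \frac{24.11}{7.79} \approx 3.1,
\qquad
S = \frac{t_{\text{real}}^{\text{DATA step}}}{t_{\text{real}}^{\text{DS2}}} = \frac{6.89}{7.79} \approx 0.88

\Delta t_{\text{cpu}} = 24.11 - 6.59 = 17.52\ \text{s}
```

So roughly three cores are kept busy, yet the speedup is below 1 (the job takes about 13% longer) and about 17.5 extra seconds of CPU time are spent, which is consistent with the thread overhead described in this reply.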
Your routine is not computationally complex enough to benefit from threading; you are really only adding overhead, since the I/O is still in a single thread. Instead of a simple count, you may try this example from
You are right: when the computational task is relatively large compared to the I/O task, the gain from threaded processing can be huge even though the I/O is not threaded.
I tried the code you suggested, and I observe that the computation (real) time decreases a lot as the number of threads is increased.
When 8 threads are used:
NOTE: PROCEDURE DS2 used (Total process time):
real time 2.40 seconds
cpu time 17.50 seconds
When the ordinary datastep is used:
NOTE: There were 10000000 observations read from the data set BASE.JMASTER.
NOTE: The data set WORK.JOLD has 1 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 42.09 seconds
cpu time 42.13 seconds
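A quick calculation on these two logs shows where the gain comes from. It uses only the logged times, and again treats CPU/real as an approximate count of busy threads.

```latex
\left.\frac{t_{\text{cpu}}}{t_{\text{real}}}\right|_{\text{DS2, 8 threads}} = \frac{17.50}{2.40} \approx 7.3,
\qquad
S = \frac{42.09}{2.40} \approx 17.5

\frac{t_{\text{cpu}}^{\text{DATA step}}}{t_{\text{cpu}}^{\text{DS2}}} = \frac{42.13}{17.50} \approx 2.4
```

About 7.3 of the 8 threads are busy on average, so the work parallelizes well. The overall speedup of about 17.5 (roughly 7.3 × 2.4) is larger than the factor of 8 that threading alone could deliver, because the DATA step also used about 2.4 times as much total CPU time as the DS2 run. The log does not say why, so that part of the gain should not be credited to threading.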