DS2 multi-threading is good for CPU-intensive work, but data sourcing is still single-threaded.
There is a pretty nice article:
https://www.lexjansen.com/pharmasug/2019/AD/PharmaSUG-2019-AD-228.pdf
comparing DS2 and Data Step processing by Troy Martin Hughes.
I think that @SASJedi could give us a hand here 🙂
Bart
@Ksharp I noticed that when you ran the programs in separate SAS sessions, the WHERE performed better than the subsetting IF. This is what I would have expected. WHERE processing rejects rows before loading them into the PDV. The subsetting IF rejects rows only after they are loaded into the PDV, so should always be slower than WHERE.
From my own test on PC SAS:
49   data w2;
50   set have;
51   if age=16;
52   run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.W2 has 110 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           3.07 seconds
      cpu time            1.46 seconds

49   data w1;
50   set have(where=(age=16));
51   run;

NOTE: There were 110 observations read from the data set WORK.HAVE.
      WHERE age=16;
NOTE: The data set WORK.W1 has 110 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           1.92 seconds
      cpu time            0.67 seconds
When DS2 is running in Base SAS, all read-write operations occur on a single thread (to ensure proper distribution of the data to the compute threads), but there can be multiple compute threads. This means that, in Base SAS, you will only see a performance gain from DS2 if your process is CPU bound. You can tell that a process is CPU bound if the real time and CPU time are about the same. A look at the example logs above shows that the real time is much longer than the CPU time, so this process is I/O bound, not CPU bound, and will not benefit from multi-threading in Base SAS. Now, if you have access to an MPP environment that can run DS2 (like CAS, or one of the databases that support the in-database code accelerator), then you can get both parallel read-write and parallel compute, and DS2 should give you a nice boost over single-threaded DATA steps.
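To make the real time versus CPU time comparison easier, the FULLSTIMER system option can be switched on around the step you are timing. This is only a sketch and was not part of the test runs above. It assumes the same WORK.HAVE table and reuses the WHERE= step from the log.

```sas
/* FULLSTIMER adds detail to the timing notes in the log,
   including separate user and system CPU times and memory use */
options fullstimer;

data w1;
   set have(where=(age=16));
run;

/* Switch back to the shorter timing notes */
options nofullstimer;
```

If real time comes out well above the combined CPU time, the step is probably waiting on I/O, so extra compute threads in Base SAS are unlikely to help.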
DS2 does not include a WHERE statement because it supports reading directly from a FedSQL query on the SET statement. For example:
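proc ds2;
   /* Thread program: the FedSQL query on the SET statement does the subsetting */
   thread th/overwrite=yes;
      method run();
         set {select * from have where age=16};
      end;
   endthread;
   run;

   /* Data program: reads the rows returned by the thread program */
   data w2/overwrite=yes;
      declare thread th th1;
      method run();
         set from th1 threads=8;
      end;
   enddata;
   run;
quit;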
When I ran this, it took just about the same amount of time as the DATA step with the WHERE statement:
NOTE: PROCEDURE DS2 used (Total process time):
      real time           1.79 seconds
      cpu time            0.84 seconds
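Since reads in Base SAS happen on a single thread, one way to see how much the thread program contributes is to run the same FedSQL subset in a plain DS2 data program and compare the log times. This is a minimal sketch that I have not timed. It assumes the same HAVE table, and W3 is just a hypothetical output name.

```sas
proc ds2;
   /* Same FedSQL subset on the SET statement, but with no thread program */
   data w3/overwrite=yes;
      method run();
         set {select * from have where age=16};
      end;
   enddata;
   run;
quit;
```

If the real time is about the same as in the threaded version, that would be consistent with the step being I/O bound rather than CPU bound.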
You can see tips like this in my SAS Tutorial | 5 Ways to Make Your SAS Code Run Faster - it's short and sweet and shows examples.
Hope this helps.