yabwon
Amethyst | Level 16

DS2 multi-threading is good for CPU-intensive work, but reading the source data is still single-threaded.

 

There is a pretty nice article by Troy Martin Hughes comparing DS2 and DATA step processing:

https://www.lexjansen.com/pharmasug/2019/AD/PharmaSUG-2019-AD-228.pdf

 

I think that @SASJedi could give us a hand here 🙂

 

Bart




SASJedi
Ammonite | Level 13

@Ksharp I noticed that when you ran the programs in separate SAS sessions, the WHERE performed better than the subsetting IF. This is what I would have expected. WHERE processing rejects rows before loading them into the PDV. The subsetting IF rejects rows only after they are loaded into the PDV, so should always be slower than WHERE.

 

From my own test on PC SAS:

data w2;
	set have;
	if age=16;
run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.W2 has 110 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           3.07 seconds
      cpu time            1.46 seconds
_________________________________________________________________________________
data w1;
	set have(where=(age=16));
run;

NOTE: There were 110 observations read from the data set WORK.HAVE.
      WHERE age=16;
NOTE: The data set WORK.W1 has 110 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           1.92 seconds
      cpu time            0.67 seconds

When DS2 is running in Base SAS, all read-write operations occur on a single thread (to ensure proper distribution of the data to the compute threads), but there can be multiple compute threads. This means that, in Base SAS, you will only see a performance gain in DS2 if your process is CPU bound. You can tell that a process is CPU bound if the real time and CPU time are about the same. A look at our example log shows that the real time is much longer than the CPU time, so this process is I/O bound, not CPU bound, and will not benefit from multi-threading in Base SAS. Now, if you have access to an MPP environment that can run DS2 (like CAS, or one of the databases that support the in-database code accelerator), then you can get both parallel read-write and parallel compute, and DS2 should give you a nice boost over single-threaded DATA steps.
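If you want more detail when comparing real time and CPU time, one option is to turn on the FULLSTIMER system option before running the step. This is only a minimal sketch, assuming the same WORK.HAVE data set with an AGE variable used in the tests above; the exact statistics written to the log vary by operating environment.

```sas
/* Ask SAS to write extended timing statistics to the log */
options fullstimer;

/* Same subsetting step as above; compare real time with the CPU time figures */
data w1;
	set have(where=(age=16));
run;

/* Go back to the shorter default timing notes */
options nofullstimer;
```

If real time comes out well above CPU time, the step is most likely waiting on I/O, and extra compute threads are unlikely to help.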

 

DS2 does not include a WHERE statement because it supports reading directly from a FedSQL query on the SET statement. For example:

proc ds2;
	thread th/overwrite=yes;
		method run();
			set {select * from have where age=16};
		end;
	endthread;
run;

	data w2/overwrite=yes;
		declare thread th th1;
		method run();
			set from th1 threads=8;
		end;
	enddata;
run;
quit;

When I ran this, it took just about the same amount of time as the DATA step with the WHERE= data set option:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           1.79 seconds
      cpu time            0.84 seconds

You can see tips like this in my SAS Tutorial "5 Ways to Make Your SAS Code Run Faster". It's short and sweet and shows examples.

Hope this helps.

Ksharp
Super User
Jedi,

"when you ran the programs in separate SAS sessions, the WHERE performed better than the subsetting IF. "
Maybe you are right. But sometimes I have noticed that IF is a little faster than WHERE. I don't know why; maybe the environment in which SAS is installed is a key factor.
