BookmarkSubscribeRSS Feed
alandool
Quartz | Level 8

I'm new to DS2, but learning.  We would like to use DS2 to cut down on processing time for certain processes we run.  From some simple testing, I'm finding a DS2 threaded process runs slower and takes almost twice as long than if I use PROC SQL.  This is the case even though I can see that DS2 uses several thread processes.  I find this quite puzzling since I had hoped DS2 would run faster.  Any input / comments why DS2 is running slower?   Query and logs below.

 

 

/* proc sql for comparing to ds2 thread step below -- basically copying a compressed dataset */
proc sql;
create table lib1.ds2_test_procsql as
select segment, region, state, MBR_ENRLMNT_CNT from lib1.final;

 

NOTE: Table LIB1.DS2_TEST_PROCSQL created, with 35811639 rows and 4 columns.

NOTE: PROCEDURE SQL used (Total process time):
real time 2:04.90
user cpu time 54.25 seconds
system cpu time 7.44 seconds
memory 6290.06k
OS Memory 28840.00k
Timestamp 05/17/2018 08:43:26 AM
Step Count 39 Switch Count 284
Page Faults 0
Page Reclaims 163
Page Swaps 0
Voluntary Context Switches 3487
Involuntary Context Switches 33598
Block Input Operations 0
Block Output Operations 0

 

 

/* ds2 thread steps */
proc ds2;
thread newton/overwrite=yes;
dcl double y count;
dcl bigint thisThread;
drop count y thisThread;

 

method run();
set {select segment, region, state, MBR_ENRLMNT_CNT from lib1.final};
thisThread= _threadid_;
count+1;
end;

method term();
put '**Thread' _threadid_ 'processed' count 'rows:';
end;
endthread;
run;
quit;

 

 

proc ds2;
data lib1.ds2_test_thread / overwrite=yes;
dcl thread newton frac;


method run();
set from frac threads=4;
end;
enddata;
run;
quit;


**Thread 1 processed 4086810 rows:
**Thread 0 processed 4037670 rows:
**Thread 3 processed 13699140 rows:
**Thread 2 processed 13988019 rows:
NOTE: Execution succeeded. 35811639 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):
real time 3:47.06
user cpu time 1:49.49
system cpu time 15.59 seconds
memory 5952.46k
OS Memory 28336.00k
Timestamp 05/17/2018 08:47:14 AM
Step Count 42 Switch Count 66
Page Faults 0
Page Reclaims 1620
Page Swaps 0
Voluntary Context Switches 56263
Involuntary Context Switches 64441
Block Input Operations 0
Block Output Operations 0

 

5 REPLIES 5
ChrisNZ
Tourmaline | Level 20

Several elements to answer your questions:

 

1- Is your process CPU-bound? Multi-threading only makes sense if the CPU is the bottleneck. If not, the overheard of managing and synchronising threads makes things worse.

 

2- If the disk is the bottleneck (as it usually is with SAS jobs), multi-threading makes things worse as each thread accesses the disk independently (rather than one thread accessing the disk sequentially), making the disk I/O operations more random.

 

3- For the reasons developed above, proc ds2 is mostly aimed at distributed/grid (or at least multi-path) environments.
   Using multi-path SPDE partitioned storage does not qualify, unless you can limit each thread to its own location.

   Otherwise, we are back to threads fighting for the data and killing I/O throughput.

 

4- proc ds2 is more complex and this means it incurs an overhead; so unless the conditions above are met, performance takes a hit when compared to traditional SAS programming methods.

 

At least that's my (limited) experience of DS2.

 

 

alandool
Quartz | Level 8

Thank you for your reply and information, 

 

 

 

 

 

 

ChrisNZ
Tourmaline | Level 20

The log is usually the first thing to look at to know if the processor is the bottleneck, though when the process is multi-threaded, interpretation becomes more difficult. 

I have no experience with SAS Grid (that's high on my to-do list), but I suppose it makes the log even more ... interesting ...

 

Most references of DS2 only mention in-database accelerators and CAS, so it may be that these are the main targets for DS2 rather than SAS Grid. That would be sad.

 

Here is a paper showing how DS2 can still be beneficial in a traditional environment.

http://support.sas.com/resources/papers/proceedings16/3780-2016.pdf

 

alandool
Quartz | Level 8

Hi ,

 

Thank you for your input again!

 

I think I found the issue.  The DS2 threaded program seems to run slower b/c the SAS dataset is compressed that it is reading in.  It runs faster than the proc sql program when not using a compressed dataset.  

 

Thanks for pointing me to that particular PDF.  I used a sample DS2 program within it.  Using that sample program lead me to the discovery mentioned above.

 

We often use compressed datasets, so if that is the case then DS2 will not speed up many of our processes.

 

Alan

ChrisNZ
Tourmaline | Level 20

Oh wow! Good find!

You should definitely check with tech support whether this is expected behaviour.

There is no reason compressed data sets should kill speed. That's crazy.

 

About the grid: can you see the threads running in the different nodes?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1156 views
  • 9 likes
  • 2 in conversation