Eileen1496
Obsidian | Level 7

Hi all,

 

I would like to use DS2 for multi-threaded processing. I have a simple data step (both it and my DS2 attempt are below). Where can I find materials explaining how to adapt it to DS2? I found some YouTube videos, but they gave the code directly without explaining why so many new lines of code are needed after switching to DS2. I also did not find one that starts from a data step.

/*data step*/
data temp_e.uid10_ind3_f25;
set temp_e.ind3_f25; 
where substr(user_id, 1, 2) = '10';
run;
/* my attempt */
proc ds2;
data temp_e.uid10_ind3_f25;
  dcl str user_id;
  method run();
     set temp_e.ind3_f25; 
     where substr(user_id, 1, 2) = '10';
  end;
enddata;
run;
quit;
7 REPLIES
Patrick
Opal | Level 21

You can find documentation here:

There are also books out there.

 

I haven't used DS2 much myself, but I believe it doesn't simply run a process multithreaded when reading/writing *.sas7bdat files.

 

Below is some working DS2 code similar to your sample.

data work.class_in;
  set sashelp.class;
  do i=1 to 100000;
    output;
  end;
run;

proc datasets lib=work nolist nowarn;
  delete class_out;
quit;

options fullstimer;
proc ds2;
  data work.class_out;
    declare double threadid;
    method run();
      set work.class_in;
      threadid=_threadid_;
    end;
  enddata;
  run;
quit;

proc freq data=work.class_out;
  table threadid;
run;
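
For comparison, here is a minimal sketch of how the same step might be split into a DS2 thread program plus a data program that reads from it. It reuses work.class_in from the code above; the thread name class_th, the output table work.class_out_mt, the filter on name, and threads=4 are all assumptions chosen for illustration, not a recommendation.

```sas
proc ds2;
  /* Thread program: each thread instance reads part of the input. */
  thread class_th / overwrite=yes;
    declare double threadid;
    method run();
      set work.class_in;
      threadid = _threadid_;
      /* Illustrative filter; explicit OUTPUT keeps only matching rows. */
      if substr(name, 1, 1) = 'J' then output;
    end;
  endthread;
  run;

  /* Data program: collects the rows produced by the thread instances. */
  data work.class_out_mt / overwrite=yes;
    declare thread class_th t;
    method run();
      set from t threads=4;
    end;
  enddata;
  run;
quit;

/* Shows which thread instances contributed rows. */
proc freq data=work.class_out_mt;
  table threadid;
run;
```

Whether this is actually faster than a plain data step depends on whether the step is CPU-bound, so it is worth comparing the log timings for both versions.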
Eileen1496
Obsidian | Level 7

Hi Patrick,

 

I will look into the resources you recommended, thanks!

SASKiwi
PROC Star

Multi-threading is really only useful if your SAS processes are CPU-bound. FYI, PROC SQL is multi-threaded by default, so if your use case can be done in SQL you may find that is a better option, and it doesn't require any special coding.

 

If your SAS processes are IO-bound, then parallel-processing is a better option.
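
As a rough sketch of the SQL route, the filter from the original question could be written like this. The library and table names are taken from the question; setting CPUCOUNT=ACTUAL is an assumption here, and whether threading kicks in at all depends on the query and the SAS configuration.

```sas
/* Allow threaded processing and let SAS use the CPUs it detects. */
options threads cpucount=actual;

proc sql;
  create table temp_e.uid10_ind3_f25 as
  select *
  from temp_e.ind3_f25
  where substr(user_id, 1, 2) = '10';
quit;
```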

Eileen1496
Obsidian | Level 7

How can I check whether it is IO-bound or CPU-bound? Currently I add threads to PROC SQL and set options cpucount = 40, and I do find the procedure is much faster. However, the resource manager shows threads=40, cpu = 1, so I'm only using one CPU, and my CPU usage is just 3%. Maybe my code does not need much CPU, so the threads were not spread across multiple CPUs?

I'm not sure how to tell whether it is IO-bound. Can you point me to where to look?

 

Patrick
Opal | Level 21

Using option fullstimer is a good place to start: https://support.sas.com/rnd/scalability/tools/fullstim/index.html 

 

SQL often needs a lot of internal sorting, which in turn often creates a lot of I/O operations for utility files.

The PROC SQL options _method and _tree write information to the SAS log that gives you more visibility into what's actually happening.
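
A minimal sketch of switching these options on for a join. The table work.class_in (an enlarged copy of sashelp.class) and the output name work.sql_join are assumptions for illustration; any join would do.

```sas
/* _method prints the chosen join/sort methods to the log,
   _tree prints the internal query tree. */
proc sql _method _tree;
  create table work.sql_join as
  select a.name, a.age, b.sex
  from work.class_in as a
       inner join sashelp.class as b
       on a.name = b.name;
quit;
```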

 

For joins of "small" tables (i.e. tables that fit into memory) with large tables, using a data step with a hash table for the lookup can often increase performance a lot, because it doesn't require any sorting of the big table.
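
A minimal sketch of such a hash lookup, assuming a hypothetical small table work.small_lookup keyed on name and a large table work.class_in; both names and the chosen columns are for illustration only.

```sas
/* Hypothetical small lookup table: one row per name. */
data work.small_lookup;
  set sashelp.class(keep=name sex);
run;

data work.joined;
  /* Never executes; only defines name and sex in the PDV for the hash. */
  if 0 then set work.small_lookup;
  if _n_ = 1 then do;
    /* Load the small table into memory once. */
    declare hash h(dataset: 'work.small_lookup');
    h.defineKey('name');
    h.defineData('sex');
    h.defineDone();
  end;
  /* Read the big table sequentially, with no sort required. */
  set work.class_in(drop=sex);
  /* find() returns 0 on a match and fills sex from the hash. */
  if h.find() = 0 then output;
run;
```

Only rows with a match are written, so this behaves like an inner join; dropping the condition on find() would keep unmatched rows as well.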

 

SAS/Connect (rsubmit) allows for parallel processing on multiple CPUs via multiple SAS sessions - but this requires a significant amount of additional coding and starting SAS sessions takes time. It's only worth doing in rather rare cases like for running a whole process multiple times in parallel with different parameters.
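
To give an idea of the extra coding involved, here is a rough sketch that runs two steps in parallel sessions on the same machine. It assumes SAS/CONNECT is licensed and that work.class_in exists in the parent session; the task names, the pwork libref and the output table names are hypothetical.

```sas
/* Start child sessions on demand, using the same command as this session. */
options autosignon sascmd='!sascmd';

/* wait=no returns control immediately, so both tasks run concurrently.
   inheritlib exposes the parent WORK library as pwork in the child. */
rsubmit task1 wait=no inheritlib=(work=pwork);
  data pwork.part_f;
    set pwork.class_in;
    where sex = 'F';
  run;
endrsubmit;

rsubmit task2 wait=no inheritlib=(work=pwork);
  data pwork.part_m;
    set pwork.class_in;
    where sex = 'M';
  run;
endrsubmit;

/* Block until both tasks have finished, then end the child sessions. */
waitfor _all_ task1 task2;
signoff _all_;
```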

SASKiwi
PROC Star

Processes are usually IO-bound if the SAS log real time is much larger than the CPU time and the data volumes (row and column counts) are large. If CPU time is larger than real time then that indicates multi-threading is happening. 
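
A small sketch of that check, assuming work.class_in exists; the PROC SORT is only a stand-in for whatever step you want to measure.

```sas
options fullstimer;

proc sort data=work.class_in out=work.class_sorted;
  by name;
run;

/* In the log note for this step, compare "real time" with
   "user cpu time" + "system cpu time":
   - real time much larger than CPU time: likely waiting on I/O
   - CPU time larger than real time: several threads were working */

options nofullstimer;
```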

Eileen1496
Obsidian | Level 7

These are really helpful (and, for me, easy-to-understand checks)! Thanks!
