prad001
Obsidian | Level 7

Hi All,

I am trying to find out if there is a way to process a %DO loop in parallel.

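Example:

data ds1;
  set sample(where=(a=1));
run;

/* An iterative %do is only valid inside a macro, so the loop is wrapped in one. */
%macro split_rows;
  %do i = 1 %to 100;
    data ds2_&i;
      set ds1(firstobs=&i obs=&i);
    run;
  %end;
%mend split_rows;
%split_rows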

Instead of running one loop iteration after another, I want the DATA steps (ds2_&i) to run in parallel, so that all 100 are processed at the same time.

Please let me know the best possible way to run this in parallel.

Thank you, I appreciate your help.

18 REPLIES
SASKiwi
PROC Star

To run 100 DATA steps at the same time would require 100 separate SAS sessions. This is not a realistic possibility. It would help if you explained the processing problem you have rather than just describing the solution you want.

prad001
Obsidian | Level 7

I did not mean to create 100 SAS sessions, but we could at least divide the work among 5 independent sessions (for example: session 1 runs loops 1-20, session 2 runs loops 21-40, and so on).

We are trying to find a way to run our loops in parallel for efficiency.

SASKiwi
PROC Star

Do you have SAS/CONNECT available to you? Creating 5 parallel sessions is very easy if you have this product. You can check what products you have by running this:

proc product_status;
run;

proc setinit;
run;
prad001
Obsidian | Level 7
Unfortunately, I do not have SAS/CONNECT.
SASKiwi
PROC Star

Assuming you are using a remote SAS server, do you have the ability to execute OS commands from your SAS sessions? You can check this by running this:

proc options option = xcmd;
run;
prad001
Obsidian | Level 7

Yes, X command is enabled. And I am using SAS on Linux.

SASKiwi
PROC Star

Cool, that means you can "shell" a new SAS session to run the parallel processes like this:

x 'sas MySASProgram.sas';

I don't have the time to dig out an actual parallel example right now but you can search the Communities using the words "parallel processing" as there have been plenty of posts on this. 

Kurt_Bremser
Super User

On UNIX systems, you can run multiple programs in parallel from a shell script.

sas program1.sas&
sas program2.sas&
sas program3.sas&
sas program4.sas&
sas program5.sas&
wait

The & sends the execution to the background, and wait will only return when all background processes have finished. 
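To tie this to the 5-session split mentioned earlier in the thread, here is a minimal sketch rather than a tested production script. It assumes the loop lives in a single program, here given the hypothetical name loop_chunk.sas, which reads its range from &SYSPARM, and that the SAS launcher is on the PATH as `sas`. The function names `chunk_bounds` and `run_chunks` are made up for illustration.

```shell
#!/bin/sh
# Sketch: split iterations 1..TOTAL into JOBS contiguous chunks and start one
# background SAS session per chunk, then wait for all of them to finish.

SAS_CMD=${SAS_CMD:-sas}   # assumption: the SAS launcher is callable as "sas"

# Print "first last" for chunk number $1 (1-based) out of $3 chunks over 1..$2.
chunk_bounds() {
    idx=$1; total=$2; jobs=$3
    size=$(( (total + jobs - 1) / jobs ))   # ceiling division
    first=$(( (idx - 1) * size + 1 ))
    last=$(( idx * size ))
    [ "$last" -gt "$total" ] && last=$total
    echo "$first $last"
}

# run_chunks TOTAL JOBS PROGRAM
# Each session receives its range in -sysparm and writes its own log file.
run_chunks() {
    total=$1; jobs=$2; program=$3
    i=1
    while [ "$i" -le "$jobs" ]; do
        bounds=$(chunk_bounds "$i" "$total" "$jobs")
        first=${bounds% *}
        # Skip chunks that would start beyond the last iteration.
        if [ "$first" -le "$total" ]; then
            "$SAS_CMD" "$program" -sysparm "$bounds" -log "chunk_$i.log" &
        fi
        i=$(( i + 1 ))
    done
    wait   # returns once every background session has ended
}
```

Calling `run_chunks 100 5 loop_chunk.sas` would start five sessions covering 1-20, 21-40, and so on. Inside the SAS program, the bounds could be picked up with %scan(&sysparm, 1) and %scan(&sysparm, 2) and used as the start and end of the %DO loop. Keeping one parameterized program avoids maintaining five copies of the same code.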

prad001
Obsidian | Level 7

Thank you for the response, Kurt.

Why do I have to split the program?

My intention was to run the loop in 100 new SAS sessions. How can I create 100 sessions for the given example using SYSTASK?

And yes, I also read your other comment that increasing parallelism may decrease performance. I am trying to figure out the optimal number of parallel processes. Not 100, of course.

Kurt_Bremser
Super User

Also note that parallel processing will only improve performance if you have enough CPU cores, and the storage to handle the I/O. As soon as you reach those limits, further parallelization will in fact lead to substantially worse overall performance.
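As a rough starting point for picking the number of sessions, the sketch below caps the requested count at the number of online CPU cores. It only looks at cores. It says nothing about I/O capacity, which can become the bottleneck much earlier, so treat the result as an upper bound to test against and not as the optimum. The function names `cpu_count` and `cap_jobs` are hypothetical.

```shell
#!/bin/sh
# Sketch: never start more parallel sessions than there are CPU cores.

# Print the number of online CPU cores; fall back to 1 if it cannot be read.
cpu_count() {
    n=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || n=1
    case $n in
        ''|*[!0-9]*) n=1 ;;   # empty or non-numeric answer: assume one core
    esac
    echo "$n"
}

# cap_jobs REQUESTED LIMIT: print the smaller of the two numbers.
cap_jobs() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}
```

For example, `jobs=$(cap_jobs 100 "$(cpu_count)")` would reduce a request for 100 sessions to the core count of the machine. From there, timing a few runs with smaller values is the only reliable way to find where storage starts to limit throughput.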

ChrisNZ
Tourmaline | Level 20

What @Kurt_Bremser said: parallel means slower if the hardware is not up to it.

Also, I question your need to split the table like this. Why do you need to do this?

 

mkeintz
PROC Star

Why do you need to create 100 single-observation datasets? Why not make one dataset with 100 observations and a new variable, say I_GROUP, corresponding to the DS2_&i dataset names?

data ds2_all;
  set ds1 (where=(a=1) obs=100);
  i_group=_n_;
run;

Note that the where filter is outsourced to the data set engine, prior to delivering observations to the data step.  Therefore the _N_ variable (i.e. the iteration number of the data step) is counting only the a=1 observations.  So i_group is equivalent to your DS2_&i dataset names.

 

 

And like @ChrisNZ,  I don't see any evident need to do parallel processing, especially for this particular task.

s_lassen
Meteorite | Level 14

Take a look at the SYSTASK statement (for Linux, see SAS Help Center: SYSTASK Statement: UNIX), which enables you to start several processes in parallel, wait for them all to finish, and get the results for each process.

Patrick
Opal | Level 21

Given that you haven't got SAS/CONNECT and only have XCMD available, any parallelization will quickly add a lot of complexity to your program.

I would first investigate whether you can optimize the performance of your current code without parallelization before going for such an approach.

If you share more detail about what you really have, people here might be able to provide some guidance.

