Re: Allow Parallel Execution on the Same Server?

SNLFAM1 · Posted 08-19-2014 12:56 PM

I have been experimenting with the "Allow parallel execution on the same server" option in Enterprise Guide 5.1 (What was the first version that contained this option?). It seems to me that there should be a way to automatically generate and store the path of the WORK directory that is created when EG connects to the workspace server the first time so that you can call it automatically in code that is running in parallel. Is this possible?

Would I then (using prompts?) be able to insert this stored path in a libname statement so that certain parallel tasks and programs can read and write to the same work directory? I would like all of this to be automatic and repeatable. I have been trying to use the "Insert SAS Code.." and "Submit SAS Code.." options in EG to make this the automatic, default method of submitting process flows. I have even considered altering the default output library (for tasks only?) so that all of this happens without having to alter existing EG process flows too much.

Any thoughts or suggestions on a way that I could take full advantage of the capabilities of "Allow Parallel Execution on the Same Server" but store all of the data in the same temporary location? Are there any good papers or documentation on this option that I have yet to find?

Notes:

- We do not have SAS/CONNECT which would seem to resolve this issue somewhat.

- We also do not have a grid enabled environment, but that may be coming in the future.

- I also do not want all of these tables to be stored in a "permanent" location which is why I want to send the data to a "shared" WORK directory.

- I am well aware of the potential consequences of allowing users to run MANY SAS workspace sessions with the click of a button, but our user base is small and I am confident that with proper training and monitoring we can take advantage of these features without undue stress on our server.

- We are moving to EG 6.1 soon if that version treats this differently.

** It seems that this would be a useful feature (with proper role permissions) to program as part of EG out-of-the-box functionality to be able to use this option seamlessly in the way I describe for future versions of EG. Any word of this feature being worked on by developers? And since I am wishing, any chance EG could write all of this out to a stored process that takes advantage of parallel processing using multiple "sessions"?

jakarman · Posted 08-19-2014 03:03 PM

EG 5.1 is the first version that supports parallel execution. You can set the parallel execution option at the flow level and at the node level.

The easiest way is to set it at the flow level. This comes with a disadvantage: parallel nodes do not update the master (EG) node.

Library allocations made in a parallel node are not carried back, nor are macro variables or other settings. That makes sense as a property of asynchronous processing (parallel grid threading). This note is hidden in the EG online help; you can only find it if you know what to look for.

What you can do is:

- Define a basic start node. This node should not be run in parallel (change that option back for this node).

- Define your own library in this node for the data that should be shared. Beware that using the same dataset from several sessions will cause locking problems.

Do not use the SAS WORK library for that; all sessions already create a nested work directory under a master session.

You can see that as a repetition of the physical filename in the WORK path of the EG session.

If you want something with that name, you could use %sysfunc(pathname(work)).

- Make flow dependencies from this job to the others as needed.

- Limiting the maximum number of parallel sessions is possible; it is an EG profile setting.

When installing EG you can set default properties. One of them is the limit on the maximum number of concurrent threads. Choose, e.g., 6.

http://support.sas.com/resources/papers/proceedings12/297-2012.pdf (page 16)

51225 - How to set the number of parallel processes that SAS® Enterprise Guide® can execute
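
As a rough sketch of the start node described above: the directory /hypothetical/shared/area and the libref SHARED are placeholders that do not come from this thread, and the directory is assumed to exist and to be reachable by every session. The node could report its own WORK path and assign a separate library for the data to be shared:

```sas
/* Start node: run with "Allow parallel execution on the same server" turned OFF */

/* Show the physical WORK path of the session this node runs in */
%put NOTE: WORK for this session is %sysfunc(pathname(work));

/* Assign a separate library for data that parallel nodes should share.
   The path is a placeholder (assumption), not a real location. */
libname shared "/hypothetical/shared/area";
```

This sketch does not assume that parallel sessions pick up the assignment automatically; repeating the libname statement in each node is the cautious choice.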

---->-- ja karman --<-----

SNLFAM1 · Posted 08-21-2014 03:06 PM

Thank you Jaap. As usual, your comments were helpful. I will be especially mindful of the limit on the number of parallel processes.

I have come up with a partial solution by taking the following steps:

Step 1: Turn on "Allow parallel execution on the same server" at the project level.

A) Use the EG menu to go to FILE > Project Properties
B) Select the section "Code Submission"
C) Check the box next to "Allow parallel execution on the same server"
D) Save the changes

Step 2: Store the "master" work session path somewhere fixed to be read in by other parallel processes

A) Create a "program" node in EG that reads in the WORK directory path and writes it out to a 'fixed' location
a. I used the following code to create a location, unique to the user and the project, for storing the path as a libname statement in a .sas file to be read in with a %include:

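/* Build a file-name-safe key from the project path (keep letters, digits, underscores) */
%let projectdir=%sysfunc(compress(&_clientprojectpath,,kn));

/* Write a libname statement that points at this session's WORK directory */
data _null_;
  file "/sas/data/g_research/&sysuserid/paths/&projectdir..sas";
  a="libname workshr '%sysfunc(pathname(work))';";
  put a;
run;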

b. Caution: You want to be sure that this file containing the path isn’t accidentally overwritten in the middle of your process flow. To accomplish this, I used automatic/system macro variables generated by EG to create something that is unique to this project, but common for all SAS workspace sessions spawned by this project. An alternative to using macros for the path is to simply have a fixed path, but this runs a higher risk of being overwritten by another project using the same path. I am trying to make this as portable as possible to be able to use in many different EG projects.
B) Change this program node so that it does not "Allow parallel execution on the same server"
a. Go to the properties of the program node
b. Select the section named “Code Submission”
c. Click the bullet for the option “Customize code submission options”
d. Make sure "Allow parallel execution on the same server" is NOT checked
e. Save the changes
C) Run this program node before running the rest of your project
a. Note: Because this program node has "Allow parallel execution on the same server" turned off, it should run in the project's main ("master") session, so the path it stores is that session's WORK directory.
b. To automate things a little bit, I created an Autoexec process flow with this code node in it (good info on Autoexec process flows: "You asked for it: the Autoexec process flow" on The SAS Dummy).

Step 3: Use this path to store any datasets that are used by other branches/nodes of your EG project

A) Read in the path from the fixed location
a. I used the following statements to call the code I had saved previously:

%let projectdir=%sysfunc(compress(&_clientprojectpath,,kn));

%include "/sas/data/g_research/&sysuserid/paths/&projectdir..sas";

b. To automate things a bit more, I put these two lines of code in the following two option locations in EG:
- “Insert custom SAS code before task and query code”
- “Insert custom SAS code before submitted code”
- Note: This will cause the libname to be assigned prior to every task or node in the process flow. A little bit of overkill, but the code is small and the assignment is temporary.
B) Assign a Library to this path
a. By using a %include referencing code that contains a libname statement, I have already accomplished this. If another method is used, you may need to have this be a separate statement that is run for every task and node in the EG process flow.
C) Use the library to read and write any datasets that need to be in common.
a. I tested this with many program nodes that all looked like the following:

data workshr.temp2;

set sashelp.cars;

run;

b. I also tested this with a Query Builder task, which also worked.
- Note: To automate the libname assignment somewhat, I chose the “workshr” library as the first “Default library for Output Data” option in EG, which seems to apply to all tasks but not to programs.
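
A slightly more defensive variant of the two %include lines in Step 3 could check that the path file exists before including it, so that a node submitted before the path has been stored writes a warning instead of failing on the %include. This is only a sketch: the macro name assign_workshr is made up for illustration, and the directory layout is the one used earlier in this post.

```sas
/* Hypothetical helper (name is an assumption): include the stored
   libname statement only if the path file is actually there */
%macro assign_workshr;
  %local projectdir pathfile;

  /* Same project key and file location as in Step 2 */
  %let projectdir=%sysfunc(compress(&_clientprojectpath,,kn));
  %let pathfile=/sas/data/g_research/&sysuserid/paths/&projectdir..sas;

  %if %sysfunc(fileexist(&pathfile)) %then %do;
    /* The file holds a single libname statement for WORKSHR */
    %include "&pathfile";
  %end;
  %else %do;
    %put WARNING: Path file &pathfile not found, WORKSHR was not assigned.;
  %end;
%mend assign_workshr;

%assign_workshr
```

The macro definition plus the call could replace the two lines in the “Insert custom SAS code…” options, at the cost of recompiling a small macro before every task or node.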

It seems a little contrived, but I chose this method so that it is set up and available to any EG project that I run. It also doesn’t require any changes for a project that does not use the "Allow parallel execution on the same server" option (i.e. unchecked).

Some drawbacks that I would like to find solutions to:

1. I created a “/paths/” folder for all of the library paths to be stored in. This “/paths/” directory would have to be added to the home directory of every user who uses my project. This can be fixed with a more common or shared path… but that would mean more chances of corrupting that file while the process flow is running.
2. Anyone else who used my project would have to add a %include statement to their “Insert Custom SAS Code…” options unless I put the %include in the program nodes themselves.
3. Turning the option "Allow parallel execution on the same server" on and off for the same project causes some weird behavior that looks like errors, but seems to still run correctly.
4. I believe that this only works for datasets, and that options and macro variables assigned in any of these tasks or program nodes cannot be used in any other task or program node. Does anyone know how to change the storage location of these other values?
5. If you are not careful, you could have locking issues or precedence issues where you try to read a dataset that has not been created yet.
6. In order to apply this to existing EG projects, you would have to change the libraries for nearly every step to the shared work library. Has anyone come up with a good way to redirect the WORK library in the middle of running code?
7. I am sure there are others…. But I can’t think of them right now.

Any suggestions on how I could improve this?

jakarman · Posted 08-22-2014 08:50 AM

Wow, that is quite an update.

With some figures and explanations added, it would be almost ready to present as a paper.

The drawback questions.

1 Preventing unwanted changes is something for a central platform admin role. Make the location read-only for users.

2 Those are choices.

3 Perhaps one for SAS Technical Support? It should not behave weirdly. Turning the option on and off during a run is not likely to be common.

4 I have seen macro variables (filenames) behave the same way.

PROC OPTSAVE and PROC OPTLOAD can be used for saving and restoring options (Base SAS(R) 9.3 Procedures Guide, Second Edition).

There are also ways to do that with a GETOPTION approach (SAS(R) 9.3 System Options: Reference, Second Edition).

For macro variables there must be something like that; if not, it is not too difficult to build.

A new procedure, PROC PRESENV, is coming (Base SAS(R) 9.4 Procedures Guide, Third Edition).

5 Yep, that is parallel processing.

6 The WORK option cannot be changed in a running session (SAS(R) 9.4 Companion for UNIX Environments, Fourth Edition).

If you want the default location for one-level table names to point somewhere else, that is the USER option (SAS(R) 9.4 Companion for UNIX Environments, Fourth Edition), which can be changed as an option in a running session.

I think you are looking for this one.

7 Creating directories with new names: see the DLCREATEDIR option (SAS(R) 9.4 System Options: Reference, Third Edition).

8 _clientprojectpath is a nice variable, but it is not always trustworthy. When possible, choose something different.

9 Creating directories

10 more?
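
A minimal sketch of points 4, 6 and 7 taken together. The directory /hypothetical/shared/area and the dataset name saved_options are placeholders, not anything from this thread, and DLCREATEDIR only creates the last directory level, so the parent directory is assumed to exist already.

```sas
/* Point 7: let the LIBNAME statement create the directory if it is missing */
options dlcreatedir;
libname workshr "/hypothetical/shared/area/&sysuserid";

/* Point 6: send one-level table names to WORKSHR instead of WORK */
options user=workshr;

data temp2;            /* stored as WORKSHR.TEMP2, not WORK.TEMP2 */
  set sashelp.cars;
run;

/* Point 4: save the current system option settings to a dataset ... */
proc optsave out=workshr.saved_options;
run;

/* ... and load them again, for example in another session that has
   assigned the same WORKSHR library */
proc optload data=workshr.saved_options;
run;
```

With USER= set, existing code that writes one-level names would land in the shared library without editing every step, which is what drawback 6 asks about. Code that refers to WORK explicitly (work.temp2) is not redirected.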

---->-- ja karman --<-----
