ScottBass

I'm trying to wrap my head around EG's support for parallel processing. I have a large number of long-running, independent program entries that could benefit from running in parallel, with their results then combined in downstream code.

There are (at least) two approaches:

1) Leave the project property at serial processing (the default) and mark specific program entries to run in parallel on the "same" server (even though each actually runs on a different server instance).

2) Set the project property to parallel processing and mark specific program entries to run serially. This is how I've configured my current project.

(It sure would be nice if I could set this property at the process flow level and have the program entries inherit it. Setting it entry by entry is a real pain.)

I've also learned the importance of linking the program entries, so that dependencies are honored (the entries run serially) even when an entry is marked to run in parallel.

If you've done parallel processing in EG, can you suggest any best practices? Did it work as you expected? Any "gotchas"?

I admit I haven't read the EG documentation on this cover to cover, but if you know of any good SGF papers that explain EG's parallel processing, please point me in the right direction.

Finally, I've encountered this behavior, which I find "unintuitive":

* Set your project property to run in parallel by default

* Create this program entry:

data shoes;
   set sashelp.shoes;
run;

You get 500+ lines of macro code in the log (MACRO: defineworkfolder, etc.), the code runs on a separate server instance, and you get no local results. This is expected. If I wrote the output dataset to a permanent library, I could use it in downstream code.

* Now create a second program entry and mark it for serial processing (Custom, with no checkboxes ticked).

* Run the same code as above. You now have WORK.SHOES in your local session, and it displays in the data table output (with default settings). So far so good.

* Now run the Filter and Sort task on it: all variables, filter Region='United States'.

* The code runs and the log looks "normal", with no macro code in it. But there is no local WORK.FILTER_FOR_SHOES dataset and no data table output.

So: I have a local table, produced by a program entry marked for serial processing, and I run a Filter and Sort task on it. The log shows none of the parallel-processing lines, yet the task ran in parallel.

Does anyone else consider this a design bug?

I'm creating EG projects that less experienced end users will run. They may run part of the code in a program entry marked for serial processing, then want to explore the data.

Given the behavior above, I don't think I can ever use option #2; I think it would freak out my end users.

Thoughts?

Thanks...


tomrvincent

Lots of gotchas to set up:

1. Save your globals and reset them in each task (use Tools > Options > SAS Programs).

2. Save your libnames and reset them in each task.

3. Tasks don't work with parallel processing turned on.

4. If you want to use tasks, open a second EG session with parallel processing turned off. You can then use the libnames from the first session and run tasks and such. Good for dev/testing.
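Points 1 and 2 could look something like the sketch below. It assumes a directory that every workspace server instance can reach; the libref shared, the path /project/shared/data, and the macro variable region are hypothetical. The first part is a preamble you would paste into Tools > Options > SAS Programs (the option to insert custom SAS code before submitted code), assuming that option is also applied to entries sent to a separate server instance. The parallel entry then writes to the permanent library instead of WORK, and a downstream serial entry, linked after it, brings the result back into local WORK.

```sas
/* Preamble (hypothetical): custom code inserted before every submitted program.
   Each parallel entry runs on its own server instance, so it needs its own
   copies of librefs and global macro variables. */
libname shared "/project/shared/data";  /* hypothetical shared path; must exist */
%let region = United States;            /* hypothetical global macro variable */

/* Parallel entry: write the result to the permanent library, not WORK,
   because WORK on a separate server instance is not visible locally. */
data shared.shoes_subset;
   set sashelp.shoes;
   where region = "&region";
run;

/* Downstream serial entry (linked after the parallel entries):
   copy the result back into the local WORK library for exploration. */
data work.shoes_subset;
   set shared.shoes_subset;
run;
```

Because each parallel entry starts with an empty WORK library on its own server instance, anything it needs has to come from the preamble or from a permanent library, which is why both points 1 and 2 matter.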

