Understanding of code submission to Grid nodes

Babloo · Posted 02-04-2019 06:39 AM

Assume I have 15 Grid compute nodes and I'm executing 1000 lines of code that contain multiple data steps. I want to know which step runs on which node, and what happens on the Grid server as soon as we submit the code.

I tried to understand the documentation but couldn't find the answer. I'd appreciate it if someone could help me answer this question in layman's terms.

MargaretC · Posted 02-04-2019 09:47 AM

In the scenario above, if you have made no modifications to the 1000s of lines of SAS code, they will all run on a single SAS Grid node, the same way they do today.
If you want to split up the logic of the SAS job, you will need to do that manually. We have a SAS tool that will tell you whether the logic can be split up at all. It is called PROC SCAPROC (the SAS Code Analyzer). More details on how to do this can be found in this document: http://support.sas.com/documentation/cdl/en/gridref/67371/HTML/default/viewer.htm#n0qbehvhrcl5van165...
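A minimal sketch of how the Code Analyzer is typically wrapped around an existing program. The file paths and the two stand-in DATA steps are placeholders, and the GRID option is used on the assumption that you want a grid-enabled copy of the job written out for review; inspect that generated file before running it.

```sas
/* Start recording. Both paths are hypothetical; use any writable location. */
proc scaproc;
   record '/tmp/analysis.txt' grid '/tmp/grid_version.sas';
run;

/* --- the unmodified program goes here (two stand-in steps shown) --- */
data work.a;
   set sashelp.class;
run;

data work.b;
   set sashelp.cars;
run;

/* Stop recording and write out the analysis and the grid-enabled job. */
proc scaproc;
   write;
run;
```

The analysis file lists the inputs and outputs of each step, which is what tells you whether two steps depend on each other or could run side by side.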
Margaret

JuanS_OCS · Posted 02-04-2019 06:55 AM

Hello @Babloo ,

yours is an excellent question, about concepts.

For SAS Grid, the unit is a Job. This is important, because a Job is the unit that gets sent, as a whole, to one Grid node or another.

First things first. SAS Grid manages its resources and will always send a new Job to the least loaded node, as defined by the queue (and Grid Option Sets) it enters, unless a node is specified manually.

Secondly, a SAS program (a .sas file) will, in general, be one Job, one unit. All of it. The exception is when you define, in the code, which blocks can be treated as separate Jobs. SAS Grid will then, again, send each piece of code to the least loaded grid node, following the rules of the queues and Grid Option Sets.

Of course, you can also specify manually in the code which piece should run where, if needed. In general, this somewhat defeats the purpose of SAS Grid Manager, but in a few cases it might be a good idea.

And last but not least, a few procedures, such as DS2, allow the same data step to run multi-threaded, and even on multiple Grid nodes, maximizing your resource usage and performance.
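As a rough illustration of the multi-threaded part only, here is a small DS2 sketch. It runs on a single node; spreading the work across several nodes depends on how your environment is set up and is not shown. The table work.big_input, the thread name double_it, and the thread count of 4 are all made up for the example.

```sas
/* Stand-in input table; in practice this would be your own data. */
data work.big_input;
   do x = 1 to 100000;
      output;
   end;
run;

proc ds2;
   /* The thread program holds the per-row logic. */
   thread work.double_it / overwrite=yes;
      dcl double doubled;
      method run();
         set work.big_input;
         doubled = x * 2;
      end;
   endthread;

   /* The data program reads rows from four instances of the thread. */
   data work.result (overwrite=yes);
      dcl thread work.double_it t;
      method run();
         set from t threads=4;
      end;
   enddata;
run;
quit;
```

Rows come back from the threads in no guaranteed order, so sort work.result afterwards if order matters.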

All in all, I suggest you take a look at the papers and documentation. Some good examples:

An old but great paper: Grid Computing with SAS® - A Developer’s Perspective
Parallel Processing Hands-On Workshop
Enabling Distributed Parallel Execution of SAS Jobs

In addition, if you really want to monitor in detail the execution of any SAS job, grid or not (where it executes, the performance of each piece, etc.), I highly recommend giving Boemska's Enterprise Session Monitor a try. If you are interested, you can get more information from @boemskats (I hereby summon you).

I hope this can help you. Please let us know if you need more information.

Kind regards,

Juan

doug_sas · Posted 02-04-2019 07:33 AM

If I read your post correctly, you have a SAS program whose processing you want to split so that the data steps in the program run in parallel. To do this, I am assuming you have SAS Grid Manager's integration with SAS/CONNECT. This integration allows you to start multiple SAS/CONNECT servers on the grid and send parts of the code to different servers for processing in parallel. (You can read how to do this in my SGF 2015 paper Divide and Conquer— Writing Parallel SAS® Code to Speed Up Your SAS Program).
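The general shape of such a program might look like the sketch below. It assumes the metadata connection options are already set for your session and that the application server is named SASApp; the session names task1 and task2 and the two data steps are placeholders.

```sas
/* Assumes METASERVER= and related options are already configured, and that
   "SASApp" is the application server name at your site. */
%let rc = %sysfunc(grdsvc_enable(_all_, resource=SASApp));

/* Each SIGNON starts a SAS/CONNECT server on a node chosen by the grid. */
signon task1;
signon task2;

/* WAIT=NO returns control immediately, so the two blocks run in parallel. */
/* Note that WORK here is the WORK library of the remote session.          */
rsubmit task1 wait=no;
   data work.part1;
      set sashelp.class;
   run;
endrsubmit;

rsubmit task2 wait=no;
   data work.part2;
      set sashelp.cars;
   run;
endrsubmit;

/* Block until both finish, then pull their spooled logs into the client log. */
waitfor _all_ task1 task2;
rget task1;
rget task2;

signoff _all_;
```

Only steps that do not depend on each other's output should be split like this; anything that needs both results has to come after the WAITFOR.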

You could print the name of the data step you are executing as part of the code you submit to a server, so that it ends up in that server's log. As the code runs on each server in parallel, the output is sent back to the client and saved (spooled) until a statement triggers its merging into the client's log file. The only way to see the spooled log while the code is running is to use the RDISPLAY command to pop up a window for the log of the remote session. This assumes you are running the client from a SAS Display Manager session; if you are using SAS Studio or Enterprise Guide, this will not be available. Another option is to route the log of a specific RSUBMIT to a file and view that file with your favorite text tool.
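A small sketch of the "route the log to a file" option, with a line written from inside the remote session. The log path, the session name task1, the step name LOAD_CLASS, and the SASApp resource are assumptions for the example.

```sas
%let rc = %sysfunc(grdsvc_enable(_all_, resource=SASApp));
signon task1;

/* LOG= sends this RSUBMIT's log to a file instead of spooling it. */
rsubmit task1 wait=no log='/tmp/task1.log';
   /* In open code this %PUT runs in the remote session, so SYSHOSTNAME */
   /* should resolve to the grid node, not the client machine.          */
   %put NOTE: step LOAD_CLASS is running on &syshostname;
   data work.load_class;
      set sashelp.class;
   run;
endrsubmit;

waitfor task1;
signoff task1;
```

You can open /tmp/task1.log in any text viewer while the step is still running.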

Since you are writing the parallel code, you could write statements to the client log just before each RSUBMIT to say which data step you are routing to which session. That may be the easiest thing to do.
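For example, something along these lines, using the GRDSVC_GETNAME function to report the node the grid picked for a session. As before, task1, LOAD_CLASS, and SASApp are placeholder names.

```sas
%let rc = %sysfunc(grdsvc_enable(_all_, resource=SASApp));
signon task1;

/* Runs on the client: note which step goes to which session and node. */
%put NOTE: data step LOAD_CLASS -> session task1 on %sysfunc(grdsvc_getname(task1));

rsubmit task1 wait=no;
   data work.load_class;
      set sashelp.class;
   run;
endrsubmit;

waitfor task1;
rget task1;
signoff task1;
```

The NOTE line appears in the client log straight away, before any of the remote log has been merged in.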