02-28-2018 11:23 AM - last edited on 02-28-2018 11:29 AM by ChrisHemedinger
I am using the RSUBMIT method to distribute jobs in parallel. I have a scenario for which I cannot find a solution.
1) I submitted the RSUBMIT blocks and distributed the jobs to different grid nodes.
2) For some reason, in the middle of program execution, one of the grid nodes became unavailable or failed.
Is there any way to find an unavailable, unresponsive, or idle grid node?
03-01-2018 03:08 AM - edited 03-01-2018 03:10 AM
Yours is a very good question. I think the answer lies in checking the status of the Connect Spawner and the Object Spawner on every node of your grid.
UI-wise, RTM or Grid Monitor will help you.
Now, command-line-wise, you could go for one of these options:
- check the logs of the spawners
- check the status of the services with ego commands
- run Gridtest_fast.sas or Gridtest.sas before executing your program: https://support.sas.com/rnd/scalability/grid/download.html However, I think this might be overkill if you plan to run it before every program.
Nevertheless, besides these checks, I think you would do better to configure High Availability (HA) in your grid environment, precisely to ensure that if any grid node or service goes down, the service starts on another machine for the time being. That way, your programs should not fail.
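For an administrator with shell access, the command-line check can be scripted. The sketch below is only an illustration, assuming the grid runs on LSF (as is typical for SAS Grid Manager) and that `bhosts` prints its default layout with the host name in the first column and the status in the second. The function names and the sample text are hypothetical; the sample is not output from a real cluster.

```python
# Sketch: flag LSF hosts whose status is not "ok" from `bhosts` output.
# Assumes the default bhosts layout: a header row, then one row per host
# with HOST_NAME in column 1 and STATUS in column 2.
import subprocess


def hosts_not_ok(bhosts_text):
    """Return {host: status} for every host whose STATUS is not 'ok'."""
    problems = {}
    for line in bhosts_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 2 or fields[0] == "HOST_NAME":
            continue
        host, status = fields[0], fields[1]
        if status != "ok":
            problems[host] = status
    return problems


def run_bhosts():
    """Run `bhosts` on a machine with LSF installed and return its text output."""
    return subprocess.run(
        ["bhosts"], capture_output=True, text=True, check=True
    ).stdout


# Illustrative sample only, not output from a real cluster.
SAMPLE = """\
HOST_NAME          STATUS       JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV
gridnode1          ok              -      8      3      3      0      0      0
gridnode2          unavail         -      8      0      0      0      0      0
gridnode3          closed          -      8      8      8      0      0      0
"""

if __name__ == "__main__":
    print(hosts_not_ok(SAMPLE))
```

On a real grid, `hosts_not_ok(run_bhosts())` would replace the sample text. Note that a status such as `closed` does not necessarily mean the node has failed; it can simply mean the host is not accepting more jobs, so the result is a starting point for investigation rather than a verdict.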
For all of that and more, you can check:
Please note that most of these checks are for the administrators of your environment to carry out. A programmer or user should not really have to be concerned about the availability of the environment. If technical problems arise, you open a support ticket and another team takes care of it on your behalf.
03-01-2018 03:31 AM
Thanks for the reply.
We are not using the EG UI for submitting the SAS jobs. In this scenario I cannot even configure HA, and I can't check the logs of the spawners.
I have another question: is there any way to find the allocated memory and the available free memory of a server (grid node), programmatically or by commands?
03-01-2018 03:39 AM
a couple of comments:
03-01-2018 04:07 AM
I am not authorized to run the commands. I asked in order to know whether anything is available or not.
Mainly I am looking for a programmatic way to find an inactive grid server and the available free memory of a server.
03-01-2018 04:59 AM
I don't think you can, without running X commands or running commands directly on your server's shell interface.
Running X commands carries a high level of risk, which is why the option is normally disabled. It is normally enabled only for a few high-level users or administrators.
All of the above makes my advice stand: please align with your SAS or system administrators. Open communication with them and ensure they understand your challenge, which is, in the end, probably a challenge for your company's business.
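For whoever does have shell access, the free-memory check could look roughly like the sketch below. It assumes an LSF-based grid and the default `lsload` layout, in which the header row names the columns and `mem` is the memory currently available on each host. The function name and the sample text are hypothetical, and the sample is not output from a real cluster.

```python
# Sketch: read the free-memory column from `lsload` output.
# Assumes the default lsload layout, where the header row names the columns
# and "mem" is the available memory per host.


def free_memory_by_host(lsload_text):
    """Return {host: mem string, or None when the host reports no load data}."""
    lines = [ln.split() for ln in lsload_text.splitlines() if ln.strip()]
    if not lines:
        return {}
    header, rows = lines[0], lines[1:]
    mem_col = header.index("mem")
    result = {}
    for fields in rows:
        # Unavailable hosts print only name and status, so the row is short.
        result[fields[0]] = fields[mem_col] if len(fields) > mem_col else None
    return result


# Illustrative sample only, not output from a real cluster.
SAMPLE = """\
HOST_NAME       status  r15s   r1m  r15m   ut    pg  ls    it   tmp   swp   mem
gridnode1           ok   0.1   0.2   0.2  12%   0.0   2     5   40G   16G   22G
gridnode2      unavail
"""

if __name__ == "__main__":
    print(free_memory_by_host(SAMPLE))
```

The values are kept as the strings `lsload` prints (for example `22G`) rather than converted to numbers, because the unit suffix can vary between installations. A host with no `mem` value is reported as `None`, which doubles as a hint that the node is not responding.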
03-01-2018 05:36 AM
Thank you JuanS_OCS. I also felt the same.
I have a general question: is there any way to submit jobs only to a particular grid node? We have four grid nodes, and I want to submit the RSUBMIT blocks to only one of them.
03-02-2018 11:24 PM
03-03-2018 09:36 PM - edited 03-03-2018 09:38 PM
"I have general question, Any way to submit the jobs only to particular grid node."
I don't know why you would want to use only a single node, but yes, that can be done by defining a queue that only targets a single node. Setting up workload balancing like this is a SAS Admin task, so discuss your requirement with that person at your site.
For execution of 30 jobs in parallel: first of all, the LSF queue you're using must allow 30 jobs to run in parallel, or else whatever you set up won't run with as much parallelism as you expect. Secondly, if possible, use LSF to define your flows, and define parallelism and job dependencies there.