Grid/LSF will not dispatch jobs

Occasional Contributor
Posts: 14

Something is up with our SAS Grid (9.4M3).  It is a new installation and even a simple "bsub sleep 20" request remains in PEND status.

 

bjobs -l 768

 

PENDING REASONS:
New job is waiting for scheduling;

 

SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut    pg    io   ls   it   tmp   swp   mem
loadSched     -     -     -    -     -     -    -    -     -     -     -
loadStop      -     -     -    -     -     -    -    -     -     -     -

RESOURCE REQUIREMENT DETAILS:
Combined: -
Effective: -

 

Is there an LSF command that gives detailed information on why a job is not being dispatched, such as the criteria LSF is using to keep the job in pending status?


All Replies
Super User
Posts: 3,108

Re: Grid/LSF will not dispatch jobs

This may be a long shot, but this problem sounds very similar to one we ran into. There is a weird bug in LSF, at least up to V9.1, that occurs when LSF is configured to send emails and an email error occurs. LSF creates a file called PROGRAM in the same root folder where LSF is installed (e.g. on Windows, C:\Program Files) and writes the SMTP error to it.

 

This rogue file called PROGRAM then blocks any use of LSF thereafter as it is executed instead of the real LSF program! Simply renaming the file will fix the problem. 

Occasional Contributor
Posts: 14

Re: Grid/LSF will not dispatch jobs

That is interesting, but not our issue.  It turns out we had an NTP issue on our servers, which confused LSF.

 

Still wondering if there is a command to get LSF to tell you why it is not dispatching a job.
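
Since the root cause here was clock drift between servers, one quick check is to compare each host's epoch time with the local one. This is only a minimal sketch: the host names, the use of ssh, and the 5-second tolerance are assumptions for illustration, not values LSF documents or requires.

```shell
#!/bin/sh
# skew_seconds A B: print the absolute difference between two epoch timestamps.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    echo "$d"
}

# check_skew HOST REMOTE_EPOCH LOCAL_EPOCH [MAX]
# Reports whether HOST is within MAX seconds of the local clock.
# MAX defaults to 5, a hypothetical tolerance chosen for this sketch.
check_skew() {
    max=${4:-5}
    s=$(skew_seconds "$2" "$3")
    if [ "$s" -gt "$max" ]; then
        echo "$1: clock differs by ${s}s (limit ${max}s)"
        return 1
    fi
    echo "$1: ok (${s}s)"
}

# Usage (hypothetical host names; needs passwordless ssh, and the ssh
# round trip itself adds a little to the measured difference):
#   for h in gridnode1 gridnode2; do
#       check_skew "$h" "$(ssh "$h" date +%s)" "$(date +%s)"
#   done
```

If a host is flagged, the next step would be to look at the NTP service on that host rather than at LSF itself.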

Solution
06-12-2017 11:45 AM
Super User
Posts: 6,938

Re: Grid/LSF will not dispatch jobs


bdoug wrote:

That is interesting, but not our issue.  It turns out we had an NTP issue on our servers, which confused LSF.

 

Still wondering if there is a command to get LSF to tell you why it is not dispatching a job.


How should a dispatcher do this when the server's time is not correctly set? If it knew it should run a job, it would run the job. If the time is not right for the job, then the job shall not be run anyway, and no message needs to be sent.

If you had the dispatcher send you a message for every job it does not run at the moment for any reason, you'd be drowned in messages in seconds.

Occasional Contributor
Posts: 14

Re: Grid/LSF will not dispatch jobs

Maybe instead of sending a message, supply the reason for pending in extended bjobs information?  bjobs -l <ID>

 

That way I can see why a job is pending, but only when I ask.
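
For reference, the `bjobs -l` output in the original post already carries a PENDING REASONS section, and `bjobs -p` lists pending jobs together with their reasons. A minimal sketch of a helper that pulls out just that section follows; the function name is made up, and it assumes the section ends at the next blank line or the next all-caps heading, as in the output posted above.

```shell
#!/bin/sh
# pending_reasons: read `bjobs -l` output on stdin and print only the lines
# that belong to the PENDING REASONS section.
# Assumption: the section starts at a line beginning with "PENDING REASONS:"
# and ends at the next blank line or the next ALL-CAPS heading ending in ":".
pending_reasons() {
    awk '
        /^[ \t]*PENDING REASONS:/ { in_block = 1; next }
        in_block && (/^[ \t]*$/ || /^[ \t]*[A-Z][A-Z ]+:/) { in_block = 0 }
        in_block { sub(/^[ \t]+/, ""); print }
    '
}

# Usage on a live cluster (needs the LSF commands on PATH):
#   bjobs -l 768 | pending_reasons
```

This only filters what LSF already reports, so a generic reason such as "New job is waiting for scheduling;" will come through unchanged.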

Trusted Advisor
Posts: 1,141

Re: Grid/LSF will not dispatch jobs

Hello @bdoug,

 

I wonder if this information would help you out: http://www-01.ibm.com/support/docview.wss?uid=isg3T1016430

 

Besides this, a few items from my personal experience:

 

- I would check the IBM site and ask SAS Technical Support whether there is a recommended patch for your LSF version; normally there is one.

 

- On Linux installations I have seen this several times when part of the installation or configuration was not done properly (usually the LSF configuration or the prerequisites).

 

- Please also check your queue configuration: the job may not be going to the queue you expect, or it may just be sitting there forever.
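
On the queue point, a rough way to see which queue a job actually landed in is to read the QUEUE column of the default `bjobs` listing and then inspect that queue with `bqueues -l`. The helper below is only a sketch with a made-up name; it assumes the standard header row (JOBID, USER, STAT, QUEUE, ...) and relies on those leading columns being present even when EXEC_HOST is empty for a pending job.

```shell
#!/bin/sh
# job_queue: read default `bjobs` output on stdin and print
# "JOBID STAT QUEUE" for each job line.
# Assumption: the first line is the header row, and the JOBID, STAT and QUEUE
# columns come before any column that can be empty (such as EXEC_HOST).
job_queue() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        NF > 0  { print $col["JOBID"], $col["STAT"], $col["QUEUE"] }
    '
}

# Usage on a live cluster (needs the LSF commands on PATH):
#   bjobs 768 | job_queue
#   bqueues -l normal    # then inspect whichever queue the job landed in
```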

 

So my suggestion: while you investigate this on your side or with us, contact SAS Technical Support; they usually have useful insights.

 

Best regards,

Juan

 

 

☑ This topic is SOLVED.
