jbond007
Obsidian | Level 7

Hi SAS Communities,

 

Our LSF suddenly cannot schedule jobs. We have already restarted the SAS and LSF services, but the issue persists.

When we run the schedule from SMC, it reports that the flow has been successfully scheduled to run on Platform Process Manager, but in Flow Manager the flow shows as Exited.

 

The JFD log shows this:

JFJobExecutionAgent::checkReturnStatus: Failed to execute command <"/sas/scheduler/lsf/10.1/linux2.6-glibc2.3-x86_64/bin//bsub" -J '5984:sasdemo:Test_flow:Test_Job' -o '/dev/null' -fid 5984 '/sas/config/Lev1/SASApp/BatchServer/sasbatch.sh -log /sas/config/Lev1/SASApp/BatchServer/Logs/Test_flow_Test_Job_#Y.#m.#d_#H.#M.#s.log -batch -noterminal -logparm "rollover=session" -sysin /sas/config/Lev1/SASApp/SASEnvironment/SASCode/Jobs/Test_Job.sas'>. Exited with <32256>. 

 

It seems the job cannot be scheduled on the batch server, but it doesn't generate any logs either.

8 REPLIES
gwootton
SAS Super FREQ

This failure means Process Manager is trying, and failing, to submit a job with the LSF bsub command. The command is coming back with an exit code of 32256.
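
A side note on the number itself. Assuming 32256 is a raw wait()-style status rather than a plain exit code (the log does not say which), the exit code sits in the upper byte: 32256 >> 8 is 126, the value POSIX shells use for a command that was found but could not be executed. A minimal sketch of that arithmetic follows; the helper name decode_wait_status is made up for illustration.

```shell
#!/bin/sh
# decode_wait_status: hypothetical helper, not part of LSF or Process Manager.
# Assumes the argument is a raw wait()-style status:
#   upper byte = exit code, low 7 bits = terminating signal (0 if none).
decode_wait_status() {
    status=$1
    exit_code=$(( (status >> 8) & 255 ))
    signal=$(( status & 127 ))
    echo "exit_code=$exit_code signal=$signal"
}

decode_wait_status 32256   # prints: exit_code=126 signal=0
```

If that reading holds, it points toward a permission problem on something in the submission path rather than toward LSF scheduling itself.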

 

What happens when you run a bsub command outside of Process Manager as the sasdemo user? For example, "bsub sleep 10" submits a job that sleeps for 10 seconds.

--
Greg Wootton | Principal Systems Technical Support Engineer
jbond007
Obsidian | Level 7
Hi Greg,

Thank you for replying. I tried that as well: I ran bsub sleep 20 and then bjobs -l, and I can see the sleep job.
gwootton
SAS Super FREQ
Is this failure consistent or intermittent? You may want to check your system logs to see why the exit code is being returned. If it's intermittent, Process Manager has two options you can set in js.conf, JS_BSUB_RETRY_EXIT_VALUES and JS_START_RETRY, to have it retry the submission when it encounters specific exit codes, and you could add 32256 to this list. See https://www.ibm.com/support/pages/job-submission-fails-and-flow-fails
--
Greg Wootton | Principal Systems Technical Support Engineer
jbond007
Obsidian | Level 7
Hi Greg,

It's consistent. Suddenly users cannot schedule, and even new flows cannot be scheduled. The connection from SMC to LSF seems to be the issue, but I couldn't trace where. I can run the same jobs in SAS, even from the Linux command line, and I can also submit to LSF from Linux.
I have also raised this with SAS Technical Support, and they are still checking.
gwootton
SAS Super FREQ
The components at work here are SAS Management Console, Process Manager (jfd) and LSF. SMC is a Process Manager client, telling Process Manager to run the flow. Process Manager does this by submitting the job(s) in the flow to LSF.

From the error the failure is occurring outside of SAS Management Console, when Process Manager is submitting the job to LSF. Process Manager (jfd) is running the bsub command which is returning an exit code of 32256.

When you say you can schedule on LSF in Linux, are you saying it works when running SMC from the Linux host, or when instructing the flow to run in Flow Manager, or are you submitting the same bsub command to LSF from the command line?
--
Greg Wootton | Principal Systems Technical Support Engineer
jbond007
Obsidian | Level 7

"The components at work here are SAS Management Console, Process Manager (jfd) and LSF. SMC is a Process Manager client, telling Process Manager to run the flow. Process Manager does this by submitting the job(s) in the flow to LSF." -- yes this one is correct.

 

"When you say you can schedule on LSF in Linux, are you saying it works when running SMC from the Linux host, or when instructing the flow to run in Flow Manager, or are you submitting the same bsub command to LSF from the command line?" -- When we submit the flow in SMC it reports success, but in Flow Manager it shows as exited.
What I can schedule successfully is bsub from the command line itself.


I raised this with Technical Support. They enabled debug logging and got this:

JFJobExecutionAgent::_executeCommand: Unable to get JS_SU_COMMAND

JFJobExecutionAgent::_executeCommand: su command to execute </bin/su sasdemo -c /sas/scheduler/pm/work/tmp/JS_HjFO64

 

According to this IBM support page, https://www.ibm.com/support/pages/job-submission-failed-different-error-codes, the cause is that the directory does not have execute permission. However, noexec does not appear on the mount point of this directory, and we also tried changing to another directory, with the same result.

We are still investigating this.
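
For anyone repeating the mount check, here is a rough sketch of one way to do it on Linux. It assumes findmnt from util-linux is available, and both helper names (has_noexec, mount_opts_for) are made up for illustration. The directory in the usage note comes from the debug line above; adjust it for your install.

```shell
#!/bin/sh
# has_noexec: hypothetical helper. Succeeds if a comma-separated
# mount option list (e.g. "rw,nosuid,noexec") contains "noexec".
has_noexec() {
    case ",$1," in
        *,noexec,*) return 0 ;;
        *)          return 1 ;;
    esac
}

# mount_opts_for: hypothetical helper. Prints the mount options of the
# filesystem holding the given path, or nothing if they cannot be read.
# Assumes findmnt (util-linux) is installed.
mount_opts_for() {
    findmnt -n -o OPTIONS --target "$1" 2>/dev/null || true
}

# Usage (uncomment and adjust the path for your environment):
# opts=$(mount_opts_for /sas/scheduler/pm/work/tmp)
# if has_noexec "$opts"; then echo "noexec is set: $opts"; else echo "no noexec: $opts"; fi
```

A clean result here only rules out the mount option. It says nothing about the permissions of the binaries that jfd calls along the way.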

 

 

Sajid01
Meteorite | Level 14

Hello @jbond007 
I suggest checking the permissions/ACLs on the directory, possibly together with the OS admin.

jbond007
Obsidian | Level 7

We were able to resolve this issue.

The reason behind the error message below is that /bin/su had only read permission.

In our working environment, by comparison, it has read and write permission. After updating the permission, we can schedule the job successfully.

Unable to get JS_SU_COMMAND

JFJobExecutionAgent::_executeCommand: su command to execute </bin/su sasdemo -c /sas/scheduler/pm/work/tmp/JS_HjFO64
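
To compare a binary such as /bin/su between a broken and a working host, something like the sketch below may help. The helper name describe_perms is made up for illustration, and the thread does not record the exact modes involved. On many Linux distributions su is installed executable and setuid root, but treat that as an assumption and compare against a known-good host.

```shell
#!/bin/sh
# describe_perms: hypothetical helper. Reports whether the current user
# can execute a file and whether its setuid bit is set.
describe_perms() {
    f=$1
    if [ ! -e "$f" ]; then
        echo "$f: missing"
        return 1
    fi
    executable=no
    setuid=no
    if [ -x "$f" ]; then executable=yes; fi
    if [ -u "$f" ]; then setuid=yes; fi
    echo "$f: executable=$executable setuid=$setuid"
}

# Usage (run on both hosts and compare the output):
# describe_perms /bin/su
# ls -l /bin/su
```

Running the same two commands on the working and the failing environment and comparing the output is usually quicker than reading the modes in isolation.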

