sathya66
Barite | Level 11

Hi All,

We are seeing the errors below in the JFD logs. How can we fix them?

 

[Screenshot: sathya66_0-1633245739603.png]

Thanks

SS

 

 

SASKiwi
Opal | Level 21

Have you tried stopping and restarting the Platform Process Manager services? If not it would be worth trying.

sathya66
Barite | Level 11

This happens only after a restart.
We stopped everything:

<>/profile.lsf;
<> /profile.js;

jadmin stop
badmin hshutdown
lsadmin resshutdown
lsadmin limshutdown

and then restarted everything:

<>/profile.lsf;
<> /profile.js;

lsadmin limstartup;
lsadmin resstartup;
badmin hstartup;
jadmin start;

We are also getting these messages in lim.log:


Oct 4 08:54:43 2021 21763 4 3.4.0 main: Received request <5> from non-EGO host 11.201.77.21:27553
Oct 4 08:54:43 2021 21763 4 3.4.0 main: IP of Host compute.eng.prod has changed, this IP now belongs to ip-11.201.77.21.eu-east-3.compute (11.201.77.21:33249)
Oct 4 08:54:43 2021 21763 4 3.4.0 main: Received request <5> from non-EGO host 11.201.77.21:33249
Oct 4 08:54:43 2021 21763 4 3.4.0 main: IP of Host compute.eng.prod has changed, this IP now belongs to ip-11.201.77.21.eu-east-3.compute (11.201.77.21:33249)
Oct 4 08:54:43 2021 21763 4 3.4.0 main: Received request <5> from non-EGO host 11.201.77.21:33249
Oct 4 08:54:44 2021 21763 4 3.4.0 main: IP of Host compute.eng.prod has changed, this IP now belongs to ip-11.201.77.21.eu-east-3.compute (11.201.77.21:15311)
Oct 4 08:54:44 2021 21763 4 3.4.0 main: Received request <5> from non-EGO host 11.201.77.21:15311
Oct 4 08:54:44 2021 21763 4 3.4.0 main: IP of Host compute.eng.prod has changed, this IP now belongs to ip-11.201.77.21.eu-east-3.compute (11.201.77.21:15311)

 

and also in JFD.log

 

2021 Oct 04 09:07:00 21967 22060 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22061 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22065 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22063 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22058 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22062 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22060 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:07:35 21967 22059 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:09:03 21967 22009 3 JFJobExecutionAgent::checkReturnStatus: Failed to execute command <"/opt/home/lsf/10.1/linux2.6-glibc2.3-x86_64/bin//bsub" -J '260195:lsfuser:CLI_OUTCOME_MON:CLI_OUTCOME_MON' -o '/dev/null' -fid 260195 '/opt/config/Lev1/SASApp/BatchServer/sasbatch.sh -log ~/logs/CLI_OUTCOME_MON_CLI_OUTCOME_MON_#Y.#m.#d_#H.#M.#s.log -batch -noterminal -logparm "rollover=session" -sysin /sas/deployed_jobs/CLI_OUTCOME_MON.sas'>. Exited with <9>. .
2021 Oct 04 09:09:03 21967 22009 3 JFLSFExecutionAgent::_submitToLSF: The job submission script has been running for too long, and is killed by JFD; error code '118'.
2021 Oct 04 09:10:22 21967 22056 3 JFJobExecutionAgent::checkReturnStatus: Failed to execute command <bkill 517227>. Exited with <9>. .
2021 Oct 04 09:12:36 21967 22062 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:12:36 21967 22060 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:12:36 21967 22064 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:12:36 21967 22056 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:12:36 21967 22059 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:12:36 21967 22057 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:12:36 21967 22061 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 04 09:14:03 21967 22009 3 JFJobExecutionAgent::checkReturnStatus: Failed to execute command <"/opt/home/lsf/10.1/linux2.6-glibc2.3-x86_64/bin//bsub" -J '260196:lsfuser:CLI_FREEZE:CLI_FREEZE' -o '/dev/null' -fid 260196 '/opt/config/Lev1/SASApp/BatchServer/sasbatch.sh -log ~/logs/CLI_FREEZE_CLI_FREEZE_#Y.#m.#d_#H.#M.#s.log -batch -noterminal -logparm "rollover=session" -sysin /sas/deployed_jobs/CLI_FREEZE.sas'>. Exited with <9>. .
2021 Oct 04 09:14:03 21967 22009 3 JFLSFExecutionAgent::_submitToLSF: The job submission script has been running for too long, and is killed by JFD; error code '118'.




Something has changed in the host references, but everything looks fine except Flow Manager.

Note: this happens only after the Linux box is restarted.

gwootton
SAS Super FREQ
Process Manager's eauth process is what performs authentication. The verifyEauth messages seem to be saying that a logon attempt is failing. Given the cadence, I suspect an automated process is trying unsuccessfully to establish a connection to Process Manager, maybe a port scanner?
These messages in the LIM log indicate a host that is not defined in its configuration is trying to connect to it:
Oct 4 08:54:43 2021 21763 4 3.4.0 main: Received request <5> from non-EGO host 11.201.77.21:27553

Based on the message that follows it, it seems that "compute.eng.prod" no longer resolves to IP 11.201.77.21? A DNS issue?
Oct 4 08:54:43 2021 21763 4 3.4.0 main: IP of Host compute.eng.prod has changed, this IP now belongs to ip-11.201.77.21.eu-east-3.compute (11.201.77.21:33249)

The other messages in your JFD log show JFD is failing to execute job submission and termination commands (bsub, bkill). You may wish to try running bsub on the Process Manager host directly to see the error being produced. I suspect LIM does not consider the host a configured LSF client and is blocking the requests (i.e. the non-EGO host message in LIM).
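To check the name resolution side of this, a minimal sketch along these lines compares the forward and reverse lookups for a host. The function name check_host_resolution is hypothetical, and the sketch assumes getent is available. It only consults the operating system's resolver (hosts file and DNS), which may not match what LIM itself is using.

```shell
# Hypothetical helper: compare the forward lookup of a hostname with the
# reverse lookup of the address it resolves to.
check_host_resolution() {
    host_name=$1

    # Forward lookup: first address the OS resolver returns for the name.
    ip=$(getent hosts "$host_name" | awk 'NR == 1 { print $1 }')
    if [ -z "$ip" ]; then
        echo "$host_name: no address found"
        return 1
    fi

    # Reverse lookup: canonical name the resolver returns for that address.
    reverse=$(getent hosts "$ip" | awk 'NR == 1 { print $2 }')
    echo "$host_name -> $ip -> ${reverse:-<no reverse entry>}"

    # Succeed only when the reverse name matches the name we started with.
    [ "$reverse" = "$host_name" ]
}
```

For example, `check_host_resolution compute.eng.prod` run on the LSF master host would show whether the name and the address still point at each other.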
--
Greg Wootton | Principal Systems Technical Support Engineer
sathya66
Barite | Level 11
Thanks,
We updated the host entries in the hosts file and that resolved the issue, but it is strange because DNS is working and the IP resolves.
However, we are still getting the error below:
2021 Oct 05 08:10:31 32000 32101 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 05 08:10:31 32000 32099 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 05 08:15:31 32000 32097 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.


How can we find a port scanner?



gwootton
SAS Super FREQ

You could identify where the request is coming from by setting the debug log for jfd to level 10. You can do this by running:

jreconfigdebug -l 10

Wait until the next failure (up to 5 minutes it looks like) and then set the level back to 5:

jreconfigdebug -l 5

Then find the failure in the jfd log to get the thread ID (in this case 18093):

$ tail -500 jfd.log.* | grep eauth.*failed
...
2021 Oct 05 09:13:32 17925 18093 3 JFEauthManager::verifyEauth: eauth len=9 failed; rc=0.

And finally grep for that thread ID and "uData" to identify the source IP:

$ tail -500 jfd.log.* | grep 18093.*uData
2021 Oct 05 09:13:30 17925 18093 10 JFEauthManager::verifyEauth: uData = 2147483647 2147483647 lsfadmin 10.1.2.3 62693 9 NULL NULL NULL NULL
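When there are many failures, the two grep steps above can be combined. The following is a rough sketch, assuming the level-10 uData line layout shown above (after "uData =" come two numbers, then the user ID, then the source IP) and that each uData line is logged before the failure line of the same thread. The function name eauth_failure_sources is hypothetical.

```shell
# Hypothetical helper: for each failed eauth line, look up the uData line
# logged earlier by the same thread and count failures per user and source IP.
eauth_failure_sources() {
    awk '
        /verifyEauth: uData =/ {
            # Field 6 is the thread ID; user and IP follow the "=" sign.
            for (i = 1; i <= NF; i++) {
                if ($i == "=") { src[$6] = $(i + 3) " " $(i + 4); break }
            }
        }
        /verifyEauth: eauth .*failed/ && ($6 in src) { count[src[$6]]++ }
        END { for (s in count) print count[s], s }
    ' "$@" | sort -rn
}
```

Running `eauth_failure_sources jfd.log.*` while the debug level is 10 prints one line per user and IP pair, with the most frequent source first.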

  

--
Greg Wootton | Principal Systems Technical Support Engineer
sathya66
Barite | Level 11

Thanks for this.

I ran the debug steps and got the output below.

If I run

tail -500 jfd.log.* | grep eauth.*failed

2021 Oct 06 08:58:43 32000 32094 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 06 08:58:43 32000 32094 7 JFGenericDaemon::startup:Authentication failed for user [llm720]
2021 Oct 06 08:58:44 32000 32097 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 06 08:58:44 32000 32097 7 JFGenericDaemon::startup:Authentication failed for user [llm720]
2021 Oct 06 08:58:44 32000 32101 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 06 08:58:44 32000 32101 7 JFGenericDaemon::startup:Authentication failed for user [llm720]
2021 Oct 06 08:58:44 32000 32095 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 06 08:58:44 32000 32095 7 JFGenericDaemon::startup:Authentication failed for user [llm720]
2021 Oct 06 08:58:44 32000 32096 3 JFEauthManager::verifyEauth: eauth len=22 failed; rc=0.
2021 Oct 06 08:58:44 32000 32096 7 JFGenericDaemon::startup:Authentication failed for user [llm720]

 

If I run

 tail -500 jfd.log.* | grep 32097.*uData

2021 Oct 06 08:58:44 32000 32097 10 JFEauthManager::verifyEauth: uData = 2147483647 2147483647 llm720 11.90.188.112 24265 22 NULL NULL NULL NULL

What does this mean?

gwootton
SAS Super FREQ
It looks like the login attempt is coming from the host with IP address 11.90.188.112 and the user ID being passed is "llm720".
--
Greg Wootton | Principal Systems Technical Support Engineer
sathya66
Barite | Level 11

Thanks.
Is there a way we can fix these errors? Every user is getting the same error. It looks like that IP is their local PC's IP (the one they are connecting from), and they are connecting to Flow Manager fine.

The users are in the batch user group (file lsb.users), so they control their own flows.

gwootton
SAS Super FREQ
The message is an authentication failure, meaning the password being passed for user llm720 is being rejected as incorrect.

If llm720 is not a local account, you may need to configure eauth to use PAM to authenticate users.

https://www.ibm.com/support/pages/account-authentication-process-manager
--
Greg Wootton | Principal Systems Technical Support Engineer
sathya66
Barite | Level 11

We configured PAM correctly and users are able to log in to SAS and to Flow Manager. If we disable a user in AD (we disabled a few users in AD but did not tidy them up in SAS metadata or lsb.users), do we need to restart any PM services for this to take effect (i.e. so we don't see this error in the jfd log)?

 

gwootton
SAS Super FREQ
This message is coming from an authentication failure, meaning a user is trying to log in to Process Manager and supplying incorrect credentials. If the user did not exist in Active Directory (e.g. because the user was disabled), authentication would still fail and the message would still appear. That the authentication attempt is happening many times in a single minute and then again 5 minutes later makes me think this is not the user manually logging in to Flow Manager but an automated process, presumably one that does not have the correct password for the user. It is that automated process that needs to be stopped for the messages to stop appearing.
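One way to see that cadence is to count the failures per minute. The following is a rough sketch, assuming the jfd log line layout shown earlier in this thread (year, month, day, then HH:MM:SS); the function name eauth_failures_per_minute is hypothetical.

```shell
# Hypothetical helper: count failed eauth lines per minute, in log order.
eauth_failures_per_minute() {
    awk '
        /verifyEauth: eauth .*failed/ {
            # Build a "YYYY Mon DD HH:MM" key from the timestamp fields.
            minute = $1 " " $2 " " $3 " " substr($4, 1, 5)
            if (!(minute in n)) order[++k] = minute
            n[minute]++
        }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
    ' "$@"
}
```

Bursts that recur at a fixed interval in the output of `eauth_failures_per_minute jfd.log.*` would be consistent with a client retrying on a timer rather than a person typing a password.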
--
Greg Wootton | Principal Systems Technical Support Engineer
sathya66
Barite | Level 11

Found the root cause.
A user forgot to close/log out of Flow Manager (the user is no longer a SAS user but is still in the business).
Thanks for your help.
