Helannivas
Quartz | Level 8

Hi,

 

Sometimes when I launch Flow Manager, I get the following error:

 

"Error communicating with the daemon : Invalid response - Error communicating with daemon".

 

When I checked the logs, I could not find any errors in the following log files:

1) elim.log

2) lim.log

3) mbatchd.log

4) mbschd.log

5) res.log

6) sbatchd.log

 

These log files are under the LSF log folder. Whenever I get this error, I restart the whole application and the issue goes away.

 

I have not been able to figure out the exact cause. This is affecting our overnight batches and causing delays. When I restart the application during business hours, it affects the business (the metadata server goes down and comes back up).

 

We are using Platform Process Manager 8.1 and SAS 9.1.3. Could you please help me find a fix for this issue?

 

Thanks in advance.

Accepted Solutions
JuanS_OCS
Amethyst | Level 16

Hello @Helannivas, @nhvdwalt,

 

SAS has been aware of this problem in the IBM product since 2015: http://support.sas.com/kb/57/006.html

Please check this note. Perhaps your server's minor version is newer than your client's. This might be the case if there was an update on the server side that was never applied on the client side. Either way, two things are worth taking from it:

1. Try this fix.

2. Install the IBM patches for LSF on both the server and client side, always with the help of SAS Technical Support. Many LSF issues are solved by installing a few patches.


14 Replies
nhvdwalt
Barite | Level 11

Hi @Helannivas

 

Yes, I've had this before (....still have it).

 

But first, let's see if it's the same problem. I'm assuming UNIX here.

 

Do a....

 

ps -ef | grep -v grep | grep jfd

 

...and see if there is more than one jfd process. There should normally be just one. However, it is normal to have a second jfd process (spawned by the first jfd) for a brief period of time. If the second jfd persists, it might be the same issue as ours. Like you say, when you kill and restart, all is OK.

 

If the second jfd persists, check your logging level in lsf.conf (..I think). For some reason, our log level was set to DEBUG, causing excessive logging during busy periods. I have reduced the logging level to normal and so far so good. IBM has provided us with a patch and a new jfd executable that will log additional traces for them to investigate, but since we're in a freeze, I haven't been able to apply it.

 

If you have the same issue, I have a Track # that I can share which will save you and Tech Support many hours of troubleshooting.

Helannivas
Quartz | Level 8

Hi,

 

I ran ps -ef | grep -v grep | grep jfd on the UNIX server where SAS is installed.

I could see only one jfd process, running as the root user.

 

 

nhvdwalt
Barite | Level 11

Just one point to note. A Flow Manager error will be an issue with Process Manager (server side), not LSF. You mentioned you found the logs under ../lsf/log. Look under ../pm/log for the jfd* log.
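For anyone following along, here is a minimal sketch for picking out the most recently written jfd log. It only assumes that the log files match jfd* as described above. The directory is passed in as an argument because the real Process Manager log path depends on the install, and latest_jfd_log is an illustrative helper name, not a Process Manager command.

```shell
#!/bin/sh
# Print the most recently modified file matching jfd* in the given directory.
# The directory is an argument because the Process Manager log path is site-specific.
latest_jfd_log() {
    ls -t "$1"/jfd* 2>/dev/null | head -n 1
}

# Example (replace the path with your own pm/log directory):
#   tail -n 50 "$(latest_jfd_log /path/to/pm/log)"
```

Looking at the tail of that file while the error is happening, before any restart, is probably the most useful thing to capture for Tech Support.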

Helannivas
Quartz | Level 8

Hi,

 

Thanks for pointing me to the jfd log.

I hit the issue on Oct 3, 2017 at around 04:30, and the log shows:

 

JFFLowmanger:: ProcessFlowEvent : Unable to send mail for flow 

 

That is the last entry in the jfd log file at 04:30. I then restarted at 09:22, and the log has these entries:

 

JFuser :: initLsfUserDomain LSF_user_Domain = 

JS_ARM_Enable = False

JFLockFIile ::: getlock : Got the lock file

This process manager is processed with SAS License

JFArchive :: _readByte : reading beyond end of file.

 

I don't see any error entries in the log file at around 04:30.

nhvdwalt
Barite | Level 11

My apologies, I should have mentioned. You run the command WHEN you are having the issue and BEFORE you restart Process Manager.

 

Also, you mentioned SAS 9.1.3. Why such an old version?

nhvdwalt
Barite | Level 11

I also implemented a monitoring script that checks for multiple jfd processes every minute. This way you can get onto it before getting complaints. Just a word of caution: according to IBM, it is normal for the first jfd to launch another jfd briefly, but it's quick. If the second one persists, then you have the issue. In other words, you might get some false positives.
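This is not the actual script mentioned above, just a rough sketch of the idea under some assumptions: the process name is exactly jfd, ps -e -o comm= is available on the server, and a 10-second grace period is enough to let the short-lived second jfd exit. The function names and the grace period are made up for illustration. It samples twice and only warns if the extra jfd is still there the second time, which should cut down on false positives.

```shell
#!/bin/sh
# Count processes whose command name is exactly "jfd".
# Reads `ps -e -o comm=` style output on stdin; a leading path is stripped.
count_jfd() {
    awk '{ n = split($1, part, "/"); if (part[n] == "jfd") c++ } END { print c + 0 }'
}

# Succeed (exit 0) only when both samples show more than one jfd.
extra_jfd_persists() {
    [ "$1" -gt 1 ] && [ "$2" -gt 1 ]
}

# One monitoring pass: sample, wait, sample again, warn if the extra jfd is still there.
check_jfd_once() {
    delay=${1:-10}    # hypothetical grace period in seconds
    first=$(ps -e -o comm= | count_jfd)
    [ "$first" -gt 1 ] || return 0
    sleep "$delay"
    second=$(ps -e -o comm= | count_jfd)
    if extra_jfd_persists "$first" "$second"; then
        echo "WARNING: $second jfd processes still running after ${delay}s" >&2
        return 1
    fi
    return 0
}
```

To use it, add a final line such as check_jfd_once 10 and schedule the script from cron every minute. A non-zero exit status means the second jfd persisted, so the wrapper can send a mail or page at that point.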

Helannivas
Quartz | Level 8

Hi,

 

Let me try the script when the issue occurs again. Hopefully it won't 🙂

We are migrating to 9.4, and the migration will be completed soon.

So that means there is no fix for this issue. Every time it occurs, we have to restart the application.

nhvdwalt
Barite | Level 11

We're still working on the fix. The Tech Support Track # is 7612222597. Maybe log a Track and make reference to mine and say the symptoms are similar.

 

What is the value of LSF_LOG_MASK in ../lsf/conf/lsf.conf? Ours was set to LOG_DEBUG, which I believe caused the problem under heavy load. Generally, logging should only be increased to DEBUG for a short period while troubleshooting, and definitely not during normal operations. Still to be confirmed, but I believe this was a contributor.
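A quick way to read the setting, as a sketch only: it assumes lsf.conf uses plain NAME=value lines (optionally with spaces around the equals sign) and that commented-out lines start with #. The helper names are illustrative, and the path is passed in because it differs per install.

```shell
#!/bin/sh
# Print the value of the last uncommented LSF_LOG_MASK line in the given lsf.conf.
get_lsf_log_mask() {
    sed -n 's/^[[:space:]]*LSF_LOG_MASK[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | tail -n 1
}

# Succeed (exit 0) when the mask is one of the debug levels, e.g. LOG_DEBUG.
is_debug_mask() {
    case "$1" in
        LOG_DEBUG*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example (replace the path with your own lsf.conf):
#   mask=$(get_lsf_log_mask /path/to/lsf/conf/lsf.conf)
#   if is_debug_mask "$mask"; then echo "LSF_LOG_MASK is $mask: debug logging is on"; fi
```

If it reports a debug level outside of a troubleshooting window, that is worth raising with Tech Support before changing anything.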

Helannivas
Quartz | Level 8

Thanks for the Support Track.

 

LSF_LOG_MASK = LOG_WARNING in the lsf.conf file.

nhvdwalt
Barite | Level 11

Ok, then I suggest you log a call and maybe they can give you the jfd module that allows for additional tracing. IBM will analyse this trace to see what's going on.

Helannivas
Quartz | Level 8

Hi, was there any solution for the issue from your side?

nhvdwalt
Barite | Level 11

Unfortunately, I had to hand over the issue to another resource, and I'm not sure if they were able to resolve it.

 

My only suggestion is to troubleshoot with Tech Support....

Helannivas
Quartz | Level 8

OK, sure. Thanks for your suggestion.
