Hi,
Sometimes when I launch Flow Manager, I get the following error:
"Error communicating with the daemon : Invalid response - Error communicating with daemon".
When I checked the logs, I could not find any errors in the following log files:
1) elim.log
2) lim.log
3) mbatchd.log
4) mbschd.log
5) res.log
6) sbatchd.log
These log files are under the LSF log folder. Whenever I get this error, I restart the whole application and the issue goes away.
I could not figure out the exact cause. This is affecting our overnight batches and causing delays. When I restart the application during business hours, it affects the business (the metadata server goes down and then comes back up).
We are using Platform Process Manager 8.1 and SAS 9.1.3. Could you please help me find a fix for this issue?
Thanks in advance.
Hello @Helannivas, @nhvdwalt,
SAS has been aware of this problem in the IBM product since 2015: http://support.sas.com/kb/57/006.html
Please check this note. Perhaps your server minor version is newer than your client version. This can happen if an update was applied on the server side but never on the client side. Either way, two things are worth taking from it:
1. Try this fix.
2. Install the IBM patches for LSF on both the server and client side, always with the help of SAS Technical Support. Many LSF issues are solved by installing a few patches.
Hi @Helannivas
Yes, I've had this before (and still have it).
But first, let's check whether it's the same problem. I'm assuming UNIX here.
Run:
ps -ef | grep -v grep | grep jfd
...and see if there is more than one jfd process. Normally there should be just one. However, it is normal to have a second jfd process (spawned by the first jfd) for a brief period. If the second jfd persists, it might be the same issue we have. Like you say, when you kill and restart, all is OK.
If the second jfd persists, check your logging level in lsf.conf (I think). For some reason, our log level was set to DEBUG, causing excessive logging during busy periods. I have reduced the logging level to normal and so far so good. IBM has provided us with a patch and a new jfd executable that logs additional traces for them to investigate, but since we're in a change freeze, I haven't been able to apply it.
If you have the same issue, I have a Track # I can share, which will save you and Tech Support many hours of troubleshooting.
Hi,
I ran ps -ef | grep -v grep | grep jfd on the UNIX server where SAS is installed.
I could see only one jfd process, running as root.
Just one point to note: a Flow Manager problem will be an issue with Process Manager (server side), not LSF. You mentioned you found the logs under ../lsf/log. Look under ../pm/log for the jfd* log.
Hi,
Thanks for pointing me to the jfd log.
I got the issue on Oct 3, 2017 at around 04:30, and the log shows:
JFFLowmanger:: ProcessFlowEvent : Unable to send mail for flow
That's the last entry in the jfd log file at 04:30. I then restarted at 09:22, and the entries are:
JFuser :: initLsfUserDomain LSF_user_Domain =
JS_ARM_Enable = False
JFLockFIile ::: getlock : Got the lock file
This process manager is processed with SAS License
JFArchive :: _readByte : reading beyond end of file.
I don't see any error entries in the log file around 04:30.
My apologies, I should have mentioned: run the command WHILE you are having the issue and BEFORE you restart Process Manager.
Also, you mentioned SAS 9.1.3. Why such an old version?
I also implemented a monitoring script that checks for multiple jfd processes every minute. That way you can get onto it before the complaints come in. Just a word of caution: it is normal for the first jfd to launch another jfd briefly (as per IBM), but it's quick. If the second one persists, then you have the issue. In other words, you might get some false positives sometimes.
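The monitoring script itself isn't posted in this thread, so here is only a minimal sketch of the same idea, not the poster's actual script. It assumes jfd can be found by grepping `ps -ef` output for "jfd" (as in the command above), that a 30-second grace period (the hypothetical `GRACE_SECONDS` variable) is enough for the short-lived child jfd to exit, and that writing to stderr and syslog (via `logger`) is an acceptable alert. The function names `sample_ps`, `count_jfd` and `jfd_stuck` are illustrative.

```shell
#!/bin/sh
# Sketch of a jfd watchdog, meant to be run from cron every minute.
# Assumptions (not from IBM or SAS): jfd shows up in `ps -ef` output as a
# command containing "jfd"; GRACE_SECONDS is long enough for the brief
# child jfd to exit; alerting = message on stderr plus syslog via logger.
# Do NOT put "jfd" in this script's own file name, or it will count itself.

GRACE_SECONDS=${GRACE_SECONDS:-30}

# One snapshot of the process table (a separate function so it can be stubbed).
sample_ps() {
  ps -ef
}

# Count lines mentioning jfd in `ps -ef` style text on stdin, ignoring grep itself.
# grep -c prints 0 when nothing matches.
count_jfd() {
  grep -v grep | grep -c jfd
}

# Succeed only if more than one jfd is seen twice, GRACE_SECONDS apart.
# A single sighting is ignored, because the first jfd briefly spawns a second.
jfd_stuck() {
  [ "$(sample_ps | count_jfd)" -gt 1 ] || return 1
  sleep "$GRACE_SECONDS"
  [ "$(sample_ps | count_jfd)" -gt 1 ]
}

if jfd_stuck; then
  msg="jfd watchdog: more than one jfd process for over ${GRACE_SECONDS}s on $(hostname)"
  echo "$msg" >&2
  # logger may be missing on some systems; don't fail the cron job because of it.
  logger -t jfd-watchdog "$msg" 2>/dev/null || true
fi
```

Checking twice with a pause in between is a cheap way to cut down the false positives mentioned above; a longer grace period trades quicker alerts for fewer false alarms.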
Hi,
Let me try out the script the next time the issue occurs. Hopefully it won't 🙂
We are migrating to 9.4, which will be completed soon.
So that means there is no fix for this issue. Every time the issue occurs, we have to restart the application.
We're still working on the fix. The Tech Support Track # is 7612222597. Maybe log a Track, reference mine, and say the symptoms are similar.
What is the value of LSF_LOG_MASK in ../lsf/conf/lsf.conf? Ours was set to LOG_DEBUG, which I believe caused the problem under heavy load. Generally, logging should only be increased to DEBUG for a short period while troubleshooting, definitely not during normal operations. Still to be confirmed, but I believe this was a contributor.
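To check this quickly, here is a small sketch that reads LSF_LOG_MASK from an lsf.conf file and flags debug-level masks. It assumes the file path is passed as the first argument (where lsf.conf lives depends on your install), that the setting is written as `LSF_LOG_MASK=VALUE` with optional spaces and quotes, and that commented-out lines start with `#`. The function names `get_log_mask` and `check_log_mask` are hypothetical.

```shell
#!/bin/sh
# Sketch: report LSF_LOG_MASK from an lsf.conf file, warning on LOG_DEBUG*.
# Assumptions: path passed as $1; lines look like LSF_LOG_MASK=VALUE with
# optional spaces around '=' and optional double quotes; '#' lines are comments.

# Print the value of the last uncommented LSF_LOG_MASK line found in the file.
get_log_mask() {
  sed -n 's/^[[:space:]]*LSF_LOG_MASK[[:space:]]*=[[:space:]]*"*\([A-Z0-9_]*\).*/\1/p' "$1" | tail -n 1
}

# Classify the mask: debug levels (LOG_DEBUG, LOG_DEBUG1, ...) get a warning.
check_log_mask() {
  mask=$(get_log_mask "$1")
  case "$mask" in
    LOG_DEBUG*) echo "WARNING: LSF_LOG_MASK=$mask (debug logging) in $1" ;;
    "")         echo "LSF_LOG_MASK not set in $1" ;;
    *)          echo "OK: LSF_LOG_MASK=$mask" ;;
  esac
}

# Only run the check when a file path is given, e.g. sh check_mask.sh /path/to/lsf.conf
if [ -n "${1:-}" ]; then
  check_log_mask "$1"
fi
```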
Thanks for the Support Track.
LSF_LOG_MASK is set to LOG_WARNING in the lsf.conf file.
OK, then I suggest you log a call; maybe they can give you the jfd module that allows additional tracing. IBM will analyse the trace to see what's going on.
Hi, any sort of solution for the issue from your side?
Unfortunately, I had to hand over the issue to someone else, and I'm not sure whether they were able to resolve it.
My only suggestion is to troubleshoot with Tech Support.
OK, sure. Thanks for your suggestion.