Timmy2383
Lapis Lazuli | Level 10

We have numerous flows (150-200) that are triggered by all kinds of events. Recently I saw some odd behavior: certain flows that are scheduled to run on the standard Daily@Sys calendar (at varying times of the day) simply did not start. There is no record in Flow Manager showing that they even tried (i.e., no history). They have been running fine for a long time, but they did not run on a given day, and then they ran fine the next day.

 

Does anyone know how to troubleshoot/investigate?  I'm guessing there are relevant logs somewhere within the PM product install directory, but I'm not as familiar with this product as I am with SAS products.

 

My first hunch is that maybe there's a load/server limit somewhere?  I would kind of doubt it, though, since the latest instance had to do with a job at 12:05 am, and we do not have many jobs scheduled for that time.

 

In case it's relevant, we are on SAS 9.4M4, Platform Suite 9.1.3, Linux x64 servers in a grid environment.

 

Thanks!

ACCEPTED SOLUTION
Timmy2383
Lapis Lazuli | Level 10

SAS Tech Support recommended I take various steps to clear out the JFD/Process Manager history and cache. I followed their recommendation and so far no issues.  Here's what they told me:

 

1. Back up and delete lsb.events.*; keep only lsb.events and lsb.events.1.
2. Back up and delete the following two files:
                (a) $JS_HOME/work/system/jobidmap.dat.1
                (b) $JS_HOME/work/system/lsf.events
3. Back up and delete all files in $JS_HOME/work/history except those created in the last 3-5 days.
4. Back up and delete all files in $JS_HOME/work/events except those created in the last 3-5 days, and keep js.events.
5. Back up and delete all files in $JS_HOME/work/variable except those created in the last 3-5 days.
6. Back up and delete all files in $JS_HOME/work/storage/error except those created in the last 3-5 days.
7. Back up and delete all files in $JS_HOME/work/storage/flow_instance_storage/finished, leaving the last 5 days.
8. Back up and delete $LSF_HOME/work/<cluster>/logdir/lsb.events.* except those created in the last 3-5 days ($LSF_HOME D=\LSF_51\work\cluster1\logdir).
9. Back up and delete all files in $JS_HOME/work/storage/cache/.
10. Back up and delete the log files located in $JS_HOME/log.
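
Several of the steps above follow the same pattern: move everything older than a few days out of a directory while keeping a named file or two. A minimal Python sketch of that pattern is below. It is not something SAS Tech Support supplied. The function names (`find_old_files`, `archive_old_files`) and the backup location are made up for illustration, and it uses file modification time as a stand-in for "created", since Linux file systems do not reliably expose a creation time. As mentioned elsewhere in this thread, the cleanup is best done during a maintenance window.

```python
import shutil
import time
from pathlib import Path


def find_old_files(directory, keep_days=5, keep_names=()):
    """Return files in `directory` last modified more than `keep_days` days ago.

    Files whose names appear in `keep_names` are never returned.
    Assumption: modification time approximates "created" in the steps above.
    """
    cutoff = time.time() - keep_days * 86400
    old = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.name in keep_names:
            continue
        if path.stat().st_mtime < cutoff:
            old.append(path)
    return old


def archive_old_files(directory, backup_dir, keep_days=5, keep_names=(), dry_run=True):
    """Move old files into `backup_dir`; with dry_run=True only report them."""
    old = find_old_files(directory, keep_days, keep_names)
    if dry_run:
        return old
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    for path in old:
        # Moving (rather than deleting) covers the "back up" half of each step.
        shutil.move(str(path), str(Path(backup_dir) / path.name))
    return old


# Hypothetical usage for step 4 (paths are placeholders, adjust to your install):
# archive_old_files("/path/to/JS_HOME/work/events", "/path/to/backup/events",
#                   keep_days=5, keep_names=("js.events",), dry_run=True)
```

Running with `dry_run=True` first returns the list of files that would be moved, so it can be reviewed before anything changes on disk.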


REPLIES
JuanS_OCS
Amethyst | Level 16

Hello @Timmy2383,

 

If you are on a grid, there might be some limits on job execution. Jobs can be placed at the top of a queue, but they will still be triggered, and they will execute at some point. There is a difference between Process Manager (JS) and the LSF resource manager. Process Manager won't act much differently from any other scheduler, such as cron or at. The only difference is that you can customise calendars.

 

First, I would check the jfd file (you can use locate; however, it will be in your JS_Top/work directory).

 

Second, I wonder if you could check what else happened on the server at the time the job did not run. For example, the jfd service may have been stopped, so the job was never triggered, or there may have been maintenance on the server (security patches?).

 

 

Timmy2383
Lapis Lazuli | Level 10

Thanks, Juan.

 

There definitely wasn't any maintenance going on. It's possible the JFD stopped for some reason and was restarted by EGO, but I'm not sure how I could determine that.
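
One way to look for evidence of a stop or restart is to scan the log files for lines written around the time the flow should have started. The Python sketch below is only a starting point. The helper name `grep_logs` is made up, the log directory is assumed to be the $JS_HOME/log location mentioned in the accepted solution, and the timestamp format and any keywords are assumptions that need to be adjusted to whatever the log files on the server actually contain.

```python
from pathlib import Path


def grep_logs(log_dir, needle, keywords=()):
    """Yield (file name, line number, line) for lines containing `needle`.

    `needle` is a plain substring, e.g. a date or date-and-hour prefix in
    whatever format the logs use. If `keywords` is given, a line must also
    contain at least one of them (case-insensitive).
    """
    lowered = [k.lower() for k in keywords]
    for path in sorted(Path(log_dir).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle not in line:
                    continue
                if lowered and not any(k in line.lower() for k in lowered):
                    continue
                yield path.name, number, line.rstrip("\n")


# Hypothetical usage (path, date format, and keywords are placeholders):
# for name, number, line in grep_logs("/path/to/JS_HOME/log", "Jan 15 00:0",
#                                     keywords=("start", "stop", "exit")):
#     print(f"{name}:{number}: {line}")
```

If nothing at all was logged in the minutes around the scheduled start, that gap is itself a hint that the daemon was not running at the time.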

 

So far SAS TS has recommended clearing out many of the PM logs and cache. I will have to do this during the next scheduled maintenance window and then monitor after that.

 

 

JuanS_OCS
Amethyst | Level 16

Same old nice trick 🙂 . Thanks for sharing @Timmy2383!
