LukeBurvill
Fluorite | Level 6

There is a scheduled flow in our SAS environment that runs every 5 minutes (at 1, 6, 11 minutes past, etc.) between 7am and 7pm. The main program in this flow uses a file that may be open in users' sessions. Because of this, the program checks that it can get an exclusive file lock before proceeding; if it can't, it loops and rechecks for up to 20 minutes. If it still can't get the lock, the program exits gracefully, so the flow always appears "Done" in Flow Manager. The problem I'm having is that if a file lock ever delays other executions of the job, Flow Manager seems to give up executing the flow past the currently executing instance and any instances that were held up by the originally delayed one. In the most recent occurrence, a user accidentally left the file open for almost 45 minutes, and as luck would have it, the file was opened just prior to the 1-minute-past instance starting.
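For context, a lock-check loop of this kind might look roughly like the sketch below. The macro name, dataset name, and intervals are illustrative assumptions, not the actual production code:

```sas
/* Sketch of a lock-retry loop: try every 30 seconds for up to 20
   minutes, then give up gracefully. All names are placeholders. */
%macro wait_for_lock(ds=shared.source_ds, maxwait=1200, interval=30);
    %local deadline;
    %let deadline = %sysevalf(%sysfunc(datetime()) + &maxwait);
    lock &ds;                                      /* sets &SYSLCKRC */
    %do %while(&syslckrc ne 0 and
               %sysevalf(%sysfunc(datetime()) < &deadline));
        %let rc = %sysfunc(sleep(&interval, 1));   /* wait N seconds */
        lock &ds;
    %end;
    %if &syslckrc ne 0 %then
        %put NOTE: Could not lock &ds within &maxwait seconds - exiting gracefully.;
%mend wait_for_lock;
```

After the call, the program would check &SYSLCKRC (0 means the lock was obtained), run the main processing, and release the lock with `lock shared.source_ds clear;`.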

 

The 1-minute-past instance executed and then held in its file lock check loop for the full 20 minutes, eventually ending gracefully and holding up the 6, 11, 16 and 21 minute past instances. This resulted in the 6-minute-past job executing around 21 minutes past the hour, also looping for the full 20 minutes until about 42 minutes past, holding up all the instances in between, and ending gracefully. The 11-minute-past instance then executed around 42 minutes past. The file was eventually freed up around 45 minutes past, letting the 11-minute-past instance complete its run, followed by the 16, 21, 26, 31, 36, 41, 46 and 51 minute past instances running back to back (46 and 51 ran late because they were delayed by the instances already queued up). Once they had all completed successfully, Flow Manager did not execute any more instances of this flow until it was rescheduled through Schedule Manager in SAS Management Console the following morning.

 

I've been unable to find anything that would explain why LSF is giving up on executing this flow, even though the time period in question has worked on previous days without issue. I've gone from targeted searches of the LSF log files to opening every log file I could find for any scrap of information that could point me in the right direction. Whenever I find a log file that references the execution of this flow, there are no errors, only references to the delays of waiting for previous instances to finish before the next can run ("Cannot run this flow until the following work item finishes"). At the end point of the situation above, there aren't any errors in the log files, as the flow always ends gracefully, and then the flow is simply not executed again.

 

It is easy enough to reschedule the flow, but I'm stumped as to what is causing it to give up executing any future instances after a delay occurs.

 

The properties of the flow in question are:

 

Flow completion criteria: All items complete successfully or any item fails

Actions after the state of the flow is determined: Complete any work in progress and stop running the flow

Source for the flow exit code: The sum of the exit codes for all work items

Allow only one instance of the flow to run at a time

 

Run only when all of the conditions occur (there is only the below condition)

Calendar: Daily@sys

Hours: 7 - 18

Minutes: 1,6,11,16,21,26,31,36,41,46,51,56

Duration: 1

6 REPLIES
Kurt_Bremser
Super User

I would tackle this from the other side: an open file preventing the job from continuing.

What platform does SAS run on? And what are you doing with the file/dataset where you need to put a lock on?

LukeBurvill
Fluorite | Level 6

The environment is running on Windows 2012 R2, SAS 9.4M4.

 

The file, a dataset, is opened and modified by users through an Excel spreadsheet using the Office Add-in. Because a user could open and edit the file at any time, the decision was made to check for a file lock before the program would continue, and apply one if the file was free. This removed the possibility of file changes occurring in the middle of the program's execution. The program processes this dataset and generates reporting-layer datasets for use in Visual Analytics.

 

Kurt_Bremser
Super User

@LukeBurvill wrote:

The Environment is running on Windows 2012 R2, SAS 9.4M4.

 

The file, a dataset, is opened and modified by users through an Excel spreadsheet using the Office Add-in. Because a user could open and edit the file at any time, the decision was made to check for a file lock before the program would continue, and apply one if the file was free. This removed the possibility of file changes occurring in the middle of the program's execution. The program processes this dataset and generates reporting-layer datasets for use in Visual Analytics.

 


Instead of waiting for a lock, create a copy of the dataset and work from that during the program execution. Make sure that you save the metadata of the originating dataset (last modification time) in your log, in case there's a question why a certain change did not take effect.
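As a rough sketch of this approach (the library, dataset, and macro variable names are assumptions):

```sas
/* Sketch: record the source dataset's last-modified timestamp in the
   log, snapshot it to WORK, and run everything off the copy.
   SHARED.SOURCE_DS is a placeholder name. */
proc sql noprint;
    select modate format=datetime20. into :src_modified trimmed
    from dictionary.tables
    where libname = 'SHARED' and memname = 'SOURCE_DS';
quit;
%put NOTE: Source dataset last modified: &src_modified;

data work.source_copy;
    set shared.source_ds;
run;

/* ...reporting-layer steps read WORK.SOURCE_COPY from here on... */
```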

LinusH
Tourmaline | Level 20
Ok, no silver bullet for this one, I guess.
The first method would be to allow row-level locking (SAS/SHARE or SAS SPD Server). But then, of course, a record can be under edit during the batch run.
Another option would be to treat updates as transactions, but then you need to write an application instead of just browsing into the dataset.
Data never sleeps
LukeBurvill
Fluorite | Level 6

Thank you for the suggestions. However, the program itself is running as expected; it is LSF that seems to be failing in this case. I might have a chat with the team this program is for and suggest modifying the file-lock wait loop so it gives up before the next scheduled execution. Everything seems to point to the delayed execution of the future scheduled runs as the cause of the failing schedule down the line.
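If the loop is changed that way, the key point would be to bound the wait below the 5-minute schedule interval. A minimal sketch, with the names and timings assumed:

```sas
/* Sketch: retry for at most 4 minutes so a held lock can delay only
   the current instance, never the next scheduled one.
   Dataset name and intervals are placeholders. */
%macro try_lock(ds=shared.source_ds, maxwait=240, interval=15);
    %local deadline;
    %let deadline = %sysevalf(%sysfunc(datetime()) + &maxwait);
    lock &ds;
    %do %while(&syslckrc ne 0 and
               %sysevalf(%sysfunc(datetime()) < &deadline));
        %let rc = %sysfunc(sleep(&interval, 1));
        lock &ds;
    %end;
%mend try_lock;
```

With maxwait=240, a 45-minute blockage would make each instance give up within its own 5-minute slot instead of piling delays onto the instances behind it.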

SASKiwi
PROC Star

Having your scheduled jobs overlap in time doesn't make sense to me, since they are all doing the same thing. So setting a time limit so one job stops before the next starts makes a lot of sense.

 

 


Discussion stats
  • 6 replies
  • 1821 views
  • 3 likes
  • 4 in conversation