11-30-2012 04:54 AM
We've encountered a seemingly bizarre problem when running a DI Studio job in LSF Platform Process Manager 7.1 Flow Manager, and would greatly appreciate any advice you can offer.
The job in question is a loop job that runs an inner job once for each value in the control table (typically around 700-800 times). This works perfectly when we run the job in DI Studio; however, it fails when it is scheduled and triggered in LSF. In LSF, the loop job runs the inner job only once, for the first value in the control table, and then ends with the following ERROR messages:
ERROR: File WORK.W67YHEIL.DATA does not exist.
ERROR: Parameter table, work.W67YHEIL, could not be opened.
ERROR: Execution terminated by an ABORT statement at line 4085 column 84.
WORK.W67YHEIL is one of many randomly generated work table names (in this run, work.W67YHEIB, work.W67YHEIC, work.W67YHEID, etc.), which differ each time the loop transformation runs. When the job runs in DI Studio, the equivalent of WORK.W67YHEIL.DATA causes no problem at all.
We have other, almost identical loop jobs that run perfectly in LSF. The only two differences in this one are that the control table has many more rows (700-800) and that the inner job contains a Splitter transformation.
Before I go into further detail, are any of you aware of any problems or fixes related to running loop jobs in LSF? It's challenging to troubleshoot the problem when it all runs perfectly in DI Studio.
Thanks for your attention.
11-30-2012 05:09 AM
I don't know the exact nature of the LSF scheduler, but you need to ask what the difference is between running in DIS and running on the WS server in that environment, and how LSF calls SAS in batch there.
Looking at the error itself, it sounds like a locking issue. If so, you need to dig deeply into the logs and try to understand how the loop iterations are deployed relative to how the control table is accessed (creation vs. access). Inserting some exact timestamps in the log might be useful.
12-02-2012 02:57 PM
Take LSF out of the picture by running the code deployed from DIS in a SAS session on the batch server. This may help you debug the problem.