11-07-2016 02:09 PM - edited 11-07-2016 02:11 PM
Is there a simple way to restart SAS from the point where it failed?
Say we have code like this:
%macro main_test(parameter 1, parameter 2);
The same code could run in Unix (for prod) in batch mode as well as in EG (adhoc).
I need the ability to restart this program from the point where it failed. For instance, if it fails in the %test3 macro and the program is restarted, it should resume from %test3 rather than start again from %test1.
The only way I can think of to achieve this is to maintain a control table that lists the macro names, and to update each macro's status before and after it runs. But I have nearly 80 macros that are executed sequentially, and it would be tedious to add insert/update statements for the control table after every macro call. Any macro added in the future would also need its own insert/update statements. On top of that, I would need some strategy for managing the control table when multiple instances of the same program are running.
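One way the repetition might be reduced is to route every call through a single wrapper macro, so the control-table logic is written once. This is only a rough sketch, not a tested production design: the CTL library and its path, the table ctl.step_status, the macro %run_step and the run_id key are all hypothetical names, and it assumes the step macros take no parameters. It does not solve locking when several instances write to the same table at once.

```sas
/* Hypothetical names: CTL library, ctl.step_status, %run_step, run_id. */
libname ctl "/path/to/control";   /* assumed permanent location */

%macro run_step(step, run_id=);
  %local done;
  %let done = 0;

  /* Create the control table the first time it is needed. */
  %if not %sysfunc(exist(ctl.step_status)) %then %do;
    proc sql;
      create table ctl.step_status
        (run_id char(32), step char(32), finished num format=datetime20.);
    quit;
  %end;

  /* Has this step already completed for this run? */
  proc sql noprint;
    select count(*) into :done trimmed
      from ctl.step_status
      where run_id = "&run_id" and step = "%upcase(&step)";
  quit;

  %if &done > 0 %then %do;
    %put NOTE: Skipping &step (already completed for run &run_id).;
    %return;
  %end;

  %let syscc = 0;   /* reset so the check below reflects this step only */
  %&step;           /* call the real macro, e.g. %test3 */

  /* Treat SYSCC above 4 as an error; record the step only on success. */
  %if &syscc <= 4 %then %do;
    proc sql;
      insert into ctl.step_status
        values ("&run_id", "%upcase(&step)", %sysfunc(datetime()));
    quit;
  %end;
  %else %do;
    %put ERROR: &step failed (SYSCC=&syscc). Stopping here.;
    %abort cancel;
  %end;
%mend run_step;

/* Usage: rerunning with the same run_id skips the finished steps. */
%run_step(test1, run_id=20161107)
%run_step(test2, run_id=20161107)
%run_step(test3, run_id=20161107)
```

Keying the rows on a run identifier is one possible way to keep separate instances of the same program from reading each other's status, and a new macro would then need only one extra %run_step line.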
Are there any other ways to track the execution of a SAS program and restart it exactly from the point of failure?
11-07-2016 02:32 PM
There is no built-in functionality in SAS to do this. Also what constitutes failure? Even if you construct a way of tracking return codes at each step in your jobs, I doubt it will be worth the effort and be reliable enough for production purposes. Any solution is likely to be complex.
Why not explore the possibility of speeding up your jobs using more efficient techniques, or possibly re-designing steps so they can run in parallel? Also how often are your jobs failing? I suggest you would be better off making changes to improve job reliability. If your jobs rarely fail then the need to re-run becomes unimportant.
11-07-2016 02:40 PM
The application doesn't fail very often, but when it does fail (say, because of a system issue), restarting becomes a real problem because it is a long-running process. Sometimes it runs for 7 or 8 hours and then fails. When that happens, the entire cycle has to be cancelled, as there is no bandwidth to complete it after a restart. Each of those macros, starting from %test1, takes a significant amount of time to complete, and ample time has already been invested in tuning them as far as possible. The volume of data itself is quite big, so not much more can be done to reduce their run time.
Parallel processing (running the macros %test1, %test2, etc. in parallel) is part of the requirement, along with incorporating restart ability, but I have only just started on this. I thought of posting parallel processing as a separate question, but if you have any thoughts on it, please do share. Even with parallel processing, we are looking at a few hours for each module to complete, and adequate restart capability is a must for us to avoid missing SLAs.
11-07-2016 04:08 PM
I suggest you explore the options available to you with scheduling tools. For example, you could break your 7-8 hour job down into, say, 3 or 4 processes, then use the scheduler to start process 2 only if process 1 ends with a zero return code, and so on. That way, if there is a failure, your job will stop as early as possible. You can then use your scheduler to restart from the step after the last good one.
Hint: make sure your SAS batch jobs have the SYNTAXCHECK option switched on. With this option, SAS sets OBS=0 at the first error and finishes the job very quickly, as it processes no data after the error.
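As a small illustration of the hint, the option can be set explicitly at the top of each batch program, and the current state can be written to the log to confirm what happened after a failure. Treat this as a sketch: where the lines go and what the log message says are assumptions, and SYNTAXCHECK may already be on by default at your site.

```sas
/* Assumed to sit at the top of each batch program. */
options syntaxcheck;

/* ... the job's DATA and PROC steps go here ... */

/* Write the current condition code and OBS setting to the log. */
/* Once an error has put SAS into syntax check mode, OBS shows 0. */
%put NOTE: SYSCC=&syscc OBS=%sysfunc(getoption(obs));
```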