Hi,
If anyone has experience with report log error handling in the context of daily automated reports, I would appreciate your feedback.
Goal:
-an efficient error-handling mechanism requiring a minimum of user intervention.
From my online search, here is an option:
-run a daily program that reads each job's log. The program scans all the logs for keywords, such as ERROR, and compiles the hits into a report. I could then run a program that emails me an alert if an error is present.
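A rough sketch of what I'm picturing (the log path and email address are placeholders, and the email step assumes the EMAILSYS/EMAILHOST options are already configured):

/* Pass 1: collect suspicious lines from one job log (path is a placeholder) */
data badlines;
  infile '/batch/logs/daily_report.log' truncover;
  input logline $char256.;
  if logline =: 'ERROR' or logline =: 'WARNING' then output;
run;

/* Count the hits without re-reading the data set */
data _null_;
  if 0 then set badlines nobs=n;
  call symputx('nbad', n);
  stop;
run;

/* Pass 2: email an alert only if something was found */
%macro alert;
  %if &nbad > 0 %then %do;
    filename mail email to='me@example.com'
                        subject="Daily report log: &nbad problem line(s)";
    data _null_;
      set badlines;
      file mail;
      put logline;
    run;
    filename mail clear;
  %end;
%mend alert;
%alert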
Here are two documents on this I found useful: http://analytics.ncsu.edu/sesug/2008/CC-037.pdf and http://www.lexjansen.com/pharmasug/2008/cc/CC02.pdf
thanks,
Those are good examples of searching generic SAS logs for text that indicates problems.
But if you have daily reports, you might want to enhance that by adding logic into the reports to trap and report errors that you can anticipate. For example, if a file the report needs is not available, your program should say so in the log.
In the past I have found it useful to put a prefix on lines written to the log that document errors/issues, and also on lines that document milestones in the process. So you might prefix errors with one string, warnings with another, and informational/note items with a third.
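For example (the prefix strings and the file name are just illustrations):

%put MYJOB-INFO: extract step starting;

data _null_;
  if not fileexist('/data/in/sales.csv') then
    put 'MYJOB-ERROR: required input file /data/in/sales.csv not found';
run;

%put MYJOB-INFO: extract step complete;

A scanner can then pick up the MYJOB-ERROR and MYJOB-WARNING lines directly, and the MYJOB-INFO lines confirm that each milestone was actually reached.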
My favorite log scanning paper is: http://www.lexjansen.com/nesug/nesug01/cc/cc4008.pdf.
The key insight I like from that paper, regarding the processing of NOTE messages, is: "it is safer to specify a list of messages to exclude from the report than it is to specify a list of messages that are the only ones to be reported."
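In code, that exclude-list approach might look like this (the log path is hypothetical, and the exclusion list is only a starter you would grow from your own logs):

/* Keep every NOTE line except those known to be harmless */
data notes_to_report;
  infile '/batch/logs/job.log' truncover;
  input logline $char256.;
  if logline =: 'NOTE:';
  if index(logline, 'NOTE: The data set')        then delete;
  if index(logline, 'NOTE: There were')          then delete;
  if index(logline, 'NOTE: DATA statement used') then delete;
  if index(logline, 'NOTE: PROCEDURE')           then delete;
run;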
My current approach for handling log scanning of daily reports: each job scans its own log and emails me if it finds problems, and a morning summary job reads a joblog dataset that every job writes to and emails a consolidated status. This has been working well. If all goes well, every day I get one email saying N jobs ran fine. If something fails, I get an email from each failed job as well as the summary.
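The per-job piece looks roughly like this (a sketch; the joblog library, email address, and failure threshold are my local choices):

/* Run at the end of every job: record status, email only on failure */
%macro job_status(jobname=);
  %local rc;
  %let rc = &syscc;                 /* capture status before running more steps */
  libname joblog '/batch/joblog';   /* shared status library (hypothetical)     */

  data _run;
    length jobname $32;
    jobname = "&jobname";
    rundate = datetime();
    rc      = &rc;
    format rundate datetime19.;
  run;

  proc append base=joblog.runs data=_run;
  run;

  %if &rc > 4 %then %do;            /* 0 = clean, 4 = warnings; worse = failure */
    filename mail email to='me@example.com'
                        subject="Job &jobname failed (SYSCC=&rc)";
    data _null_;
      file mail;
      put "Job &jobname ended with SYSCC=&rc";
    run;
    filename mail clear;
  %end;
%mend job_status;

%job_status(jobname=daily_sales)

The morning summary job then just reads joblog.runs and mails one consolidated status.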
The only catch is that it's all dependent on the morning summary job running. If that job fails, or if the server where my jobs run is completely hung, I won't get any notification, and I have to rely on my dumb mind to notice the absence of the daily email. I suppose a way to avoid that would be to have a non-SAS tool check the joblog dataset, but I haven't bothered with that.
HTH
Our solution looks completely different.
All jobs are run by the scheduler, which sets the "OK" condition only for jobs that exit with RC=0. Any other return code triggers an alert, upon which the datacenter people react as written in the documentation (alert the responsible person immediately, on the following day, or on the following workday). Chains of jobs will only continue when all necessary predecessor jobs have finished successfully.
The SAS programs will run without WARNINGs and ERRORs when successful, the logs will be as clean as possible (no type conversion NOTEs etc). On top of that, the shell script that interfaces between the scheduler and SAS will scan the log for certain character/word sequences to find possible aberrations, and issue custom return codes if something is found. Since all logs are individually named and kept, research in case of unexpected problems is made quite easy.
So you see, we are much more proactive than just a periodic text scan for ERRORs.
Otherwise, we wouldn't be able to keep track of 1000+ jobs running in batch with just two SAS developers.
If SAS returns 1 or 2 (WARNING or ERROR), or any other non-zero code, that's it. If it returns 0, I scan the log file with an extended grep that searches for phrases I want to catch. Some will cause an automated rerun (with a limiting counter, of course), others will set a custom return code.
The SAS code itself has macros that react to certain conditions (a missing but required parameter, bad settings for remote input files, ...), plus other checks (e.g. too small a number of records in an infile), which lead to an ABORT ABEND with custom return codes. Some of these codes enable the datacenter operators to correct mishaps on their own.
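One of those parameter-check macros might look like this (the macro name and return code are illustrative):

/* Abort the job with a custom return code if a required parameter is blank */
%macro require_parm(parm);
  %if %length(&&&parm) = 0 %then %do;
    %put ERROR: required parameter &parm is missing;
    data _null_;
      abort abend 16;   /* 16 = "missing parameter" in our operator docs */
    run;
  %end;
%mend require_parm;

%let country=;          /* simulate a missing setting   */
%require_parm(country)  /* session ends here with RC 16 */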
It all depends on the size of your SAS operations and the infrastructure already in place. Since we already had Control-M for all our mainframe ops, and the necessary people running it, we "only" had to build the necessary interface (scripts and logical definitions, ssh connection) to run UNIX programs from mainframe JCL scripts.
Before integrating the UNIX data warehouse into the MF job control, I used a combination of cron entries and makefiles to build the dependencies.
Everyone,
Thank you very much for all of your feedback.
I'm very much with @Kurt_Bremser that using a scheduler and having the scheduler take action based on return codes is the "best" way of implementing job control. As the scheduler executes as the parent process, it also deals with cases where the SAS program runs into such a bad situation that it couldn't even send an email anymore.
It's a shame that we can't configure SAS return codes on a granular level, and that there is often a need to post-process SAS logs to capture NOTE messages that we consider should be a WARNING or even an ERROR.
What you could do is implement such SAS log post-processing in a single place and, if there is a match, have this code throw a WARNING or ERROR. If you then call this code via the TERMSTMT option in your batch command, you could still capture the return code via the scheduler and thus have a simple, single approach to batch job alerting for all your scheduled jobs.
...and: if you take such an approach where your SAS log analysis feeds the return code directly back to the scheduler, then you can also implement job dependencies, like only running dependent jobs if the previous one ended with a return code of 0 (and you can also set a different return code via the SAS log post-processing).
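A sketch of that TERMSTMT pattern (assuming a UNIX batch call, that the session log is routed to a known file via -log, and that the file can be read back at termination time; the phrase list is only an example):

/* scanlog.sas - executed via -termstmt: scan this session's log and */
/* force a non-zero return code if it contains suspicious lines.     */
/* Invocation (paths assumed):                                       */
/*   sas /batch/jobs/job1.sas -log /batch/logs/job1.log              */
/*       -termstmt "%include '/batch/util/scanlog.sas';"             */
data _null_;
  infile '/batch/logs/job1.log' truncover end=done;
  input logline $char256.;
  if logline =: 'ERROR'
     or logline =: 'WARNING'
     or prxmatch('/uninitialized|repeats of BY values|converted to/', logline)
    then nbad + 1;      /* sum statement: nbad is retained across lines */
  if done and nbad > 0 then do;
    put 'ERROR: log scan found ' nbad 'suspicious line(s)';
    abort return 8;     /* exit status 8 -> scheduler sees the job as failed */
  end;
run;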
I agree, it's a shame that we can't make SAS return codes more sensitive.
But living with that: since we agree that it is often necessary to resort to log scanning to catch bad notes, I don't really see the benefit of checking return codes at the end of a job AND doing log scanning. Are there situations where a "bad" return code is set which don't throw bad log messages that would be caught by a log scanner?
Suppose a log scanner creates macro variables with the count of bad NOTEs, WARNINGS, and ERRORS, or just a single macro variable with the sum of those counts. Maybe that's the best return code.
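Something along these lines (the log path and the "bad NOTE" patterns are placeholders):

/* Scan a finished log; leave the counts behind as macro variables */
data _null_;
  infile '/batch/logs/job1.log' truncover end=done;
  input logline $char256.;
  if logline =: 'ERROR' then nerr + 1;
  else if logline =: 'WARNING' then nwarn + 1;
  else if logline =: 'NOTE' and
       prxmatch('/uninitialized|Invalid data|repeats of BY values/', logline)
    then nnote + 1;                  /* only the "bad" NOTEs count */
  if done then do;
    call symputx('nerr',  nerr);
    call symputx('nwarn', nwarn);
    call symputx('nnote', nnote);
    call symputx('nbad',  sum(nerr, nwarn, nnote));
  end;
run;

%put Scan result: &nerr error(s), &nwarn warning(s), &nnote bad note(s);

That &nbad sum could then be handed to ABORT RETURN as the job's exit code.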
I don't see a benefit to having the scheduler send an email when a job ends with errors instead of the SAS session, unless the scheduler can be configured to send an email if a job doesn't complete within a certain time frame, so that if the SAS session were hung, the scheduler could send an email, or kill it and resubmit.
I'm certainly not arguing against schedulers. Our BI server uses LSF, and it's configured to send emails.
--Q.
Are there situations where a "bad" return code is set which don't throw bad log messages that would be caught by a log scanner?
There are situations where you only get a NOTE for something where you'd like SAS to set a return code of WARNING or ERROR.
If you post process the SAS log via TERMSTMT then you can throw a return code other than 0 directly as part of your job which allows you to implement job dependencies based on job return code (i.e. only run job 2 if job 1 ended with a return code of 0).
Thanks, I agree with:
There are situations where you only get a NOTE for something where you'd like SAS to set a return code of WARNING or ERROR.
I often use the undocumented dsoptions=note2err just to force some (almost all?) bad notes into errors.
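For anyone curious, it's as simple as this (note2err is undocumented, so behavior may vary by release):

options dsoptions=note2err;  /* undocumented: promote DATA step NOTEs to ERRORs */

data test;
  x = '123' + 1;   /* implicit character-to-numeric conversion:      */
                   /* normally just a NOTE, now surfaces as an ERROR */
run;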
Would you agree with:
There is no situation in which checking a return code can detect a problem that could not be detected by checking the log.
?
If so, then it seems to me that checking the log may be the best way to generate a return code, rather than using the mix of SAS-provided return codes (&syserr, &syscc, &sqlrc, etc.), several of which have idiosyncrasies.
And yes, I agree that a job return code is useful when you want scheduling logic that depends on job status. My main point is that creating your own job return code from log scanning seems safer to me than relying on the automatic job return codes.