10-17-2014 12:51 AM
Hello everyone. As I am new to SAS, I'm not sure whether SAS is able to write specific information to an external file.
The client I work with has asked me to produce a log file for each DI job containing:
<Job name>,<start time>,<end time>,<job status>,<error code>,<error description>
The other requirement is to write the log file to a specific folder and with a specific file name format, e.g. LOG_<DDMMMYYYY>.txt
So far what I've done is add the code below to every transformation that I use in my job:

%let fileDate = %sysfunc(date(), date9.);
filename genfile "\\Exception\Log_&fileDate..txt"; /* output text file */

data _null_;            /* no SAS data set is created */
   file genfile mod;    /* append to the existing file */
   put "1_SIT_FL_FACT_GPA,&etls_stepStartTime,&etls_endTime,&syserr,&syscc,";
run;
Is there any other way to do this? I'd appreciate it if anyone can help me on this... thanks in advance.
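For reference, one way to avoid repeating that code in every transformation is to collect it into a single postcode macro that runs once per job. This is only a sketch: the path and the etls_* macro variable names are taken from the snippet above and may differ between DI Studio versions.

```sas
/* Sketch: one postcode macro per job instead of code in every        */
/* transformation. Path and etls_* variable names are assumptions -   */
/* adjust to your site and DI Studio version.                         */
%macro write_job_log(jobname=);
   %let fileDate = %sysfunc(date(), date9.);
   filename genfile "\\Exception\Log_&fileDate..txt";
   data _null_;
      file genfile mod;   /* append one record per job run */
      put "&jobname,&etls_stepStartTime,&etls_endTime,&syserr,&syscc,";
   run;
   filename genfile clear;
%mend write_job_log;

%write_job_log(jobname=1_SIT_FL_FACT_GPA)
```

Running it as postcode means the end time and return codes are already populated when the record is written.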
10-17-2014 02:53 AM
With DI you are in the BI/DI server environment.
These kinds of questions come from regulatory requirements (auditable/traceable). A lot of this is an extension of SIEM. If SIEM is not in place with SAS, it should be covered at the OS level.
Logging of events is better put in place through the existing logging framework, which is based on ARM and log4j. The easiest way of implementing it is using the APM tool (Audit and Performance Measurement).
Of course you can always build your own (on-premise solution).
10-17-2014 07:57 AM
Don't use user written code for these requirements. That's really bad.
Never ever have hard-coded root paths in your code (like: filename genfile "\\Exception\). That's also really bad.
Once DI jobs have been deployed they are run in batch. Normally a scheduler is used for this batch processing. Bigger organisations have such things already implemented - including job monitoring - so you don't need to re-invent the wheel here.
The common naming convention for scheduled DI job logs is: <DI job name>_<start datetime>.log. The name for this log is set as part of the batch command used by the scheduler. The dynamic datetime part can be defined as part of the batch command.
You wouldn't add the job end time to the SAS log name, as this log file gets created at the beginning of the batch process. The same is true for job return codes. You can get the run times and job return codes as part of job monitoring (schedulers normally maintain such data), or they can be derived from the SAS log - you know when the job started and the SAS log tells you how long it took to run. If you turn on option FULLSTIMER then the log will give you even more information.
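If you do want an elapsed time per job from inside SAS, a minimal sketch follows. FULLSTIMER adds per-step resource statistics to the log; the job-level timing here uses hypothetical precode/postcode macro variables of my own naming, not DI-supplied ones.

```sas
/* FULLSTIMER: detailed per-step timings and resource usage in the log */
options fullstimer;

/* precode section of the job */
%let job_start = %sysfunc(datetime());

/* ... job steps run here ... */

/* postcode section of the job */
%let job_end = %sysfunc(datetime());
%put NOTE: Job elapsed time: %sysevalf(&job_end - &job_start) seconds.;
```

The %PUT line lands in the SAS log, so a downstream reporting job can pick it up.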
Additionally, with DI Studio you can also use Status Handling (SAS(R) Data Integration Studio 4.9: User's Guide) and ARM logging (SAS(R) 9.4 Intelligence Platform: System Administration Guide, Third Edition).
What your customer asks you for is part of a general environment set-up & DI design. I feel that if you're new to SAS & SAS DI then you will need support from someone a bit more senior or you'll end up in a mess and will deliver a bad product.
10-17-2014 09:56 AM
If I understood it correctly, your customer wants a consolidated daily DI job run status report. Correct?
If you don't have seniors around to guide you (which was the case for me 8 years back), then, as Patrick mentioned, explore the "Return Code Check" and "File Writer" transformations instead of user written code. I hope they will be all you need in order to produce this report.
10-17-2014 10:20 AM
I solved this requirement in my company by including relevant information in the DI logs using the standard precode and postcode sections of the DI jobs and transformations: information like record counts, start and end times of the jobs, and inconsistencies found.
The lines included in the log using PUT or %PUT statements carry a common tag, so I can later select those lines and create the summary log with only the information needed.
The job that creates the summary log was built with a User Written Code transformation. This job reads almost 40 DI logs weekly using the PIPE engine of the global FILENAME statement, selects the tagged lines as well as SAS warnings and errors, creates the summary report in .pdf format and sends me an email with the report attached and a few statistics.
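A hedged sketch of that tag-and-harvest approach. The tag text, log directory and Windows dir command are assumptions; adjust them to your environment.

```sas
/* In each job's postcode: write summary lines with a common tag. */
%put SUMMARYLOG: &sqlobs records loaded.;

/* In the summary job: list the log files with a PIPE filename,    */
/* then read each one and keep tagged lines plus ERROR/WARNING.    */
filename loglist pipe 'dir /b "D:\SASLogs\*.log"';

data summary;
   length logname fname $256 line $1024;
   infile loglist truncover;
   input logname $256.;                   /* one log file name per row */
   fname = cats('D:\SASLogs\', logname);
   infile logtxt filevar=fname end=done truncover;
   do while (not done);                   /* read the whole log file  */
      input line $1024.;
      if index(line, 'SUMMARYLOG:')
         or line =: 'ERROR'
         or line =: 'WARNING' then output;
   end;
run;
```

The FILEVAR= option lets one DATA step loop over every log file returned by the pipe, which matches the "read almost 40 DI logs weekly" pattern described above.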
10-17-2014 10:53 AM
Best Practice for DI implementations is to use as little user written code as possible.
I believe there are multiple ways to cover these job monitoring and reporting requirements without user written code. Besides what I've already posted, there is also the option to use PROC SCAPROC to prepare a SAS log for easier reporting.
But in the end: If a SAS job terminates "ugly" or needs to be killed because it "hangs" then there won't be a good SAS log at all. That's why such job monitoring and reporting needs to be done "from outside" on OS level. Something scheduling software like LSF provides.
For full reporting on job status and performance a combination of scheduling data and SAS log data will be required. PROC SCAPROC, ARM logging and the SAS Logging Facility SAS(R) 9.4 Logging: Configuration and Programming Reference, Second Edition provide options of how to implement such requirements.
10-17-2014 11:25 AM
Agreed on those points, Patrick. Logging should be done not with your own coding but on the outside (OS).
LSF log files would be excellent, as LSF does a lot more than just running, monitoring and logging jobs.
ARM logging is delivered by the APM tool, as it configures the SAS servers/services, and it also sits mostly outside the DI jobs. In any case no dedicated coding of your own is needed (it is ARM/log4j based); that approach covers a lot more and is available for free.
PROC SCAPROC is, however, a little bit different. It analyzes a program that has already run, using a delivered SAS program. It is an aid to optimize hand-coded programs in preparation for running them in parallel (grid/LSF). See the Base SAS(R) 9.2 Procedures Guide (PROC SCAPROC concepts).
LSF also provides options to send mail in case of ....
All of that depends not on analyzing the SAS log but on correctly setting the return code (SYSRC): everything non-zero is assumed to be an error.
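A minimal sketch of that return-code idea. The SYSCC threshold and the return value are assumptions - pick your own policy - but the point is that the batch session exits with a non-zero condition code the scheduler can act on, with no log parsing needed.

```sas
/* Sketch: turn the SAS condition code into a batch return code.     */
/* SYSCC > 4 means worse than a warning (threshold is an assumption).*/
%macro set_job_rc;
   %if &syscc > 4 %then %do;
      %put ERROR: Job failed with SYSCC=&syscc - returning RC=8.;
      %abort return 8;   /* batch process exits with condition code 8 */
   %end;
%mend set_job_rc;

%set_job_rc
```

Placed at the very end of a deployed job, this lets LSF (or any scheduler) decide on alerting and downstream dependencies from the return code alone.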
10-17-2014 11:30 AM
I agree with you about using "as little user written code as possible" in DI implementations, and I always try to adhere to it. In this specific, simple case, though, I decided it was much easier and more straightforward to write this program and insert the code in a User Written transformation, so I could deploy the job to the LSF scheduler in a different and independent flow. That way, even if the main flow terminates "ugly", the program is able to report that situation.
In this case I was not only interested in reporting the job status and performance, but also in some other relevant information for the application that I inserted into the log myself.
Finally, the best practice says "little", not "none".
10-17-2014 12:18 PM
Ctorres, that "little but not none" is a good argument for your more detailed information on the number of records processed. It should be part of a good validation/verification of the input. You could think of blocking further processing by generating an error code for such events. (Good programming habit.)
It is not a good argument to re-invent the wheel. That is what a lot of developers like to do. (Bad programming habit.)