04-25-2014 04:25 PM
I'm wondering what the best practices are for creating and sending email and text notifications from LSF to users. Notifications should go to different users depending on the job and its status.
Any advice would be greatly appreciated.
04-26-2014 04:39 AM
LSF is a scheduler (and load balancer), as known in bigger central IT organizations.
In that kind of organization there is often a dedicated team responsible only for running jobs and delivering them under a service agreement. With that approach it is common to draw the team's attention to certain events, and mail is one technical way to do that. This is a different perspective on the users of LSF than that of the users of the business information.
There is a generic mail address setting, LSB_SUB_MAIL_USER, in a configuration file. Tip: check the LSF configuration manual.
It is possible to change this mail address per job (see the bmod command reference). In interactive mode it works differently; see the LSF users guide. None of this is visible from SAS.
If your goal is delivery of the business information programmed in SAS, I would advise using the SAS tooling for that (e.g. ODS or the Publishing Framework), not the LSF tooling.
04-28-2014 11:16 AM
I'm used to other schedulers such as Maestro, Autosys, etc. They have the full ability to notify users about a job's status (started, ended, completed with failure or success, etc.), and that is exactly the functionality I'm looking for in this tool. I have a Windows environment where notification through email or text is needed.
Please let us know how you manage your environment in this regard and with this requirement.
04-28-2014 12:04 PM
Are you using SAS Management Console to schedule the flow via LSF?
There are options there to add email notifications with job status etc. And of course SAS itself can send emails, so developers can write their own notifications etc.
04-29-2014 03:12 AM
I managed scheduling from SAS-WA (predecessor of SAS-DI) using LSF, which required going into LSF commands. There I have seen the full scheduling functionality, as with OPS TWS or others. Be aware of some IT politics: schedulers, including the Windows scheduler (Task Scheduler), are often reserved for the "Operations department". Reason: it is their job to do scheduling.
The SMC offers a lot for scheduling, but the configuration needs to be done well, including some authorization tasks; see
Scheduling in SAS(R) 9.3 (Setting Deployed Flow Properties).
The first requirement for sending mail is that the technical part of the mail setup (SMTP on port 25 / IMAP) works.
This sounds easy at first sight, until you need an open relay defined in the mail service for it.
SAS(R) 9.4 System Options: Reference, Second Edition (EMAILHOST= System Option) offers better technical email configuration options. When you need those, it is left to developers to code them.
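To make the "different users per job and status" idea concrete, here is a minimal sketch in Python, not anything LSF or SAS provides. The routing table, the addresses, and the helper names `build_notification` and `send` are all made up for illustration; only the standard-library `email` and `smtplib` modules are real. It builds the message without sending it, so it can be tried before the SMTP relay is sorted out.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical routing table: which address hears about which job status.
RECIPIENTS = {
    "Done": "etl-owners@example.com",
    "Exit": "operations@example.com",
}
DEFAULT_RECIPIENT = "operations@example.com"  # assumption: fallback address


def build_notification(job_name, status, sender="lsf-notify@example.com"):
    """Build (but do not send) a notification mail for one job."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = RECIPIENTS.get(status, DEFAULT_RECIPIENT)
    msg["Subject"] = f"Job {job_name} finished with status {status}"
    msg.set_content(f"Job {job_name} reported status {status}.")
    return msg


def send(msg, host, port=25):
    """Hand the message to an SMTP server; host and port are site-specific."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)


msg = build_notification("load_sales", "Exit")
print(msg["To"], "|", msg["Subject"])
```

Calling `send(msg, "your.mail.host")` would only work once the mail server accepts mail from the machine running the script, which is the relay question mentioned above.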
05-01-2014 10:18 AM
Thanks Jaap !
Sometimes jobs run fine and finish successfully, but Process Manager shows 'Exit'. The log also suggests a very clean run, yet Process Manager shows something different, which is weird. Do you know why this happens?
05-01-2014 10:51 AM
Assuming the process ran well, was captured nicely by LSF, and Process Manager is showing that (no dirty technical timeouts or wrong synchronization),
the most logical explanation would be the error level: syscc.
By default the scheduler treats every non-zero value as an error (indicated as "Exit"). Warnings (value=1), and sometimes just notes, are often perfectly acceptable in a clean run.
From a long time ago I remember there was something to set an acceptable value for the scheduler at job level.
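The rule described here can be sketched in a few lines of Python. This only illustrates the logic and is not how LSF or Process Manager is implemented; the function name `job_state` and the `acceptable` parameter are invented for the example.

```python
def job_state(exit_code, acceptable=frozenset({0})):
    """Map a process exit code to the state a scheduler would display.

    By default only 0 counts as success; passing a wider set of
    acceptable codes mimics a per-job "treat warnings as OK" setting.
    """
    return "Done" if exit_code in acceptable else "Exit"


# A SAS run that ended with warnings (exit code 1):
print(job_state(1))                     # default rule: any non-zero is a failure
print(job_state(1, acceptable={0, 1}))  # warnings declared acceptable for this job
```

With the default rule the warning run shows as "Exit" even though the log looks clean, which matches the symptom described in the question.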
05-01-2014 01:37 PM
I took a quick look at one of my DI jobs.
It has a macro %RCSET which sets a return code stored in &job_rc.
At the end is a macro %ETLS_jobRCCHK which checks &job_rc.
It looks like &job_rc could be set from a number of macro vars.
I would check &syserr &sqlrc &job_rc &trans_rc .
Probably just do %put _global_ and look through the long list of macro vars for other stuff that could be a return code.
That's all on the SAS side. I'm assuming the job status which LSF reports is based on one of these macro vars, but don't know for sure.
05-01-2014 01:54 PM
Sorry for mentioning the wrong var: see SAS(R) 9.3 Macro Language: Reference, as syscc is the one returned to the calling OS.
There are a lot of others, many dedicated to a specific function or procedure. It can be very confusing.
05-08-2014 01:19 PM
LSF is a load balancer: it manages to fill the machine (CPU, memory, I/O) with jobs and prioritizes them.
It can even be used in a grid of multiple machines. Your question really translates to how good the hardware is that you can afford and what the skills of your IT staff are. With that, the number of jobs running at one moment is a moving target.
05-09-2014 01:42 AM
Setting the number of jobs is not that easy. Load balancing is about optimizing the use of hardware resources.
For cgroups (WLM) you can find that at: Chapter 1. Introduction to Control Groups (Cgroups)
This option is part of the OS level; the SAS-VA docs make a reference to it.
For LSF there are a lot of config files in a shared location (for balancing in a grid). SAS has: Scalability Community: Platform Suite for SAS, which lists all the docs.
Go into the LSF admin guide (PDF) and look at "About Platform LSF resources".
The goal is to make usage so easy that users are not aware of what is happening behind it and just get optimal service.
It is very frustrating when this work goes unnoticed, with no visibility. That is more of a management issue to solve first; if it cannot be solved, let it go.
You would be better off having big performance problems, faking some support, and getting rewarded for that.
05-13-2014 09:08 AM
One more question for you: we see lots of exit codes in Process Manager under different job names, and we would like to either clear them or keep them for only a week. They are just piling up. How can we clear them?
05-13-2014 10:32 AM
The notion of a "job" in LSF is a little different from that in SAS. Every process registered for execution in LSF gets a slot with a number.
That LSF number is the unique key of the job. It must not be confused with a PID or any other number.
There are two kinds of jobs within LSF: those that run once and those that are scheduled on a regular scheme.
That you are seeing a growing number of different job names indicates you are running once-only processes.
LSF tries to capture the job behavior (the next run, for regular schemes), so there must be a database for that.
I can find an lsb.events file (configuration manual) that describes that kind of info or switching.
Some once-run jobs should be removed from LSF, but sometimes that does not happen.
The reason can be that, as the result of an error, a rerun in LSF is expected. That keeps the job active in LSF until you remove it yourself.
What is kept in memory and for how long (I have seen 1-2 days with LSF 4.2) I have never found out.
I guess that Process Manager is showing you that kind of info.