02-08-2017 03:00 PM - edited 02-08-2017 04:03 PM
I (and my organization) am new to SAS and learning as I go. Our environment is essentially a distributed environment, using Hadoop. Most of our data in SAS will come from Oracle via queries/jobs built in Data Builder and/or DI Studio. Most of our data will be loaded into Hadoop and then into LASR, and then consumed by the users, using EG and VA Designer. I currently have my data flowing via scheduled jobs. The only piece that I am missing is the script for reloading my LASR data from Hadoop when the LASR servers are restarted. Based on what I have learned from classes, this community, and searching the web, this "reload" will need to be a script that is manually run during my startup process. I am assuming that many of you have done this same thing. So, I would like to ask if anyone would be willing to share the script that you are using to reload your LASR data when your servers are restarted?
One thing that I will add to the above: as I mentioned, I already have jobs set up and scheduled for loading the data into Hadoop, and separate jobs/schedules for loading the data into LASR. So, I am thinking that maybe I can reuse the jobs that I already have set up for loading into LASR. The script that I am looking for would then be one that calls these jobs, if that is possible.
One more addition to the above...this would be a script that will be run from Linux.
02-08-2017 04:05 PM
I work at SAS in IT supporting the VA environment we run internally. Welcome to the SAS community. The path you are headed down is exactly like the setup we have employed. We have separated Hadoop loads and LASR loads, then combined them into scheduled flows for regularly refreshing data. The trick we created is to take all the LASR flows/jobs and include them in one large flow, i.e. "Load all to LASR". That flow is triggered via command on restart. If this sounds like an option for you, I can share the syntax we are using to trigger a flow in LSF Platform Scheduler. It could probably be adapted for any scheduler.
02-08-2017 04:12 PM
That is exactly what I was just looking at: trying to find a way to see the code behind a flow, to attempt to do exactly what you are describing. So, yes, I would be VERY interested in seeing the syntax for this. By the way, we are using the following Scheduling Server: Operating System Services
02-08-2017 04:56 PM - edited 02-08-2017 04:59 PM
EDIT: Just read you were using a different scheduler. Well, this is the syntax we use for the LSF Platform scheduler. During our initialization of the LASR servers we evaluate whether the servers started, and if they did we kick off a .ksh script which includes this info:
# Will source a batch version of PM profile
# and kick off the main flow for LASR table reload
. /<insert your path>/pm/conf/profile.js
Also sounds like we have similar methodology so here are a couple of our published papers.
Hope that helps.
02-11-2017 10:41 AM
Again, thank you for the information. I have been looking at the scripts and I have gotten close to a solution. I can find the code that is produced by the flow. However, when I run/call that code, the jobs are sent to the batch server for processing and get rejected because it is not the scheduled time for running them. So, I'm guessing that I could make a copy of this code and modify the section that does the time checking (not sure). However, at this time, we are just trying to get things in place so that we can "go live" with the system. So, for now, I am going to settle for a single flow that loads all LASR tables, kicking off the flow manually via Schedule Manager during the startup process. Maybe at a later date, I will come back to this and attempt to script this. If you happen to run across a script that will do this for the scheduler that I am using, please share.
02-08-2017 04:10 PM
Have a look at this SGF paper http://support.sas.com/resources/papers/proceedings16/3660-2016.pdf
Hope it helps,
02-12-2017 11:21 AM
I think yours is more a Linux and OS admin question than a SAS question. The reason I say this is that you can probably get the best and quickest support within your own organization. You are looking for specific, ready-made scripts, and I totally understand that, but your Linux admins can help you out perfectly. Let me explain and go through the information:
- You already have scripts to restart LASR, scripts to save LASR data to Hadoop (well done), and scripts to load all your data into LASR.
- You also mentioned you have the Operating Systems scheduler.
All of this means that, as you said, you can easily reuse those scripts and quickly create the trigger.
Here is the information you need:
- The Operating System scheduler is the same as crontab or at, two programs well known to any Unix admin.
- They will need to know the port or ports of your LASR server in order to write a small loop that checks that port. Once the service has been restarted and the port responds, they can trigger any job you have already prepared.
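The port-check loop described above could look roughly like the following. This is only a minimal sketch, not a tested SAS-specific script: the function names are made up for illustration, the host, port, and reload command are placeholders you would replace with your own, and the /dev/tcp check assumes the script is run with bash.

```shell
#!/usr/bin/env bash
# Sketch: wait until a TCP port accepts connections, then run a command.
# All names here are illustrative; host, port, and the reload command
# are placeholders for your own LASR server and your existing load job.

# Return 0 if something is listening on host:port.
# Uses bash's /dev/tcp redirection; the subshell closes the socket on exit.
port_is_open() {
    local host=$1 port=$2
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Poll host:port every <interval> seconds until it opens
# or <timeout> seconds have passed. Returns 1 on timeout.
wait_for_port() {
    local host=$1 port=$2 timeout=${3:-300} interval=${4:-5}
    local waited=0
    until port_is_open "$host" "$port"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Wait for the port, then run whatever command follows the first
# three arguments (for example, the script that reloads LASR tables).
reload_when_up() {
    local host=$1 port=$2 timeout=$3
    shift 3
    if wait_for_port "$host" "$port" "$timeout"; then
        "$@"
    else
        echo "port ${port} on ${host} did not open within ${timeout}s" >&2
        return 1
    fi
}
```

At the end of a startup script this might be called as `reload_when_up lasrhost 10010 600 /path/to/reload_lasr.sh`, where the host name, port number, and script path are all hypothetical. Keeping the wait and the trigger in separate functions means the same loop can be reused for more than one LASR server.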
Is there anything else you need?