I am new to the SAS world, and I am working on getting things set up in my production environment. I have Oracle data that goes into Hadoop and then into LASR memory. Obviously, when my LASR server is restarted, the data in memory is lost, so on restart I need to reload my data from Hadoop into LASR. Can someone confirm that I have this process correct, or suggest a better way of doing it? My process is as follows:
Job: Load data from Oracle to Hadoop
- Schedule this job to refresh Hadoop data from Oracle
Job: Load data from Hadoop to LASR
- Schedule this job as needed to refresh LASR data from Hadoop (as often as needed, based on changes in Oracle)
- Schedule this job to run after LASR start, to load data from Hadoop to LASR after LASR server has been restarted
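For the Hadoop-to-LASR step, the reload is typically a PROC LASR ADD against the SASHDAT table in HDFS. A minimal sketch, assuming a LASR server already running on port 10010; the host name and HDFS path are placeholders, not values from this thread:

```sas
/* Lift a SASHDAT table from HDFS into the running LASR server.   */
/* Host, port, and path below are hypothetical - use your own.    */
proc lasr add
      hdfs(path="/hps/table1" direct)  /* DIRECT = parallel load from SASHDAT */
      port=10010;
   performance host="lasr-head.example.com" nodes=all;
run;
```

Because the SASHDAT blocks are read in parallel from co-located HDFS, this step is normally much faster than reloading the same data from Oracle, which is why the restart path only needs the second job.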
Is my thinking correct? Any way to do this via AutoLoader? Any better suggestions?
Thanks,
Ricky
I think your process is a good one. We do something similar in the SAS IT department when we restart VA. All you need to do is to quickly lift from HDFS to LASR so that customers can see the data again after a restart. Furthermore, we like to have this process run as fast as possible, so typically we disable metadata updates after a restart because it should be in sync already from the previous load.
Thank you!
@SKG,
The solution that I ended up with is not yet "automatic". Essentially, I have several "pairs" of jobs set up into flows: one job loads HDFS, the other loads LASR. Then I have a separate flow that contains all of my LASR load jobs. This flow gets run manually during the startup process, once Hadoop and LASR are both up. So, my schedule manager looks similar to the following:
(Flow) _Load_LASR_ALL <----- This is the flow that I run manually on startup
(Job) Table1_Load_to_LASR
(Job) Table2_Load_to_LASR
(Flow) Load_Table1
(Job) Table1_Load_to_HDFS
(Job) Table1_Load_to_LASR
(Flow) Load_Table2
(Job) Table2_Load_to_HDFS
(Job) Table2_Load_to_LASR
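Each of the *_Load_to_LASR jobs above amounts to the same HDFS-to-LASR lift, so one way to keep them uniform is a small macro called once per table. This is a hypothetical sketch, not the poster's actual jobs; the host, port, and HDFS paths are placeholder values:

```sas
/* Hypothetical per-table reload macro for the _Load_LASR_ALL flow. */
%macro reload_lasr(table);
   proc lasr add
         hdfs(path="/hps/&table." direct)
         port=10010;
      performance host="lasr-head.example.com" nodes=all;
   run;
%mend reload_lasr;

%reload_lasr(table1)
%reload_lasr(table2)
```

Keeping the load logic in one macro means the startup flow only varies by table name, which also makes it easier to hand off to an automated startup process later.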
We are in the process of getting some playbooks setup for Ansible. We hope that we can automate our startup process (including calling the flow or jobs listed above).
Hopefully this helps. If you have any questions, please let me know.
Thanks,
Ricky