I am new to the SAS world, and I am working on getting things set up in my production environment. I have Oracle data that goes into Hadoop and then on to LASR memory. When my LASR server is restarted, the data in memory is lost, so on restart I need to reload my data from Hadoop into LASR. Can someone confirm that I have this process correct, or suggest a better way of doing this? My process is as follows:
Job: Load data from Oracle to Hadoop
- Schedule this job to refresh Hadoop data from Oracle
Job: Load data from Hadoop to LASR
- Schedule this job to refresh LASR data from Hadoop as often as needed, based on changes in Oracle
- Schedule this job to run after LASR start, to load data from Hadoop to LASR after LASR server has been restarted
Is my thinking correct? Any way to do this via AutoLoader? Any better suggestions?
Thanks,
Ricky
Accepted Solutions
I think your process is a good one. We do something similar in the SAS IT department when we restart VA. All you need to do is quickly lift the data from HDFS to LASR so that customers can see it again after a restart. We also like this process to run as fast as possible, so we typically disable metadata updates after a restart because the metadata should already be in sync from the previous load.
Thank you!
I'm facing the same problem in my SAS VA environment. Can you please help me with the same query?
"How do you load data from Hadoop to LASR automatically after a restart of the LASR server?"
@SKG,
The solution that I ended up with is not yet "automatic". Essentially, I have several "pairs" of jobs set up into flows: one job loads HDFS, the other loads LASR. I also have a separate flow that contains all of my LASR load jobs. I run this flow manually during the startup process, once Hadoop and LASR are both up. My schedule manager looks similar to the following:
(Flow) _Load_LASR_ALL <----- This is the flow that I run manually on startup
  (Job) Table1_Load_to_LASR
  (Job) Table2_Load_to_LASR
(Flow) Load_Table1
  (Job) Table1_Load_to_HDFS
  (Job) Table1_Load_to_LASR
(Flow) Load_Table2
  (Job) Table2_Load_to_HDFS
  (Job) Table2_Load_to_LASR
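For anyone who wants to see the layout above as data, here is a rough Python sketch of how the restart flow can be derived from the per-table pairs. The job and flow names come from the listing; the `build_flows` helper and the dictionary layout are purely illustrative assumptions and are not part of any SAS tool.

```python
# Illustrative model only: names mirror the listing above, but this
# structure is an assumption, not how the SAS schedule manager stores flows.
TABLES = ["Table1", "Table2"]


def build_flows(tables):
    """Return a dict mapping each flow name to its ordered list of job names."""
    flows = {}
    for table in tables:
        # Regular refresh: stage into HDFS first, then lift into LASR.
        flows[f"Load_{table}"] = [
            f"{table}_Load_to_HDFS",
            f"{table}_Load_to_LASR",
        ]
    # Restart flow: only the LASR jobs, because HDFS already holds the data.
    flows["_Load_LASR_ALL"] = [f"{table}_Load_to_LASR" for table in tables]
    return flows


if __name__ == "__main__":
    for name, jobs in build_flows(TABLES).items():
        print(name, "->", ", ".join(jobs))
```

The point of the design is that each LASR load job is defined once and reused in two flows, so adding a table means adding one pair of jobs and one entry in the restart flow.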
We are in the process of getting some playbooks set up for Ansible. We hope to automate our startup process, including calling the flows or jobs listed above.
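Until the playbooks exist, the "wait for Hadoop and LASR, then run the LASR loads" step could be sketched roughly as below. This is only a generic Python outline under stated assumptions: the readiness checks and the `run_job` callable are hypothetical placeholders that you would replace with whatever your site uses to probe the services and submit a job. Nothing here is a SAS or Ansible API.

```python
import time


def wait_until_ready(checks, timeout_s=600.0, poll_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll the named readiness checks until all pass or the timeout expires.

    `checks` maps a service name to a zero-argument callable returning True
    when that service is up (hypothetical placeholders, supplied by you).
    """
    deadline = clock() + timeout_s
    pending = dict(checks)
    while pending:
        # Keep only the checks that still fail.
        pending = {name: check for name, check in pending.items() if not check()}
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True


def run_startup(checks, jobs, run_job, **wait_kwargs):
    """Run every LASR load job once all services report ready."""
    if not wait_until_ready(checks, **wait_kwargs):
        raise RuntimeError("Hadoop and/or LASR did not become ready in time")
    # `run_job` is a hypothetical callable that submits one job by name.
    return [run_job(job) for job in jobs]
```

A call might look like `run_startup({"hadoop": hadoop_is_up, "lasr": lasr_is_up}, ["Table1_Load_to_LASR", "Table2_Load_to_LASR"], run_job=submit)`, where `hadoop_is_up`, `lasr_is_up`, and `submit` are functions you would have to write for your own environment.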
Hopefully this helps. If you have any questions, please let me know.
Thanks,
Ricky