_Dan_
Quartz | Level 8

Morning all,

 

Our formal approach to loading data into our platform is to land it in HDFS first and then load it into LASR, so that LASR can use its memory block mapping for more efficient memory usage.
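For context, the HDFS-first pattern looks roughly like the below. All server names, paths, ports and table names are placeholders rather than our real environment:

libname hdat sashdat path="/hps/transactions"
        server="namenode.example.com" install="/opt/TKGrid";

/* Land the table in HDFS as SASHDAT first */
data hdat.txn_2yr (replace=yes);
   set work.txn_2yr;
run;

/* Then attach it to the LASR server. Adding from SASHDAT lets LASR
   memory-map the HDFS blocks instead of copying every row into memory */
proc lasr add data=hdat.txn_2yr port=10010;
   performance host="namenode.example.com" install="/opt/TKGrid";
run;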

 

There's a process that currently takes a two-year snapshot from Hadoop, ingests it into HDFS and links it to LASR. This isn't efficient, and I would prefer to maintain a rolling two-year snapshot instead, appending each new day and removing the "2 years + 1 day" data.
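To sketch what I have in mind (librefs, tag, port and the date variable are illustrative; the SASIOLA APPEND=YES data set option and PROC IMSTAT's DELETEROWS statement are what I'd expect to use for this):

/* In-memory libref via the SASIOLA engine */
libname lasr1 sasiola host="namenode.example.com" port=10010 tag="hps";

/* Append the newest day's transactions to the in-memory table */
data lasr1.txn_2yr (append=yes);
   set work.txn_today;
run;

/* Compute the cut-off date (roughly two years back) as a literal */
%let cutoff = %sysfunc(intnx(day, %sysfunc(today()), -730));

/* Delete the rows that have fallen outside the two-year window;
   DELETEROWS acts on the rows matching the active WHERE clause,
   and PURGE physically frees them */
proc imstat;
   table lasr1.txn_2yr;
   where txn_date < &cutoff;
   deleterows / purge;
run;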

 

However, to do that, you would need to append the data into LASR and then push it back into HDFS, which loses the efficiency of the memory block mapping. My concern is that by loading the data into LASR first, the entire contents of the table are held in memory. Pushing it back into HDFS merely increases resilience; we're no longer benefiting from memory block mapping.

 

In your personal opinion, would you rather suffer a longer ETL from dropping and reloading a couple of years' transaction data but gain efficient memory usage in LASR, or a quicker ETL at the cost of potentially significant memory usage in LASR?

 

Or have I missed a trick, and there's still a way to achieve a minimal LASR footprint whilst also gaining a quicker overall ETL?

Dan

1 ACCEPTED SOLUTION

Accepted Solutions
_Dan_
Quartz | Level 8

In case anyone needs the answer, I spoke with SAS and they confirmed my suspicions.

 

Appending into LASR loads 100% of the table into LASR memory.

 

To achieve maximum memory efficiency, the table should then be pushed back into HDFS, and reloaded into LASR.

 

For maximum ETL efficiency, it depends on how long a full drop and recreate takes compared to the LASR append plus the save back into HDFS.
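A rough sketch of that save-back-and-reload cycle, reusing the placeholder names from my earlier snippets (paths, port and tag are illustrative, not confirmed by SAS):

/* Persist the appended in-memory table back to HDFS as SASHDAT */
proc lasr port=10010;
   save hps.txn_2yr / path="/hps/transactions" replace;
run;

/* Drop the fully materialised in-memory copy */
proc imstat;
   table lasr1.txn_2yr;
   droptable;
run;

/* Re-add the table from SASHDAT so LASR goes back to memory-mapping
   the HDFS blocks rather than holding every row in memory */
proc lasr add data=hdat.txn_2yr port=10010;
run;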
