_Dan_
Quartz | Level 8

Morning all,

 

Our formal approach to loading data into our platform is to land it in HDFS first and then load it into LASR, so that LASR can use its memory block mapping for more efficient memory usage.
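For context, the HDFS-first pattern looks roughly like the below. All server names, paths, ports and table names are placeholders rather than our real environment:

libname hdat sashdat path="/hps/transactions"
        server="namenode.example.com" install="/opt/TKGrid";

/* Land the table in HDFS as SASHDAT first */
data hdat.txn_2yr (replace=yes);
   set work.txn_2yr;
run;

/* Then attach it to the LASR server. Adding from SASHDAT lets LASR
   memory-map the HDFS blocks instead of copying every row into memory */
proc lasr add data=hdat.txn_2yr port=10010;
   performance host="namenode.example.com" install="/opt/TKGrid";
run;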

 

There's a process that currently takes a two-year snapshot from Hadoop, ingests it into HDFS and links it to LASR. This isn't efficient, and I would prefer to maintain a rolling two-year snapshot instead, appending each new day and removing the "2 years + 1 day" data.
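To sketch what I have in mind (librefs, tag, port and the date variable are illustrative; the SASIOLA APPEND=YES data set option and PROC IMSTAT's DELETEROWS statement are what I'd expect to use for this):

/* In-memory libref via the SASIOLA engine */
libname lasr1 sasiola host="namenode.example.com" port=10010 tag="hps";

/* Append the newest day's transactions to the in-memory table */
data lasr1.txn_2yr (append=yes);
   set work.txn_today;
run;

/* Compute the cut-off date (roughly two years back) as a literal */
%let cutoff = %sysfunc(intnx(day, %sysfunc(today()), -730));

/* Delete the rows that have fallen outside the two-year window;
   DELETEROWS acts on the rows matching the active WHERE clause,
   and PURGE physically frees them */
proc imstat;
   table lasr1.txn_2yr;
   where txn_date < &cutoff;
   deleterows / purge;
run;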

 

However, to do that, you would need to append the data into LASR and then push it back into HDFS, which loses the efficiency of the memory block mapping. My concern is that by loading the data into LASR first, the entire contents of the table are held in memory. Pushing it back into HDFS merely increases resilience; we're no longer benefiting from memory block mapping.

 

In your personal opinion, would you rather suffer a longer ETL from dropping and reloading a couple of years' transaction data but gain efficient memory usage in LASR, or a quicker ETL at the cost of potentially significant memory usage in LASR?

 

Or have I missed a trick, and there's still a way to achieve a minimal LASR footprint whilst also gaining a quicker overall ETL?

Dan

1 ACCEPTED SOLUTION

Accepted Solutions
_Dan_
Quartz | Level 8

In case anyone needs the answer, I spoke with SAS and they confirmed my suspicions.

 

Appending into LASR loads 100% of the table into LASR memory.

 

To achieve maximum memory efficiency, the table should then be pushed back into HDFS, and reloaded into LASR.

 

For maximum ETL efficiency, it depends on how long a full drop and recreate takes compared to the LASR append plus the save back into HDFS.
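A rough sketch of that save-back-and-reload cycle, reusing the placeholder names from my earlier snippets (paths, port and tag are illustrative, not confirmed by SAS):

/* Persist the appended in-memory table back to HDFS as SASHDAT */
proc lasr port=10010;
   save hps.txn_2yr / path="/hps/transactions" replace;
run;

/* Drop the fully materialised in-memory copy */
proc imstat;
   table lasr1.txn_2yr;
   droptable;
run;

/* Re-add the table from SASHDAT so LASR goes back to memory-mapping
   the HDFS blocks rather than holding every row in memory */
proc lasr add data=hdat.txn_2yr port=10010;
run;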
