SAS LASR Analytic Server's use of HDFS deserves a bit of explanation here. If your data is sitting in your corporate deployment of Hadoop, say Cloudera, and is used for non-LASR activities (such as Pig or Hive jobs), it isn't in a form that LASR can use directly. When you use SAS to move data onto the co-located HDFS that LASR uses, the resulting data file (.SASHDAT) is specific to what LASR needs. Once in that format, it isn't usable by other Hadoop processes, but those files load into LASR very efficiently because each one is laid out on disk exactly as it is in memory.

LASR uses a Linux mechanism called memory mapping to map the file's locations on disk into RAM. The additional benefit is that if the OS needs to page that memory out, it doesn't have to write it to disk: the data is already on disk, and the OS knows where. So the OS can simply drop those pages from memory during the page swap, and when it's time to swap them back in, it reads them from the original location on disk.

If you want to, you can install LASR directly on your corporate deployment of Cloudera (or Apache Hadoop, Hortonworks, BigInsights, etc.). Even if you do, though, you'll need to decide whether to create SASHDAT files in addition to keeping your data in its traditional form for use by other Hadoop processes. The main benefits are listed above; the drawbacks are additional disk usage (more than double in many cases) and the need to somehow keep the two copies of the data in sync.