
What’s new in SAS Data Loader for Hadoop 2.4

by SAS Employee KenBeutler on 01-11-2016 09:40 AM - edited on 01-15-2016 11:09 AM by Community Manager (973 Views)

SAS Data Loader for Hadoop 2.4, generally available on Monday, January 11, 2016, includes features that seek to achieve three goals:

 

  1. Speed up data management processes with Spark
  2. Improve productivity of data management professionals 
  3. Manage data where it lives

Below is a summary of what’s new in the 2.4 release. For more details, please see the SAS Data Loader for Hadoop 2.4 User’s Guide.
 

Speed up data management processes with Spark

  • Improved performance using Spark and Impala
    New support for Spark brings massively parallel in-memory processing to the following directives: Cleanse Data, Transform Data, and Cluster-Survive. Impala can now be used in the following directives: Query or Join, Sort and De-Duplicate, and Run a Hadoop SQL Program (formerly called “Run a Hive Program”).
  • Increased performance of profiling jobs

 

Improve productivity of data management professionals

  • Improved syntax editing
  • Chain directives
    Create a data flow from two or more saved directives, which can be executed serially or in parallel.
  • New “Match-Merge” directive
    Use the new “Match-Merge” directive to append columns from multiple source tables into a single target table. Column data values can also be updated when rows match in two or more source tables.
  • New “Cluster-Survive” directive
    The new “Cluster-Survive” directive uses user-defined rules to create clusters of similar records. Additional user-defined rules can be created to construct a survivor record that replaces the cluster of rows in the target table.
  • New “Delete Rows” directive
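
To make the cluster-and-survive idea concrete, here is a minimal conceptual sketch in plain Python. It is not how SAS Data Loader implements the “Cluster-Survive” directive. The function names, the clustering rule (same normalized name and postal code), the survivorship rule (keep the most recently updated row and fill its empty fields from older rows), and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cluster_key(record):
    # Hypothetical clustering rule: records belong together when the
    # trimmed, lowercased name and the postal code are identical.
    return (record["name"].strip().lower(), record["zip"])


def pick_survivor(cluster):
    # Hypothetical survivorship rule: start from the most recently
    # updated row, then fill any empty field from the older rows.
    ordered = sorted(cluster, key=lambda r: r["updated"], reverse=True)
    survivor = dict(ordered[0])
    for other in ordered[1:]:
        for field, value in other.items():
            if survivor.get(field) in (None, ""):
                survivor[field] = value
    return survivor


def cluster_survive(rows):
    # Group rows into clusters, then replace each cluster with one survivor.
    clusters = defaultdict(list)
    for row in rows:
        clusters[cluster_key(row)].append(row)
    return [pick_survivor(cluster) for cluster in clusters.values()]


# Made-up sample rows; ISO dates sort correctly as strings.
rows = [
    {"name": "Ann Lee ", "zip": "27513", "email": "", "updated": "2015-03-01"},
    {"name": "ann lee", "zip": "27513", "email": "ann@example.com", "updated": "2014-11-20"},
    {"name": "Bob Roy", "zip": "10001", "email": "bob@example.com", "updated": "2015-06-15"},
]

survivors = cluster_survive(rows)
for row in survivors:
    print(row)
```

In this sketch the two “Ann Lee” rows collapse into one survivor that keeps the newer update date and picks up the email address from the older row, while the single “Bob Roy” row passes through unchanged.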

 

Manage data where it lives

  • Added support for IBM BigInsights and Pivotal HD
  • Expanded support to include VirtualBox and VMware hypervisors
  • Schedule jobs using a REST API
    A REST API can now be used to schedule and execute saved directives. The API can also return a job’s state, results, log file, or error messages, and it can cancel running jobs and delete job information.
  • Apply and reload Hadoop configuration changes

 

New trial version

Download a free trial version of SAS Data Loader for Hadoop for installation on a production Hadoop cluster. The trial can be converted into a production license without reinstalling the software.

 

Comments
by SAS Employee Jakub_Chovanec
on 02-01-2016 07:23 AM

Hi,

 

when I download the DL trial, it is the old one - DL 2.2. Am I doing something wrong? Is it possible to make things work with downloading DL 2.4 vApp Cloudera and Cloudera Quickstart VM 5.3?
