SAS Data Loader for Hadoop 2.4, generally available on Monday, January 11, 2016, includes features that seek to achieve three goals:
- Speed up data management processes with Spark
- Improve productivity of data management professionals
- Manage data where it lives
Below is a summary of what’s new in the 2.4 release. For more details, please see the SAS Data Loader for Hadoop 2.4 User’s Guide.
Speed up data management processes with Spark
- Improved performance using Spark and Impala - New support for Spark brings massively parallel in-memory processing to three directives: Cleanse Data, Transform Data, and Cluster-Survive. Impala can now be used in three directives: Query or Join, Sort and De-Duplicate, and Run a Hadoop SQL Program (formerly called “Run a Hive Program”).
- Increased performance of profiling jobs
Improve productivity of data management professionals
- Improved syntax editing
- Chain directives - Create a data flow from two or more saved directives, which can be executed serially or in parallel.
- New “Match-Merge” directive - Use the new “Match-Merge” directive to append columns from multiple source tables into a single target table. Column data values can also be updated when rows match in two or more source tables.
- New “Cluster-Survive” directive - The new “Cluster-Survive” directive leverages user-defined rules to create clusters of similar records. Additional user-defined rules can be created to construct a survivor record that will replace the cluster of rows in the target table.
- New “Delete Rows” directive
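The idea behind the “Cluster-Survive” directive (group similar records, then build one survivor per group) can be illustrated with a minimal Python sketch. This is not SAS code and does not reflect how the product implements the directive. The function names, the clustering rule (names that match after ignoring case and punctuation), and the survivorship rule (keep the longest non-empty value per column) are all assumptions chosen only for illustration.

```python
from collections import defaultdict


def cluster_key(record):
    # Hypothetical clustering rule: records belong to the same cluster when
    # their names match after dropping case, spaces, and punctuation.
    return "".join(ch for ch in record["name"].lower() if ch.isalnum())


def build_survivor(cluster):
    # Hypothetical survivorship rule: for each column, keep the longest
    # non-empty value found in the cluster (the first one wins a tie).
    survivor = {}
    for column in cluster[0]:
        values = [row[column] for row in cluster if row[column]]
        survivor[column] = max(values, key=len) if values else ""
    return survivor


def cluster_survive(records):
    # Group the rows by cluster key, then replace each group with one survivor.
    clusters = defaultdict(list)
    for record in records:
        clusters[cluster_key(record)].append(record)
    return [build_survivor(rows) for rows in clusters.values()]


records = [
    {"name": "Ann Lee", "email": "", "city": "Cary"},
    {"name": "ann lee", "email": "ann@example.com", "city": ""},
    {"name": "Bob Ray", "email": "bob@example.com", "city": "Austin"},
]
survivors = cluster_survive(records)
for row in survivors:
    print(row)
```

In this sketch the two “Ann Lee” rows fall into one cluster, and the survivor combines the email from one row with the city from the other. Real rules would normally be fuzzier than exact key matching.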
Manage data where it lives
- Added support for IBM BigInsights and Pivotal HD
- Expanded support to include VirtualBox and VMware hypervisors
- Schedule jobs using a REST API - A REST API can now be used to schedule and execute saved directives. The API can also return a job’s state, results, log file, or error messages, cancel running jobs, and delete job information.
- Apply and reload Hadoop configuration changes
New trial version
Download a free trial version of SAS Data Loader for Hadoop and install it on a production Hadoop cluster. The trial can be converted to a production license without reinstalling the software.