
NEW Directives for SAS Data Loader 2.4 for Hadoop

by SAS Employee SusanJ516_sas on 02-16-2016 10:41 AM

 

SAS Data Loader 2.4 includes new directives to help you work more efficiently:

  • Chain directive - runs two or more saved directives in series or in parallel. One chain directive can contain another chain directive. A serial chain can execute a parallel chain, and a parallel chain can execute a serial chain. An individual directive can appear more than once in a serial chain. Results can be viewed for each directive in a chain as soon as those results become available.
  • Cluster-Survive Data directive - uses rules to create clusters of similar rows. Additional rules can be used to construct a survivor row that replaces the cluster of rows in the target. The survivor row combines the best values in the cluster. This directive requires the Apache Spark run-time environment.
  • Match-Merge Data directive - combines columns from multiple source tables into a single target table. You can also merge data in specified columns when rows match in two or more source tables. Columns can match across numeric or character data types.
  • Run a Hadoop SQL Program directive - replaces the former Run a Hive Program directive. The new directive can use either Impala SQL or HiveQL. In the directive, a Resources box lists the functions that are available in the selected SQL environment. Selecting a function displays syntax help, and a click adds the selected function to the SQL program.
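
The nesting behavior of chain directives described above can be pictured with a small Python sketch. This is only a conceptual model, not SAS Data Loader code: the helper names `directive`, `serial`, and `parallel` are hypothetical, and a thread pool stands in for whatever scheduling the product actually uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def directive(name):
    """Hypothetical stand-in for a saved directive: it only records that it ran."""
    def run(log):
        log.append(name)  # list.append is atomic in CPython, so threads can share the log
    return run


def serial(*steps):
    """Hypothetical serial chain: runs each step in order, one after another."""
    def run(log):
        for step in steps:
            step(log)
    return run


def parallel(*steps):
    """Hypothetical parallel chain: runs all steps concurrently and waits for all of them."""
    def run(log):
        with ThreadPoolExecutor(max_workers=len(steps)) as pool:
            futures = [pool.submit(step, log) for step in steps]
            for future in as_completed(futures):
                future.result()  # surfaces each result (or error) as soon as it is ready
    return run


# A serial chain that contains a parallel chain, and that runs "load" twice.
log = []
chain = serial(
    directive("load"),
    parallel(directive("cleanse"), directive("profile")),
    directive("load"),
)
chain(log)
print(log)
```

The first and last entries in `log` are always `load`, while the order of `cleanse` and `profile` in the middle can vary from run to run because they execute in parallel.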

    See the SAS Data Loader 2.4 for Hadoop: User's Guide for additional information and examples.
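
To make the Cluster-Survive Data idea from the list above more concrete, here is a minimal plain-Python sketch. It is not how SAS Data Loader implements the directive (which runs on Apache Spark). Both rules are assumptions chosen for illustration: rows are clustered when their `name` values match after trimming and lowercasing, and the survivor keeps the longest non-empty value in each column.

```python
from collections import defaultdict


def cluster_key(row):
    # Assumed clustering rule: names match after trimming whitespace and ignoring case.
    return row["name"].strip().lower()


def survivor(rows):
    # Assumed survivorship rule: per column, keep the longest trimmed value.
    # On a tie, max() keeps the value from the earliest row in the cluster.
    columns = rows[0].keys()
    return {
        col: max((r[col] for r in rows), key=lambda v: len(v.strip()))
        for col in columns
    }


def cluster_survive(rows):
    # Group rows into clusters, then replace each cluster with one survivor row.
    clusters = defaultdict(list)
    for row in rows:
        clusters[cluster_key(row)].append(row)
    return [survivor(group) for group in clusters.values()]


rows = [
    {"name": "Ann Lee", "email": "", "city": "Cary"},
    {"name": "ann lee ", "email": "ann@example.com", "city": ""},
    {"name": "Bob Roy", "email": "bob@example.com", "city": "Raleigh"},
]
result = cluster_survive(rows)
for row in result:
    print(row)
```

Here the two Ann Lee rows collapse into one survivor that takes the email from the second row and the city from the first, while the Bob Roy row passes through unchanged.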