
What file types does the SAS In-Database Code Accelerator for Hadoop support?

by SAS Employee brian_kinnebrew_sas on 12-18-2015 11:25 AM - edited on 01-19-2016 04:32 PM by Community Manager

To help you better understand the many ways SAS works with Hadoop, my SAS colleagues and I are posting a series of articles that dive into specific products and processes on the topic. This article focuses on the file types supported by the SAS In-Database Code Accelerator for Hadoop, along with the associated requirements and limitations.

 

The SAS In-Database Code Accelerator for Hadoop enables you to publish and execute a DS2 thread program and a DS2 data program inside the cluster. DS2 thread programs are ideally suited for executing large transpositions, computationally complex programs, scoring models, and BY-group processing. As the SAS In-Database Code Accelerator for Hadoop has matured, its capabilities and support for various file types have also expanded.
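As a rough sketch of that pattern (the libref hv, the table and column names, and the BY key are all hypothetical), a thread program paired with a data program might look like the following. With the In-Database Code Accelerator licensed and the DS2ACCEL option enabled, the thread program executes in parallel inside the Hadoop cluster:

```
proc ds2 ds2accel=yes;

   /* Thread program: runs in parallel inside the cluster */
   thread work.compute / overwrite=yes;
      dcl double total;
      method run();
         set hv.transactions;      /* hv = Hive libref (hypothetical) */
         by acct_id;               /* BY-group processing */
         if first.acct_id then total = 0;
         total = total + amount;
         if last.acct_id then output;
      end;
   endthread;
   run;

   /* Data program: reads the thread program's output */
   data hv.acct_totals (overwrite=yes);
      dcl thread work.compute t;
      method run();
         set from t;
      end;
   enddata;
   run;

quit;
```

This is a sketch, not a definitive implementation; consult the SAS In-Database Products User's Guide for the exact requirements in your release.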

 

As of the third maintenance release of SAS 9.4, the following file formats can be used with the SAS In-Database Code Accelerator for Hadoop.

 

  • Hive: Avro¹, Delimited, ORC, Parquet¹, RCFile, and Sequence
  • HDMD (Hadoop Metadata): Binary, Delimited, Sequence, and XML
  • HDFS: SPD Engine²
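For orientation, these three access paths correspond to different LIBNAME forms. The server names, user IDs, and HDFS paths below are placeholders; a minimal sketch might be:

```
/* Hive tables (Avro, Delimited, ORC, Parquet, RCFile, Sequence) */
libname hv hadoop server="hive-node.example.com" user=myuser;

/* HDMD files: HDFS files described by SAS-generated metadata */
libname hd hadoop server="hive-node.example.com" user=myuser
        hdfs_metadir="/user/myuser/meta" hdfs_datadir="/user/myuser/data";

/* SPD Engine data sets stored in HDFS */
libname sp spde "/user/myuser/spde" hdfshost=default;
```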

 

¹ Partitioned Avro or Parquet data is not supported as input to the SAS In-Database Code Accelerator for Hadoop.

² Only SPD Engine data sets with architectures that match the architecture of the Hadoop cluster (that is, 64-bit Solaris or Linux) execute inside the database. Otherwise, the DS2 thread and data programs execute locally on the client.

 

The availability of these file types depends on the version of Hive being used. The recommended Hive version for the SAS In-Database Code Accelerator for Hadoop is 0.13 or later.

 

The SAS In-Database Code Accelerator for Hadoop uses HCatalog to process complex, non-delimited files. This enables the SAS In-Database Code Accelerator for Hadoop to support Avro, ORC, RCFile, and Parquet file types. There are several requirements and prerequisites when using HCatalog. Complete details can be found in the Using HCatalog within the SAS Environment section of the documentation.

 

In the third maintenance release of SAS 9.4, multi-table SET statements, embedded SQL on the SET statement, and the MERGE statement are supported by the SAS In-Database Code Accelerator for Hadoop. Usage of these capabilities requires Hive tables in a supported file type listed above and Hive version 0.13 or later.
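As a hedged illustration of these three capabilities (all table names, columns, and the BY key below are hypothetical), the thread programs might be written as:

```
proc ds2 ds2accel=yes;

   thread work.combine / overwrite=yes;
      method run();
         /* Multi-table SET over two Hive tables */
         set hv.sales_2014 hv.sales_2015;
      end;
   endthread;
   run;

   thread work.filtered / overwrite=yes;
      method run();
         /* Embedded SQL on the SET statement */
         set {select acct_id, amount from hv.transactions where amount > 0};
      end;
   endthread;
   run;

   thread work.joined / overwrite=yes;
      method run();
         /* MERGE with BY-group matching */
         merge hv.accounts hv.balances;
         by acct_id;
      end;
   endthread;
   run;

quit;
```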

 

For complete details about the SAS In-Database Code Accelerator for Hadoop and its supported file types, please refer to the SAS 9.4 SAS In-Database Products User's Guide.

 

And don’t forget to follow the Data Management section of the SAS Communities Library (Click Subscribe in the pink-shaded bar of the section) for more articles on how SAS Data Management works with Hadoop. Here are links to other posts in the series for reference:

 

Comments
by SAS Employee kehous
on ‎02-11-2016 03:42 PM

To amend the following statement:

 

¹ Partitioned Avro or Parquet data is not supported as input to the SAS In-Database Code Accelerator for Hadoop.

 

According to page 185 of the SAS In-Database Products: User's Guide, Sixth Edition, partitioned Avro or Parquet data is supported if you install hot fix Q90004.

 
