What file types does the SAS In-Database Code Accelerator for Hadoop support?

Started 12-18-2015 by
Modified 01-19-2016 by

To help you better understand the many ways SAS works with Hadoop, my SAS colleagues and I are posting a series of articles that dive into specific products and processes on the topic. This article focuses on the file types supported by the SAS In-Database Code Accelerator for Hadoop and their associated requirements and limitations.

The SAS In-Database Code Accelerator for Hadoop enables you to publish and execute a DS2 thread program and a DS2 data program inside the Hadoop cluster. DS2 thread programs are well suited to large transpositions, computationally complex programs, scoring models, and BY-group processing. As the SAS In-Database Code Accelerator for Hadoop has matured, its capabilities and its support for various file types have also expanded.

As of the third maintenance release of SAS 9.4, the following file formats can be used with the SAS In-Database Code Accelerator for Hadoop.

  • Hive: Avro¹, Delimited, ORC, Parquet¹, RCFile, and Sequence
  • HDMD (Hadoop Metadata): Binary, Delimited, Sequence, and XML
  • HDFS: SPD Engine²

¹ Partitioned Avro or Parquet data is not supported as input to the SAS In-Database Code Accelerator for Hadoop.

² Only SPD Engine data sets with architectures that match the architecture of the Hadoop cluster (that is, 64-bit Solaris or Linux) execute inside the database. Otherwise, the DS2 thread and data programs execute locally on the client.

Which of these file types are available depends on the version of Hive in use. The recommended Hive version for the SAS In-Database Code Accelerator for Hadoop is 0.13 or later.
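To make the rules above easier to apply, here is a minimal, hypothetical Python sketch that encodes them as a decision helper. It is not part of any SAS product and does not reflect how SAS decides where code runs. It assumes lowercase labels of our own choosing (`spde` for SPD Engine data, `linux-64` and `solaris-64` for architectures), treats any format not listed above as unsupported, treats Hive 0.13 as a recommendation (a warning rather than a hard stop), and models hot fix Q90004, which a reader comment on this article says enables partitioned Avro and Parquet input, as a simple flag.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Formats listed for the third maintenance release of SAS 9.4 (labels are our own).
SUPPORTED_FORMATS = {
    "hive": {"avro", "delimited", "orc", "parquet", "rcfile", "sequence"},
    "hdmd": {"binary", "delimited", "sequence", "xml"},
    "hdfs": {"spde"},
}
# SPD Engine architectures that match the cluster (footnote 2); labels are hypothetical.
MATCHING_SPDE_ARCH = {"solaris-64", "linux-64"}
RECOMMENDED_HIVE = (0, 13)


@dataclass
class InputTable:
    source: str                      # "hive", "hdmd", or "hdfs"
    file_format: str                 # e.g. "orc", "parquet", "spde"
    partitioned: bool = False
    spde_arch: Optional[str] = None  # only meaningful for SPD Engine data


def hive_version_tuple(text: str) -> Tuple[int, int]:
    """Turn a version string such as '0.13.1' into (0, 13)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def placement(table: InputTable, hive_version: str,
              hotfix_q90004: bool = False) -> Tuple[str, List[str]]:
    """Return ('in-database' | 'local' | 'unsupported', warnings)."""
    warnings: List[str] = []
    source = table.source.lower()
    fmt = table.file_format.lower()
    # Anything not in the supported list is treated as unsupported (an assumption).
    if fmt not in SUPPORTED_FORMATS.get(source, set()):
        return "unsupported", warnings
    # Footnote 1: partitioned Avro/Parquet input, unless the hot fix is assumed installed.
    if fmt in {"avro", "parquet"} and table.partitioned and not hotfix_q90004:
        return "unsupported", warnings
    # Hive 0.13 or later is recommended, so an older version only adds a warning.
    if source == "hive" and hive_version_tuple(hive_version) < RECOMMENDED_HIVE:
        warnings.append("Hive 0.13 or later is recommended")
    # Footnote 2: SPD Engine data with a mismatched architecture runs on the client.
    if fmt == "spde" and table.spde_arch not in MATCHING_SPDE_ARCH:
        return "local", warnings
    return "in-database", warnings
```

For example, `placement(InputTable("hive", "parquet", partitioned=True), "1.2.1")` returns `("unsupported", [])`, while the same call with `hotfix_q90004=True` returns `("in-database", [])`. A lookup table plus a few ordered checks keeps each footnote traceable to a single line.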

The SAS In-Database Code Accelerator for Hadoop uses HCatalog to process complex, non-delimited files, which is how it supports the Avro, ORC, RCFile, and Parquet file types. Using HCatalog carries several requirements and prerequisites; complete details are in the "Using HCatalog within the SAS Environment" section of the documentation.

In the third maintenance release of SAS 9.4, the SAS In-Database Code Accelerator for Hadoop supports multi-table SET statements, embedded SQL on the SET statement, and the MERGE statement. Using these capabilities requires Hive tables stored in one of the supported file types listed above and Hive 0.13 or later.

For complete details about the SAS In-Database Code Accelerator for Hadoop and its supported file types, see SAS 9.4 In-Database Products: User's Guide.

And don’t forget to follow the Data Management section of the SAS Communities Library (Click Subscribe in the pink-shaded bar of the section) for more articles on how SAS Data Management works with Hadoop. Here are links to other posts in the series for reference:

Comments

To amend the following statement:

"Partitioned Avro or Parquet data is not supported as input to the SAS In-Database Code Accelerator for Hadoop."

According to page 185 of SAS In-Database Products: User's Guide, Sixth Edition, partitioned Avro or Parquet data is supported if you install hot fix Q90004.

Could there be a community for HiveQL help (used through PROC SQL pass-through), to help DATA step and PROC SQL programmers transition?

Hello @telligent,

The SAS Communities are for discussions, collaboration, and Q&A about SAS products and programming. You can search the SAS Programming and SAS Data Management communities for information on HiveQL.

Thank you.

Version history
Last update: 01-19-2016 04:32 PM
Updated by:
