As part of the SAS Data Management for Hadoop article series, I’d like to explore the ins and outs of the SAS Scalable Performance Data (SPD) Engine in this post. The SPD Engine is delivered to SAS customers as part of Base SAS. It’s designed to read data very rapidly and in parallel.
In the third maintenance release for SAS 9.4, the SPD Engine expands the list of supported Hadoop distributions, with or without Kerberos:
The SPD Engine organizes data into a file format that has advantages for a distributed file system like the Hadoop Distributed File System (HDFS). Advantages of the SPD Engine file format include the following:
The default partition size is 128 megabytes. You can specify a different partition size with the PARTSIZE= LIBNAME statement option or the PARTSIZE= data set option.
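To make the effect of the partition size concrete, here is a minimal Python sketch that estimates how many data partition files a table would be split into. It assumes every partition except the last is filled to the partition size, and it ignores the metadata and index components. The function name `estimate_partitions` is hypothetical and is not part of any SAS interface.

```python
import math

# Default SPD Engine partition size stated in the text, in megabytes.
DEFAULT_PARTSIZE_MB = 128


def estimate_partitions(data_size_mb: float, partsize_mb: float = DEFAULT_PARTSIZE_MB) -> int:
    """Hypothetical helper: roughly estimate the number of data partition files.

    Assumes each partition except the last is filled to partsize_mb and
    ignores metadata and index components, so this is an approximation only.
    """
    if data_size_mb <= 0:
        raise ValueError("data_size_mb must be positive")
    if partsize_mb <= 0:
        raise ValueError("partsize_mb must be positive")
    # Round up: a partially filled last partition still needs its own file.
    return math.ceil(data_size_mb / partsize_mb)


if __name__ == "__main__":
    for size in (100, 1000, 10_000):
        print(f"{size} MB -> {estimate_partitions(size)} partitions at 128 MB, "
              f"{estimate_partitions(size, 256)} at 256 MB")
```

Under these assumptions, a 1,000 MB table yields 8 partitions at the 128 MB default and 4 partitions with a 256 MB partition size.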
The SPD Engine reads, writes, and updates data in HDFS. You can use the SPD Engine with standard SAS applications to retrieve data for analysis, perform administrative functions, and update the data. Note that SAS/CONNECT and SAS/SHARE are not supported by the SPD Engine.
Like SAS data sets, SPD Engine tables support analytical base tables containing hundreds of thousands of columns. These analytical base tables serve as source tables for predictive analytical routines.
Stay tuned: in my next post, we’ll explore how to create SPD Engine tables on HDFS.
Follow the Data Management section of the SAS Communities Library (Click Subscribe in the pink-shaded bar of the section) for more articles on how SAS Data Management works with Hadoop. Here are links to other posts in the series for reference: