In the latest installment in the SAS Data Management for Hadoop article series, I’ll explain how to leverage Hadoop using the SAS Scalable Performance Data (SPD) Server. The SPD Server provides a data format that supports the creation of analytical base tables with hundreds of thousands of columns. These analytical base tables are used to support daily predictive analytical routines. Traditionally, Storage Area Network (SAN) storage has been (and continues to be) the primary storage platform for the SAS® Scalable Performance Data Server format. Due to the cost constraints associated with SAN storage, companies have added Hadoop to their environments to help minimize storage costs.
In the 5.2 release of the SAS® Scalable Performance Data Server, support for the Hadoop Distributed File System (HDFS) was added. Here are the supported Hadoop distributions, with or without Kerberos:
The SPD Server organizes data into a file format that has advantages for a distributed file system like HDFS. Advantages of the SPD Server file format include the following:
The default partition size is 128 megabytes. You can alter it by overriding the MINPARTSIZE parameter in the spdserver.parm file.
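As a sketch, raising the minimum partition size would look something like the line below in the server parameter file. The 256M value is purely illustrative, and the exact option syntax may differ by release, so check the SPD Server administrator documentation for your version:

```
/* spdserver.parm: raise the minimum table partition size    */
/* (example value only; tune to your HDFS block size policy) */
MINPARTSIZE=256M;
```

A larger partition size generally means fewer, bigger files, which tends to suit HDFS better than many small partitions.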
Like SAS data sets, the SPD Server table supports analytical base tables containing hundreds of thousands of columns. These analytical base tables become source tables to predictive analytical routines.
Follow the community for my next post, where we explore how to create SPD Server tables on HDFS.
Here are links to other posts in the SAS Data Management for Hadoop series for reference: