
How to leverage the Hadoop Distributed File System using the SAS Scalable Performance Data Server

Started 11-20-2015. Modified 01-19-2016.

In the latest installment in the SAS Data Management for Hadoop article series, I’ll explain how to leverage Hadoop using the SAS Scalable Performance Data (SPD) Server. The SPD Server provides a data format that supports the creation of analytical base tables with hundreds of thousands of columns. These analytical base tables are used to support daily predictive analytical routines. Traditionally, Storage Area Network (SAN) storage has been (and continues to be) the primary storage platform for the SAS® Scalable Performance Data Server format. Because of the cost of SAN storage, companies have added Hadoop to their environments to help minimize storage costs.

 

In the 5.2 release for the SAS® Scalable Performance Data Server, support for the Hadoop Distributed File System (HDFS) was added. Here are the supported Hadoop distributions, with or without Kerberos:

 

  • Cloudera CDH2

The SPD Server organizes data into a file format that has advantages for a distributed file system like HDFS. Advantages of the SPD Server file format include the following:

  • Data is separate from the metadata. The file format consists of separate files: one for data, one for metadata, and two for indexes. Each type of file has an identifying file extension. The extensions are .dpf for data, .mdf for metadata, and .hbx and .idx for indexes.

  • SPD Server cluster tables allow one or more identically structured SPD Server tables to be referenced as one logical table. The tables in the cluster table are called members. You can swap, add, or delete members without holding a lock on the cluster table. This gives flexibility in maintaining SPD Server cluster tables, with the primary benefit being no downtime for end users who need read access to them.
  • The SPD Server file format partitions the data by spreading it across multiple files based on a partition size. Each partition is stored as a separate physical file with the extension .dpf. Depending on the amount of data and the partition size, the data can consist of one or more physical files, but is referenced as one logical file.
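The component file types listed above can be told apart by their extensions. The following is a minimal Python sketch of that idea, not part of SPD Server itself: the file names are hypothetical, and because names on disk may carry further suffixes (an assumption here), the sketch looks for the extension token anywhere after the first dot.

```python
from collections import defaultdict

# Extension-to-role mapping taken from the list above.
COMPONENT_ROLES = {
    "dpf": "data",
    "mdf": "metadata",
    "hbx": "index",
    "idx": "index",
}


def component_role(file_name):
    """Return the role of a component file, or "other" if no known extension is found."""
    # Skip the first token (the table name) and look for a known extension token.
    for token in file_name.lower().split(".")[1:]:
        if token in COMPONENT_ROLES:
            return COMPONENT_ROLES[token]
    return "other"


def group_components(file_names):
    """Group file names by component role."""
    groups = defaultdict(list)
    for name in file_names:
        groups[component_role(name)].append(name)
    return dict(groups)


# Hypothetical file names, for illustration only.
example = ["sales.mdf", "sales.dpf", "sales.hbx", "sales.idx"]
print(group_components(example))
```

Grouping by role makes it easy to see that the metadata lives in its own file, separate from the data partitions and the index files.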

The default partition size is 128 megabytes. You can change it by overriding the MINPARTSIZE parameter in the spdsserv.parm file.
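To get a feel for how many .dpf files a table might produce, here is a rough Python estimate. It is a sketch under simplifying assumptions that do not come from the SPD Server documentation: a megabyte is taken as 2^20 bytes, every partition is filled to exactly the partition size, and the function name is hypothetical.

```python
MIB = 1024 * 1024
DEFAULT_PARTITION_BYTES = 128 * MIB  # default partition size stated in the article


def estimated_partition_files(data_bytes, partition_bytes=DEFAULT_PARTITION_BYTES):
    """Estimate the number of partition (.dpf) files for a given data volume.

    Assumption: partitions fill completely before a new one starts, so the
    count is the data size divided by the partition size, rounded up.
    An empty table is treated as zero partitions in this sketch.
    """
    if data_bytes < 0:
        raise ValueError("data_bytes must be non-negative")
    if partition_bytes <= 0:
        raise ValueError("partition_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (data_bytes + partition_bytes - 1) // partition_bytes


one_gib = 1024 * MIB
print(estimated_partition_files(one_gib))                 # 8 with 128 MiB partitions
print(estimated_partition_files(one_gib, 256 * MIB))      # 4 with 256 MiB partitions
```

A larger partition size means fewer, bigger physical files for the same table, which is the trade-off to weigh when changing the default.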

 

Like SAS data sets, the SPD Server table supports analytical base tables containing hundreds of thousands of columns. These analytical base tables become source tables to predictive analytical routines.

 

 

Follow the community for my next post where we explore how to create SPD Server tables on HDFS.


Comments

please help me to download SAS software

Hi Miressa,

 

Thanks for your comment and visiting the community! Can you provide more detail? What version of SAS do you have? Or are you referring to SAS free software? If the latter, check out the SAS Analytics U Community that's packed with info to get started with SAS: free software, how-to guides, and connections to experts. 

 

Anna

