You can do what with SAS Data Loader for Hadoop?

Started 06-11-2015
Modified 10-05-2015

You may be surprised to learn that SAS Data Loader for Hadoop does much more than load data into Hadoop. Download the free trial, spend some time getting familiar with it, and be sure to visit the SAS Data Loader for Hadoop community for additional information.


Several new features described below will become available when the new version of SAS Data Loader is released late next month. Though those features won’t be immediately available in the free trial, you can contact your SAS account representative to arrange for an evaluation of the new capabilities.

 

Read on to learn how to work with your data in SAS Data Loader for Hadoop.

 

Better understand it.

Ready to start a new project but not sure what data resides in your Hadoop environment? No problem: SAS Data Loader provides features that let you see what’s available and assess its overall fitness for purpose. For example, you can:

  • Browse tables and data
  • Generate and view data quality profile reports that provide metrics such as:
    • Uniqueness
    • Pattern counts
    • Null/blank counts
    • Maximum/minimum value
    • Maximum/minimum length
    • Mean
    • Median
    • Standard Deviation
    • Standard Error
    • Inferred data type
    • Frequency/pattern distribution
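
To make a few of these metrics concrete, here is a minimal Python sketch that computes null/blank counts, uniqueness, minimum/maximum length, and pattern counts for one column. It is an illustration only, not how SAS Data Loader computes its profile reports. The names `char_pattern` and `profile_column`, the sample phone numbers, and the pattern convention (letters become `A`, digits become `9`) are all assumptions made for this example.

```python
from collections import Counter


def char_pattern(value):
    """Reduce a string to a shape: letters -> 'A', digits -> '9', other characters kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Compute a few profile metrics for one column of strings (None means null)."""
    # Treat None and whitespace-only strings as null/blank.
    present = [v for v in values if v is not None and v.strip() != ""]
    lengths = [len(v) for v in present]
    return {
        "row_count": len(values),
        "null_or_blank_count": len(values) - len(present),
        "unique_count": len(set(present)),
        "min_length": min(lengths) if lengths else None,
        "max_length": max(lengths) if lengths else None,
        "pattern_counts": Counter(char_pattern(v) for v in present),
    }


# Hypothetical sample column: two formats, one null, one blank, one duplicate.
phones = ["919-555-0100", "9195550101", None, "", "919-555-0100"]
report = profile_column(phones)
print(report)
```

A pattern count like this makes mixed formats in a column visible at a glance, which is typically the cue to standardize the column before using it.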

 

Move it.

If you need to combine data from several sources into one data set and move it to a new location, you have a complete set of tools to do the job.

  • Join two or more Hadoop tables using inner, left, right or full joins
  • Copy database tables to and from Hadoop
  • Copy a SAS data set to and from Hadoop
  • Load data to the SAS LASR Analytic Server
  • Import a delimited file into Hadoop
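
If the four join types are unfamiliar, the following Python sketch shows how they differ on two tiny in-memory tables. It is a conceptual illustration, not the code SAS Data Loader generates for Hadoop. The `join` function and the `customers` and `orders` tables are hypothetical, and for simplicity an unmatched row just lacks the other table's columns rather than carrying nulls.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; `how` is inner, left, right, or full."""
    if how not in {"inner", "left", "right", "full"}:
        raise ValueError("unsupported join type: " + how)

    # Index the right table by key so each left row can find its matches.
    right_index = {}
    for row in right:
        right_index.setdefault(row[key], []).append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        matches = right_index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            for rrow in matches:
                result.append({**lrow, **rrow})
        elif how in ("left", "full"):
            # Keep the unmatched left row; right-side columns are simply absent.
            result.append(dict(lrow))

    if how in ("right", "full"):
        # Keep right rows whose key never matched a left row.
        for rrow in right:
            if rrow[key] not in matched_keys:
                result.append(dict(rrow))
    return result


# Hypothetical sample tables: only id 2 appears in both.
customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 2, "total": 30}, {"id": 3, "total": 45}]

for how in ("inner", "left", "right", "full"):
    print(how, join(customers, orders, "id", how))
```

The inner join keeps only the rows with a match on both sides, the left and right joins also keep the unmatched rows from one side, and the full join keeps unmatched rows from both.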

 

Shape it.

SAS Data Loader has a full set of capabilities to help you shape your data into proper form in Hadoop.

  • Use queries to group rows based on the values in one or more columns and then summarize selected numeric columns
  • Filter data rows using business rules or Hive expressions
  • Sort data in Hadoop tables using one or more columns
  • Summarize data using aggregation functions such as Sum, Count, Variance, Covariance, and more
  • Transpose data with options to select one or more transpose columns, group by columns, ID columns, and copy columns
  • Delete rows of data in Hadoop tables using business rules or Hive expressions
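
As a rough picture of what filtering, grouping, and summarizing do to a table, here is a small Python sketch. It only illustrates the concepts; in SAS Data Loader these operations run against Hadoop tables. The `summarize` function, the `sales` rows, and the filter rule are assumptions invented for this example, and the variance shown is the sample variance.

```python
from collections import defaultdict
from statistics import variance


def summarize(rows, group_col, value_col):
    """Group rows by `group_col` and aggregate the numeric `value_col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])

    summary = {}
    for key, vals in groups.items():
        summary[key] = {
            "count": len(vals),
            "sum": sum(vals),
            # Sample variance needs at least two values.
            "variance": variance(vals) if len(vals) > 1 else None,
        }
    return summary


# Hypothetical sample data.
sales = [
    {"region": "East", "amount": 10.0},
    {"region": "East", "amount": 20.0},
    {"region": "West", "amount": 5.0},
    {"region": "West", "amount": -1.0},  # fails the rule below
]

# Filter step: a simple rule that keeps only positive amounts.
filtered = [row for row in sales if row["amount"] > 0]

# Group and summarize, then sort the groups by total, largest first.
result = summarize(filtered, "region", "amount")
ranked = sorted(result.items(), key=lambda item: item[1]["sum"], reverse=True)
print(ranked)
```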

 

Code it.

If you need to get more sophisticated with your data manipulation, there are several ways to add custom code to your project.

  • Run a Hive program that you build with the expression builder or paste in from your own code
  • Run SAS programs (with DS2 content) inside Hadoop
  • Advanced users can edit code generated by SAS Data Loader

 

Clean it.

Data quality is often a major issue in data integration projects. If you need to clean data as you move it into Hadoop, you have a number of advanced algorithms at your disposal:

  • Case changing
  • Field extraction
  • Gender analysis
  • Match code generation (supports “fuzzy” matching)
  • Identification analysis
  • Parsing
  • Pattern analysis
  • Standardization

 

That’s a lot to explore! What’s at the top of your list for features that should be considered for future releases of SAS Data Loader for Hadoop? If you have already taken a look at Data Loader for Hadoop, what did you think? Please share your thoughts here on the Data Management Community.
