You can do what with SAS Data Loader for Hadoop?

by SAS Super FREQ on 06-11-2015 09:57 AM - edited on 10-05-2015 03:10 PM by Community Manager

You may be surprised to learn that SAS Data Loader for Hadoop does much more than load data into Hadoop. Download the free trial, spend some time getting familiar with it, and be sure to visit the SAS Data Loader for Hadoop community for additional information.


Several new features described below will become available when the new version of SAS Data Loader is released late next month. Though those features won’t be immediately available in the free trial, you can contact your SAS account representative to arrange for an evaluation of the new capabilities.

 

Read on to learn how to work with your data in SAS Data Loader for Hadoop.

 

Better understand it.

Ready to start a new project but not sure what data resides in your Hadoop environment? No problem: SAS Data Loader provides features that let you see what’s available and assess its overall fitness for purpose. For example, you can:

  • Browse tables and data
  • Generate and view data quality profile reports that provide metrics such as:
    • Uniqueness
    • Pattern counts
    • Null/blank counts
    • Maximum/minimum value
    • Maximum/minimum length
    • Mean
    • Median
    • Standard Deviation
    • Standard Error
    • Inferred data type
    • Frequency/pattern distribution
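
To make a few of these metrics concrete, here is a minimal Python sketch that profiles a single column held in memory. It is an illustration only, not how SAS Data Loader computes its profile reports: the helper names `pattern_of` and `profile_column` are hypothetical, and the pattern notation (9 for a digit, A or a for a letter) is an assumption chosen for readability.

```python
import math
import statistics
from collections import Counter


def pattern_of(text):
    """Reduce a string to a character pattern: digit -> 9, upper -> A, lower -> a."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_column(values):
    """Compute a few profile metrics for one column (hypothetical helper)."""
    # Treat None and whitespace-only values as null/blank.
    texts = [str(v) for v in values if v is not None and str(v).strip() != ""]
    profile = {
        "rows": len(values),
        "null_or_blank": len(values) - len(texts),
        "unique": len(set(texts)),
        "min_length": min(map(len, texts), default=0),
        "max_length": max(map(len, texts), default=0),
        "pattern_counts": Counter(pattern_of(t) for t in texts),
    }
    # Infer "numeric" only if every non-blank value parses as a number.
    try:
        numbers = [float(t) for t in texts]
    except ValueError:
        numbers = []
    if numbers:
        profile["inferred_type"] = "numeric"
        profile["min"], profile["max"] = min(numbers), max(numbers)
        profile["mean"] = statistics.mean(numbers)
        profile["median"] = statistics.median(numbers)
        if len(numbers) > 1:
            std_dev = statistics.stdev(numbers)  # sample standard deviation
            profile["std_dev"] = std_dev
            profile["std_error"] = std_dev / math.sqrt(len(numbers))
    else:
        profile["inferred_type"] = "string"
    return profile


zip_profile = profile_column(["27513", "27511", None, "", "2751A", "27513"])
age_profile = profile_column([30, 40, 50])
```

In the made-up ZIP code column, the pattern counts flag the one value that does not follow the five-digit pattern, which is the kind of fitness-for-purpose signal a profile report is meant to surface.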

 

Move it.

If you need to combine data from several sources into one data set and move it to a new location, you have a complete set of tools to do the job.

  • Join two or more Hadoop tables using inner, left, right, or full joins
  • Copy database tables to and from Hadoop
  • Copy a SAS data set to and from Hadoop
  • Load data to the SAS LASR Analytic Server
  • Import a delimited file into Hadoop
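
If the four join types are new to you, the following Python sketch shows how they differ on two tiny in-memory tables. It only illustrates the join semantics; SAS Data Loader performs joins inside Hadoop, and the function `join_rows` and the sample tables are hypothetical. For simplicity the sketch assumes every row has a non-null key and that, apart from the key, the two tables share no column names.

```python
def join_rows(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; how is inner, left, right, or full."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    left_cols = {c for row in left for c in row}
    right_cols = {c for row in right for c in row}

    # Index the right table by key so each left row finds its matches quickly.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    result = []
    matched_keys = set()
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            matched_keys.add(lrow[key])
            for rrow in matches:
                result.append({**rrow, **lrow})
        elif how in ("left", "full"):
            # Keep the unmatched left row; right-side columns become None.
            result.append({**dict.fromkeys(right_cols), **lrow})

    if how in ("right", "full"):
        for rrow in right:
            if rrow[key] not in matched_keys:
                # Keep the unmatched right row; left-side columns become None.
                result.append({**dict.fromkeys(left_cols), **rrow})
    return result


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
orders = [{"id": 2, "total": 50}, {"id": 3, "total": 75}]

inner = join_rows(customers, orders, "id", "inner")   # only id 2
left = join_rows(customers, orders, "id", "left")     # ids 1 and 2
right = join_rows(customers, orders, "id", "right")   # ids 2 and 3
full = join_rows(customers, orders, "id", "full")     # ids 1, 2, and 3
```

The choice of join type decides which unmatched rows survive: none (inner), those from the first table (left), those from the second (right), or both (full).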

 

Shape it.

SAS Data Loader has a full set of capabilities to help you shape your data into proper form in Hadoop.

  • Use queries to group rows based on the values in one or more columns and then summarize selected numeric columns
  • Filter data rows using business rules or Hive expressions
  • Sort data in Hadoop tables using one or more columns
  • Summarize data using aggregation functions such as Sum, Count, Variance, Covariance, and more
  • Transpose data with options to select one or more transpose columns, group by columns, ID columns, and copy columns
  • Delete rows of data in Hadoop tables using business rules or Hive expressions
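
As a rough picture of what filtering, grouping, summarizing, and sorting do to a table, here is a small Python sketch over in-memory rows. It is not SAS Data Loader output or generated code; the `summarize` helper, the sample `sales` rows, and the "amount must be known" rule are all assumptions made for illustration.

```python
import statistics
from collections import defaultdict


def summarize(rows, group_by, column, aggregations):
    """Group rows by the group_by columns and aggregate one numeric column."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in group_by)].append(row[column])
    summary = []
    for key, values in groups.items():
        out = dict(zip(group_by, key))
        for name, func in aggregations.items():
            out[name] = func(values)
        summary.append(out)
    return summary


sales = [
    {"region": "East", "amount": 100.0},
    {"region": "East", "amount": 300.0},
    {"region": "West", "amount": 50.0},
    {"region": "West", "amount": None},
    {"region": "West", "amount": 150.0},
]

# Filter: keep only rows with a known amount (a stand-in for a business rule).
valid = [row for row in sales if row["amount"] is not None]

# Group by region and summarize the amount column.
totals = summarize(
    valid,
    ["region"],
    "amount",
    {"sum": sum, "count": len, "variance": statistics.variance},
)

# Sort: largest total first.
totals.sort(key=lambda row: row["sum"], reverse=True)
```

Note that `statistics.variance` needs at least two values per group, so a real pipeline would have to decide how to report single-row groups.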

 

Code it.

If you need to get more sophisticated with your data manipulation, there are several ways to add custom code to your project.

  • Run a Hive program that you build with the expression builder or paste in from your own code
  • Run SAS programs (with DS2 content) inside Hadoop
  • Edit the code generated by SAS Data Loader (for advanced users)

 

Clean it.

Data quality is often a big issue in data integration projects. If you need to clean data as you move it into Hadoop, you have a number of advanced algorithms at your disposal, including:

  • Case changing
  • Field extraction
  • Gender analysis
  • Match code generation (supports “fuzzy” matching)
  • Identification analysis
  • Parsing
  • Pattern analysis
  • Standardization
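
To give a feel for how match codes enable “fuzzy” matching, here is a deliberately simplified Python sketch. It is not the algorithm SAS uses: the rules below (uppercase the name, ignore punctuation and word order, drop vowels and a few weak letters after each word’s first letter, skip repeated letters) are assumptions chosen to show the idea that similar values can be reduced to the same code and then compared exactly.

```python
import re

# Letters ignored after the first letter of each word (an assumption of this sketch).
DROPPED = set("AEIOUYHW")


def token_code(token):
    """Keep the first letter, then the remaining consonants without repeats."""
    code = [token[0]]
    for ch in token[1:]:
        if ch in DROPPED:
            continue
        if ch != code[-1]:  # skip a letter that repeats the last kept letter
            code.append(ch)
    return "".join(code)


def match_code(name):
    """Build a simplified match code that ignores case, punctuation, and word order."""
    tokens = re.findall(r"[A-Z]+", name.upper())
    return " ".join(sorted(token_code(t) for t in tokens))


code_a = match_code("John Smith")
code_b = match_code("Smyth, Jon")
code_c = match_code("Mary Jones")
```

Here `code_a` and `code_b` come out identical while `code_c` differs, so the first two records would be flagged as likely duplicates. A code this coarse also over-matches (for example, "Jane" and "John" reduce to the same token), which is why production match codes rely on much richer, tunable rules.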

 

That’s a lot to explore! What’s at the top of your list of features to consider for future releases of SAS Data Loader for Hadoop? If you have already taken a look at SAS Data Loader for Hadoop, what did you think? Please share your thoughts here on the Data Management Community.
