Preparing Data in Hadoop for Analysis and Reporting

by SAS Employee JohnnyS on 07-25-2017 12:57 PM

If you missed the Ask the Expert session on Preparing Data in Hadoop for Analysis and Reporting, you can still view it on-demand at any time.


Watch the webinar


This session reviews how business users, data analysts and scientists can manage, manipulate and cleanse data stored in Hadoop using an intuitive browser interface without any specialized coding skills.


You’ll learn how to:

  • Load data into Hadoop.
  • Profile and cleanse data inconsistencies.
  • Integrate data with other sources. 
  • Deliver data for analysis and reporting.


Here is a transcript of the Q&A segment held at the end of the session for ease of reference:

How is data preparation for analytics different from traditional data preparation methodologies?


Depending on the analytical method being used, you may need to transpose data into a one-row-per-subject table or join data into a one-to-many table.
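
As a rough illustration of the one-row-per-subject transposition mentioned above, here is a minimal Python sketch. It is not how SAS Data Loader for Hadoop performs the step; the function name, column names and sample values are all hypothetical and chosen only for this example.

```python
from collections import defaultdict


def to_one_row_per_subject(long_rows):
    """Pivot (subject_id, measure, value) rows into one record per subject.

    Hypothetical helper for illustration; not part of any SAS product.
    """
    wide = defaultdict(dict)
    for subject_id, measure, value in long_rows:
        # Each measure becomes a column on the subject's single row.
        wide[subject_id][measure] = value
    return dict(wide)


# Sample "long" data: several rows per subject (made-up values).
long_rows = [
    ("p1", "height_cm", 180),
    ("p1", "weight_kg", 75),
    ("p2", "height_cm", 165),
]

wide = to_one_row_per_subject(long_rows)
for subject_id, columns in wide.items():
    print(subject_id, columns)
```

A subject with no value for a given measure simply lacks that key, so a real preparation step would also need to decide how to represent missing values.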


Does SAS® Data Loader for Hadoop leverage the Hadoop Spark in-memory framework?

Yes, users can run data cleansing and transformation processes in Spark via Data Loader.


What programming skills are required to use SAS® Data Loader for Hadoop to prepare data for analytics and reporting?


None. Data Loader is a user-friendly, wizard-driven application that requires no coding skills.


Can I leverage existing SAS code and HiveQL code using SAS Data Loader for Hadoop?


Yes, using the Run a SAS program or Run a Hadoop SQL directive.


Can I load Hadoop data into the SAS® LASR Server to drive SAS® Visual Analytics and Statistics?


Absolutely. You can load data, in parallel, directly from Hadoop into SAS LASR.


Can I load relational database tables into HDFS using Apache Sqoop with SAS® Data Loader for Hadoop?


Yes, the Copy Data to Hadoop and Copy Data from Hadoop directives will leverage any database configured in SAS Metadata with a JDBC connection.


Can other SAS solutions, like SAS® Data Integration Studio, leverage the SAS® Data Loader for Hadoop directives in their ETL flows?


Yes. Directives can be saved into SAS folders in metadata and leveraged in SAS Data Integration Studio via several directive transformations.


Do you have a free version that we can use for learning?


No, there is no free version available that I am aware of. Contact your SAS account representative to see if they might be able to set something up for you.


How do I do an update/insert in Hadoop when new data comes in, without re-creating the whole table?


Hive 0.14 and later support update/insert, but you would need to write custom code and run it with the Run a SAS program or Run a Hadoop SQL directive.
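
To make the update/insert pattern concrete, the sketch below shows the general logic: update the rows that already exist and insert the ones that do not, without rebuilding the table. It is only an illustration under stated assumptions. It uses Python's built-in sqlite3 module against an in-memory database, not Hive, and the table name, column names and values are hypothetical. The custom code you would actually submit through a directive would be HiveQL or SAS and would differ in syntax and table requirements.

```python
import sqlite3


def upsert_rows(conn, new_rows):
    """Apply (customer_id, city) rows in place: update matches, insert the rest."""
    cur = conn.cursor()
    for customer_id, city in new_rows:
        # Try to update an existing row first.
        cur.execute(
            "UPDATE customers SET city = ? WHERE customer_id = ?",
            (city, customer_id),
        )
        if cur.rowcount == 0:
            # No existing row matched, so this is new data: insert it.
            cur.execute(
                "INSERT INTO customers (customer_id, city) VALUES (?, ?)",
                (customer_id, city),
            )
    conn.commit()


# Hypothetical starting table (SQLite stands in for the real target here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Cary"), (2, "Austin")])

# Incoming batch: one changed row (id 2) and one brand-new row (id 3).
upsert_rows(conn, [(2, "Boston"), (3, "Denver")])

result = conn.execute(
    "SELECT customer_id, city FROM customers ORDER BY customer_id"
).fetchall()
print(result)
```

Only the two incoming rows are touched; the untouched row for customer 1 is left as it was.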



What is the difference between SAS/ACCESS to Hadoop and SAS Data Loader for Hadoop?


Data Loader leverages the capabilities of the SAS/ACCESS to Hadoop solution, but does not require any SAS coding.


How much will SAS Data Loader cost?


I do not know. Contact your SAS Account Rep for pricing.


Can we use Data Loader on AWS?


Yes, Data Loader can be installed in AWS, just like SAS 9.4M4.



Recommended Resources
Course: Introduction to SAS and Hadoop
Course: Working with SAS Data Loader for Hadoop
Course: Hadoop Data Management with Hive, Pig, and SAS


Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow-up Q&A, slides and recordings from other SAS Ask the Expert webinars. From the Ask the Expert Community Library, just click Subscribe in the orange bar underneath the list of recent articles.


NOTE: For best results when opening the attached slides, click on the “download” icon.