
Preparing Data in Hadoop for Analysis and Reporting

Started 07-25-2017
Modified 11-20-2017

If you missed the Ask the Expert session on Preparing Data in Hadoop for Analysis and Reporting, you can still view it on demand at any time.

 

Watch the webinar

 

This session reviews how business users, data analysts and scientists can manage, manipulate and cleanse data stored in Hadoop using an intuitive browser interface without any specialized coding skills.

 

You’ll learn how to:

  • Load data into Hadoop.
  • Profile and cleanse data inconsistencies.
  • Integrate data with other sources. 
  • Deliver data for analysis and reporting.


Here is a transcript of the Q&A segment held at the end of the session for ease of reference:


How is data preparation for analytics different than traditional data preparation methodologies?

 

Depending on the analytical method being used, you may need to transpose data into a one-row-per-subject table or join data into a one-to-many table.
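
As a rough illustration of the one-row-per-subject shape mentioned above, here is a minimal Python sketch. It is not how Data Loader does the work; the column names (customer_id, measure, value) and the sample rows are made up for the example.

```python
from collections import defaultdict

# Hypothetical long-format input: one row per (subject, measure) pair.
long_rows = [
    {"customer_id": 1, "measure": "visits", "value": 4},
    {"customer_id": 1, "measure": "spend", "value": 120.5},
    {"customer_id": 2, "measure": "visits", "value": 1},
]


def to_one_row_per_subject(rows, key, name_col, value_col):
    """Pivot long rows into one dict per subject, one column per measure."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row[key]][row[name_col]] = row[value_col]
    # Collect every measure seen so each output row has the same columns.
    columns = sorted({row[name_col] for row in rows})
    return [
        {key: subject, **{c: values.get(c) for c in columns}}
        for subject, values in sorted(wide.items())
    ]


wide_rows = to_one_row_per_subject(long_rows, "customer_id", "measure", "value")
for row in wide_rows:
    print(row)
```

Measures that a subject does not have come out as None, which is the kind of gap you would then decide how to handle before modeling.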

  

Does SAS® Data Loader for Hadoop leverage the Hadoop Spark in-memory framework?


Yes, users can run data cleansing and transformation processes in Spark via Data Loader.

 

What programming skills are required to use SAS® Data Loader for Hadoop to prepare data for analytics and reporting?

 

None. Data Loader is a user-friendly, wizard-driven application that requires no coding skills.

 

Can I leverage existing SAS code and HiveQL code using SAS Data Loader for Hadoop?

 

Yes, using the Run a SAS Program or Run a Hadoop SQL Program directive.

 

Can I load Hadoop data into the SAS® LASR Server to drive SAS® Visual Analytics and Statistics?

 

Absolutely. You can load data, in parallel, directly from Hadoop into SAS LASR.

 

Can I load relational database tables into HDFS using Hadoop SQOOP with SAS® Data Loader for Hadoop?

 

Yes, the Copy Data to Hadoop and Copy Data from Hadoop directives will leverage any database configured in SAS Metadata with a JDBC connection.

 

Can other SAS solutions, like SAS® Data Integration Studio, leverage the SAS® Data Loader for Hadoop directives in its ETL flows?

 

Yes. Directives can be saved into SAS folders in metadata and leveraged in SAS Data Integration Studio via several directive transformations.

 

Do you have a free version that we can use for learning?

 

No, there is no free version available that I am aware of. Contact your SAS account representative to see if they might be able to set something up for you.

 

How do I update or insert rows in Hadoop when new data comes in, without recreating the whole table?

 

Hive 0.14 supports update/insert, but you would need to write custom code and run it with the Run a SAS Program or Run a Hadoop SQL Program directive.

 

 

What is the difference between SAS/ACCESS to Hadoop and SAS Data Loader for Hadoop?

 

Data Loader leverages the capabilities of the SAS/ACCESS to Hadoop solution, but does not require any SAS coding.

 

How much will SAS Data Loader cost?

 

I do not know. Contact your SAS Account Rep for pricing.

 

Can we use Data Loader on AWS?

 

Yes, Data Loader can be installed in AWS, just like SAS 9.4M4.

 

 

Recommended Resources
Course: Introduction to SAS and Hadoop
Course: Working with SAS Data Loader for Hadoop
Course: Hadoop Data Management with Hive, Pig, and SAS

 

Want more tips? Be sure to subscribe to the Ask the Expert Community Library to receive follow-up Q&A, slides and recordings from other SAS Ask the Expert webinars. From the Ask the Expert Community Library, just click Subscribe in the orange bar underneath the list of recent articles.

 

NOTE: For best results when opening the attached slides, click on the “download” icon.

Comments

How do I convert a character column to a date or numeric column when the character column is, for example, 2019-11-05 (05-NOV-2019)? I have tried many of the tips on the website but they do not work. There are 38,000 rows.

 

Juha Nyman

Hi Juha,

If you have a value like 2019-11-05 as a character data type in a Hive table in Hadoop, you can use the Transform Data directive in SAS Data Loader for Hadoop. In the Manage Columns section of that directive, add a new column, assign it a Type of DATE and add the following expression:

 

to_date(inputn(datevar, 'yymmdd10.'))

 

The INPUTN function converts character values to numeric using a numeric informat. The yymmdd10. informat reads a character date of the form 2019-11-05 into a SAS numeric date. SAS dates are stored as numeric doubles. The to_date function is then able to convert the SAS numeric date to the ANSI DATE data type that is used in Hive tables.
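
To make the two steps concrete, here is a small Python sketch of the same idea, offered only as an illustration and not as what Data Loader runs internally. It relies on the fact that a SAS date value is the number of days since January 1, 1960; the helper names char_to_sas_date and sas_date_to_date are hypothetical.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def char_to_sas_date(text):
    """Parse 'YYYY-MM-DD' text into a SAS-style day count.

    Plays the role of INPUTN with the yymmdd10. informat (illustration only).
    """
    parsed = datetime.strptime(text, "%Y-%m-%d").date()
    return (parsed - SAS_EPOCH).days


def sas_date_to_date(days):
    """Turn a SAS-style day count back into a calendar date.

    Plays the role of to_date in the expression above (illustration only).
    """
    return SAS_EPOCH + timedelta(days=days)


sas_value = char_to_sas_date("2019-11-05")
as_date = sas_date_to_date(sas_value)
print(sas_value, as_date.isoformat())
```

The intermediate number is what the inner function hands to the outer one; the character value itself never needs to be reformatted by hand.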

Thank you very much for the advice. I managed to do it; it is very important for us.

Best regards

Juha Nyman


