
How authentication works in Hadoop with SAS Data Loader

Started 02-25-2015
Modified 10-05-2015

One topic that can cause confusion is how authentication works in a Hadoop system. Recently we had a question about which user to enter in the SAS Data Loader configuration screen, and more generally about how Hadoop handles users, so I thought I would share a bit about this topic.

 

First, there is a good article on this topic on the Cloudera Engineering Blog: Authorization and Authentication In Hadoop. To summarize, authentication is the process of determining whether someone is who they claim to be. Out of the box, with every setting left at its default, Hadoop does not authenticate users; it trusts the user name that the client presents. Hadoop can also be configured to require authentication, in the form of Kerberos principals. Kerberos is an authentication protocol that uses “tickets” to prove identity.
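To see which of the two modes a cluster is using, one place to look is the `hadoop.security.authentication` property in the cluster's `core-site.xml`, which takes the value `simple` (no authentication, the default) or `kerberos`. The following is a minimal Python sketch that reads that property from the text of a configuration file. The function name `hadoop_auth_mode` is made up for this illustration, and where the file lives on your cluster depends on your distribution.

```python
import xml.etree.ElementTree as ET

# Hadoop falls back to "simple" (no authentication) when the property is absent.
DEFAULT_AUTH_MODE = "simple"
AUTH_PROPERTY = "hadoop.security.authentication"


def hadoop_auth_mode(core_site_xml: str) -> str:
    """Return the authentication mode declared in the given core-site.xml text.

    Hypothetical helper for illustration; it only inspects the XML it is given
    and does not contact the cluster.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == AUTH_PROPERTY:
            # An empty <value/> is treated like a missing property.
            value = (prop.findtext("value") or "").strip().lower()
            return value or DEFAULT_AUTH_MODE
    return DEFAULT_AUTH_MODE
```

Calling `hadoop_auth_mode(open(path).read())` on a cluster's configuration file should tell you whether the non-Kerberos discussion in this article applies to your setup.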

 

SAS Data Loader supports both modes: no authentication (the Hadoop default) and Kerberos. The Kerberos mode is significantly more complicated, so I will not go into it in this article; I will save that topic for another time.

 

For non-Kerberos mode, Data Loader expects that the user provided in the configuration screen is one that exists on the cluster and has at least the following permissions:

 

1.  Read/write/delete files in the HDFS directory (used for Oozie jobs)

2.  Read/write/delete tables in Hive

 

Why are these permissions needed?  Here is the explanation for each of them. 

 

The first permission is required for the Copy Data to Hadoop and Copy Data from Hadoop directives to work. Copy Data to Hadoop calls Oozie to run the job on the Hadoop cluster using Sqoop. To do this, it creates a temporary directory in HDFS, uploads some files to this directory, and then starts an Oozie job. It cleans all of this up after the run. So the user configured in Data Loader must have enough permissions to perform these steps.
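If you want to check by hand that a user can perform these steps, the rough command-line equivalents are the standard `hdfs dfs` and `oozie job` commands. The Python sketch below only builds those command lines; it does not run them, and it is not how SAS Data Loader itself is implemented, which this article does not describe. The function name and all arguments (staging directory, file names, Oozie URL) are placeholders for illustration.

```python
from typing import List


def copy_job_commands(
    staging_dir: str,
    local_files: List[str],
    oozie_url: str,
    job_properties: str,
) -> List[List[str]]:
    """Build (but do not execute) CLI commands mirroring the steps above.

    Hypothetical helper: every path and URL passed in is a placeholder.
    """
    # Step 1: create the temporary directory in HDFS.
    commands = [["hdfs", "dfs", "-mkdir", "-p", staging_dir]]
    # Step 2: upload the job files into that directory.
    for path in local_files:
        commands.append(["hdfs", "dfs", "-put", path, staging_dir])
    # Step 3: submit and start the Oozie job from a local properties file.
    commands.append(
        ["oozie", "job", "-oozie", oozie_url, "-config", job_properties, "-run"]
    )
    # Step 4: clean up. Run this only after the Oozie job has finished,
    # because "-run" returns as soon as the job is submitted.
    commands.append(["hdfs", "dfs", "-rm", "-r", staging_dir])
    return commands
```

Running each command as the user entered in the configuration screen (for example with `subprocess.run(cmd, check=True)`) and watching for permission errors is one way to confirm that the account can create, write, and delete in the staging location.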

 

Hive permissions are needed because SAS Data Loader performs drop, re-create, and append actions on tables when working with directives. The user must have enough permissions to support these actions.
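A quick way to confirm these Hive permissions is to run one statement of each kind against a scratch table while connected as that user. The sketch below generates such statements as plain strings; the function name, table name, and column layout are invented for this example, and the `INSERT ... VALUES` form assumes a Hive version that supports it (0.14 or later).

```python
from typing import List


def hive_permission_probe(table: str) -> List[str]:
    """Return HiveQL statements that exercise drop, create, and append.

    Hypothetical helper: pass the name of a scratch table that is safe to drop.
    """
    return [
        f"DROP TABLE IF EXISTS {table}",                    # drop
        f"CREATE TABLE {table} (id INT, label STRING)",     # re-create
        f"INSERT INTO TABLE {table} VALUES (1, 'probe')",   # append
        f"DROP TABLE IF EXISTS {table}",                    # clean up
    ]
```

You could paste the resulting statements into Beeline or any Hive client; if one of them fails with an authorization error, the user lacks a permission that the directives need.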

 

I hope this helps clarify which permissions are needed for the user on the configuration screen.

Version history
Last update:
‎10-05-2015 03:44 PM