A common source of confusion is how authentication works in a Hadoop system. We recently had a question about which user to enter in the SAS Data Loader configuration screen, and more generally about how Hadoop handles users, so I thought I would share a bit about this topic.
First, there is a good article in the Cloudera documentation on this topic: Authorization and Authentication In Hadoop | Cloudera Engineering Blog. To summarize, authentication is the process of determining whether someone is who they claim to be. Out of the box, with all of its defaults, Hadoop does not authenticate users at all. Hadoop can be configured to require authentication in the form of Kerberos principals. Kerberos is an authentication protocol that uses “tickets” to prove identity.
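To make the "no authentication by default" point concrete, here is a minimal sketch of how a client identity is resolved in non-Kerberos mode. It assumes the cluster runs with the default `hadoop.security.authentication=simple` setting, where the client-side `HADOOP_USER_NAME` environment variable (if set) is taken at face value and the operating system login is used otherwise. The function name `effective_hadoop_user` is hypothetical and only mimics that lookup; it is not part of Hadoop or SAS Data Loader.

```python
import getpass
import os


def effective_hadoop_user(env=None):
    """Return the user name a Hadoop client would present in simple mode.

    Assumption: with hadoop.security.authentication=simple, the client uses
    HADOOP_USER_NAME when it is set and otherwise falls back to the OS login.
    """
    env = os.environ if env is None else env
    return env.get("HADOOP_USER_NAME") or getpass.getuser()


# Nothing verifies the claim: setting a variable is enough to "be" that user.
print(effective_hadoop_user({"HADOOP_USER_NAME": "hdfs"}))  # prints: hdfs
```

This is why, in non-Kerberos mode, what matters is that the configured user exists on the cluster and has the right permissions, rather than any password or ticket.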
SAS Data Loader supports both modes of authentication. The Kerberos mode is significantly more complicated, so I will not go into that in this article; I will save that topic for another time.
For non-Kerberos mode, Data Loader expects that the user provided in the configuration screen is one that exists on the cluster and has at least the following permissions:
1. Read/write/delete files in the HDFS directory (used for Oozie jobs)
2. Read/write/delete tables in Hive
Why are these permissions needed? Here is the explanation for each of them.
The first permission is required for the Copy Data to Hadoop and Copy Data from Hadoop directives to work. Copy Data to Hadoop calls Oozie to run the job on the Hadoop cluster using Sqoop. To do this, it creates a temporary directory in HDFS, uploads some files to this directory, and then starts an Oozie job. It cleans all of this up after the run. So the user configured in Data Loader needs enough permissions to perform these steps.
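The sequence described above can be sketched as a list of command-line calls. This is only an illustration of the kind of HDFS and Oozie operations involved, not how SAS Data Loader is actually implemented. The function name, the temporary directory naming scheme, the `job.properties` file name, and the example user and URL are all hypothetical; the `hdfs dfs` and `oozie job` commands themselves are the standard Hadoop and Oozie CLIs.

```python
import uuid


def copy_job_commands(user, workflow_files, oozie_url, job_properties="job.properties"):
    """Build (but do not run) the CLI calls for one copy job.

    Steps: create a temporary HDFS directory, upload the job files,
    submit the Oozie job, then remove the temporary directory.
    """
    # Hypothetical location; the real directory is chosen by the product.
    tmp_dir = f"/user/{user}/tmp_copy_{uuid.uuid4().hex}"

    commands = [["hdfs", "dfs", "-mkdir", "-p", tmp_dir]]           # needs write
    for path in workflow_files:
        commands.append(["hdfs", "dfs", "-put", path, tmp_dir])     # needs write
    commands.append(
        ["oozie", "job", "-oozie", oozie_url, "-config", job_properties, "-run"]
    )
    # Cleanup would happen once the Oozie job has finished; needs delete.
    commands.append(["hdfs", "dfs", "-rm", "-r", tmp_dir])
    return commands


for cmd in copy_job_commands("dataloader", ["workflow.xml"], "http://example:11000/oozie"):
    print(" ".join(cmd))
```

Each step maps to one of the permissions listed earlier: creating and uploading require write access, and the final cleanup requires delete access on the same directory.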
Hive permissions are needed because SAS Data Loader will perform drop, recreate, and append actions on data when working with directives. The user has to have enough permissions to support these actions.
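As a rough illustration of what drop, recreate, and append mean in Hive terms, the sketch below builds one HiveQL statement for each action. The statements that SAS Data Loader actually issues are not shown in this article, so the helper name, table names, and columns here are hypothetical; only the HiveQL syntax itself is standard.

```python
def hive_statements(table, columns, source_table):
    """Return example HiveQL for the drop, recreate, and append actions.

    columns is a list of (name, hive_type) pairs.
    """
    cols = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return {
        "drop": f"DROP TABLE IF EXISTS `{table}`",
        "recreate": f"CREATE TABLE `{table}` ({cols})",
        "append": f"INSERT INTO TABLE `{table}` SELECT * FROM `{source_table}`",
    }


stmts = hive_statements("customers_out", [("id", "INT"), ("name", "STRING")], "customers_stage")
for action, sql in stmts.items():
    print(f"{action}: {sql}")
```

A user who cannot run all three kinds of statement against the target database would likely see a directive fail partway through.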
I hope this helps clarify which permissions are needed for the user on the configuration screen.