I have several jobs set up that run nightly to load data into HDFS (another set of jobs then loads from HDFS to LASR). Everything works fine for me, and if something occasionally fails, I can re-run the jobs without any issues. The problem comes in when my backup admin needs to re-run one of these jobs. When she attempts to re-run one of the HDFS load jobs (due to data changes, we re-load the entire table), she receives the following error:
ERROR:oursasserver.name (xxx.xxx.xxx.xxx)
setting: user=usernamehere, inode=tablename.sashdat
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkStickyBit(FSPermissionChecker.java:366)
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:173)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5497)
ERROR: The HDFS concatenation of the file blocks failed.
ERROR: An error occurred during the transmission of data to file '/vapublic/tablename' on the namenode.
I know why she receives this error: she only has "read" permission on the file in /vapublic. The permissions show up as follows:
Permission    Owner      Group        ...  Name
-rw-r--r--    rgreen33   supergroup   ...  tablename
So, since I am the owner and the only one with write access, the job fails under her account when it attempts to delete/recreate the file.
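For reference, these are the sort of commands I'd use to double-check the permissions involved (paths taken from the error above; adjust as needed):

# Show mode/owner/group of the table file the job tries to replace
hadoop fs -ls /vapublic/tablename.sashdat

# Show the mode of the directory itself; the stack trace mentions
# checkStickyBit, so a trailing "t" in the directory mode (like /tmp)
# would also block deletes by anyone but the file owner
hadoop fs -ls -d /vapublic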
So, what is the proper way to handle this? I have thought of the following, but I'm not sure which is the correct approach:
- Add all SAS admins to the supergroup and elevate the supergroup's permissions on the files in the /vapublic directory (rough sketch of what I mean after this list).
- Create a SAS account (share credentials with the other SAS admins) and set up all jobs to run as this "special" account.
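In case it helps frame the first option, here is a rough sketch of what I think the change would look like. It assumes the other SAS admins have been added to the "supergroup" group that already owns the files:

# Rough sketch for option 1 -- give the owning group write access on the
# directory and its files so any group member can delete/recreate the tables
hadoop fs -chmod -R g+w /vapublic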
Ideas/suggestions?
Thanks,
Ricky