I have several jobs set up that run nightly to load data into HDFS (another set of jobs then loads from HDFS to LASR). Everything works fine for me, and if something occasionally fails, I can re-run the jobs without any issues. The problem comes in when my backup admin needs to re-run one of these jobs. When she attempts to re-run one of the HDFS load jobs (due to data changes, we re-load the entire table), she receives the following error:
ERROR:oursasserver.name (xxx.xxx.xxx.xxx)
setting: user=usernamehere, inode=tablename.sashdat
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkStickyBit(FSPermissionChecker.java:366)
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:173)
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:5497)
ERROR: The HDFS concatenation of the file blocks failed.
ERROR: An error occurred during the transmission of data to file '/vapublic/tablename' on the namenode.
I know why she receives this error: she only has "read" permission on the file in /vapublic. The permissions show up as follows:
Permission    Owner      Group        ...  Name
-rw-r--r--    rgreen33   supergroup   ...  tablename
So, since I am the owner and the only one with write access, the job fails under her account when it attempts to delete/recreate the file.
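For reference, these are the sort of commands I'd use to double-check the permissions involved (paths taken from the error above; adjust as needed):

# Show mode/owner/group of the table file the job tries to replace
hadoop fs -ls /vapublic/tablename.sashdat

# Show the mode of the directory itself; the stack trace mentions
# checkStickyBit, so a trailing "t" in the directory mode (like /tmp)
# would also block deletes by anyone but the file owner
hadoop fs -ls -d /vapublic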
So, what is the proper way to handle this? I have thought of the following, but I'm not sure which is the correct approach:
- Add all SAS admins to the supergroup and elevate the supergroup's permissions on the files in the /vapublic directory (rough sketch of what I mean after this list).
- Create a SAS account (share credentials with the other SAS admins) and set up all jobs to run as this "special" account.
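In case it helps frame the first option, here is a rough sketch of what I think the change would look like. It assumes the other SAS admins have been added to the "supergroup" group that already owns the files:

# Rough sketch for option 1 -- give the owning group write access on the
# directory and its files so any group member can delete/recreate the tables
hadoop fs -chmod -R g+w /vapublic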
Ideas/suggestions?
Thanks,
Ricky