
5 Tips for SAS/ACCESS to "Kerberized" HADOOP


I have been contacted several times recently on this very same topic by colleagues struggling with the SAS/ACCESS to Hadoop configuration when the Hadoop cluster is secured with Kerberos authentication. There are many reasons for that: Kerberos is a quite complex system that can be configured in various ways, and the Hadoop ecosystem is still fairly new, a bit different from traditional DBMSs, and keeps evolving rapidly.


Configuring and validating SAS/ACCESS to HADOOP in a Kerberized environment is far from an automatic or simple deployment step: it should be considered integration work.

 

There is already a lot of excellent material in Stuart Rogers's publications and videos on this topic. They are THE reference material...but of course you need some "offline" time to read them carefully.

 

The idea of this article is to provide you with a short list of actions to perform if you are on the ground, facing the "SAS/ACCESS to HADOOP with Kerberos is not working" problem. One of these actions is also a way to quickly determine whether you are facing a real SAS/ACCESS configuration issue or a more general system/Kerberos configuration issue.

 

The five tips presented here are:
  1. Use PROC HADOOP to get a more detailed message
  2. Test the connection without SAS
  3. Check your Hadoop client components (Jars and configuration)
  4. Check JCE files
  5. Check your ticket
Note: tips 2) and 3) are not Kerberos-specific but also apply in a Kerberos environment.

1) Use PROC HADOOP to get a more detailed message

 

The typical ERROR messages that you will see when you submit your HADOOP LIBNAME statement in SAS are:

ERROR: java.util.concurrent.TimeoutException
ERROR: Could not open connection to jdbc:hive2://h1r1en01.bpa.CustomerA.fr:10000/default;principal=hive/_HOST@HADOOP.VOYAGER. Check your Hive server status and also set option SUBPROTOCOL= appropriately. Set SUBPROTOCOL=hive2 if you are running HiveServer2. Set SUBPROTOCOL=hive if you are running Hive1.
ERROR: Error trying to establish connection.

 

Or:

ERROR: java.sql.SQLException: Could not open client transport with JDBC Uri: jdbc:hive2://psdnas3.unx.customerB.com:10000/default;principal=hive/_HOST@PSDNAS.KRB.CustomerB.COM: GSS initiate failed
ERROR: Error trying to establish connection.

 

Or:

HADOOP: Connection to DSN=gbbdap20 failed. 15 1436341582 no_name 0 OBJECT_E
ERROR: Unable to connect to the Hive server.
ACCESS ENGINE: Exiting DBICON with rc=0X801F9007 16 1436341582 no_name 0 OBJECT_E
ERROR: Error trying to establish connection.

 

These messages do not really help to identify the real issue, even when you enable SASTRACE options... A good way to get a more detailed message is to run PROC HADOOP, for example:

 

filename cfg "<SASConfig>/Lev1/HadoopServer/conf/HadoopKerberosConfig.xml";
PROC HADOOP options=cfg verbose;
   hdfs mkdir='/tmp/hdfs_test';
run;

 

Note: Refer to the SAS documentation for details on PROC HADOOP. With PROC HADOOP, you will likely get more detailed messages that help with the diagnosis.

 

Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:65)
        ...
Caused by: KrbException: Generic error (description in e-text) (60) - Unable to locate Kerberos realm
        at sun.security.krb5.Config.getRealmFromDNS(Config.java:1277)
        ...

 

Or:

ERROR: Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
ERROR: at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:680)
...

 

Or:

ERROR: Caused by: GSSException: No valid credentials provided (Mechanism level: Invalid option setting in ticket request. (101))
ERROR: at sun.security.jgss.krb5.Krb5Context.initSecContext(Krb5Context.java:710)
...

 

As SAS/ACCESS to Hadoop processing happens in Java, we are looking for "GSS" or "KRB" Java exceptions, which will help the system administrator nail down the problem.

 

But now let's imagine the customer is really sure the problem is coming from SAS... Well, there is a way to firmly establish that SAS has nothing to do with this connectivity issue.

 

 

2) Test the connection without SAS

The best way to show the customer that the problem is not coming from SAS is to reproduce it without using SAS.

When you run a Hadoop LIBNAME statement, you are actually trying to open a JDBC connection to a Hive server. When it is a Kerberized Hadoop cluster, you are trying to open this JDBC connection with Kerberos authentication for the Hive service.
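For reference, a Kerberized Hadoop libname typically looks like the sketch below; the server, port, and principal are placeholders for your own environment, and depending on your SAS release and configuration the HIVE_PRINCIPAL= option may be unnecessary when the principal is already defined in hive-site.xml:

LIBNAME HIVEKRB HADOOP
   SERVER="hive-node.example.com"
   PORT=10000
   HIVE_PRINCIPAL="hive/_HOST@EXAMPLE.COM";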

 

If a tool like beeline is installed on the SAS machine, it can be used to validate the JDBC connection from the SAS server, as it is very close to what the SAS/ACCESS to HADOOP connection does. Otherwise, if beeline cannot be used (for example, it is not available on Solaris), SAS Technical Support provides standalone Java applications to verify Hive and HDFS connectivity to a Hadoop cluster.
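For illustration, a beeline test from the SAS server might look like the command below; the host, port, and principal are placeholders and must match the values configured in your hive-site.xml:

beeline -u "jdbc:hive2://hive-node.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"

If this command fails with a GSS error, the problem lies in the system or Kerberos configuration, not in SAS.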

 

Follow the link below to download the Java application and associated documentation.

These tests are done outside of SAS and are a great way to validate your Hadoop client configuration and the Kerberos configuration without involving the SAS software at all.

 

Note: As for any other RDBMS, the SAS/ACCESS to HADOOP connector relies on the RDBMS client. So if the RDBMS client cannot contact the DBMS server, there is no need even to think about starting SAS... In our case the RDBMS client consists of Hadoop JAR files and Hadoop *-site.xml configuration files (core-site.xml, hdfs-site.xml, etc.).

 

3) Check your Hadoop client components (Jars and configuration)

To be able to connect to Hadoop, SAS/ACCESS only needs to know:

  1. The location of the Hadoop jar files (via the SAS_HADOOP_JAR_PATH option)
  2. The location of the Hadoop client XML configuration files (via the SAS_HADOOP_CONFIG_PATH option); see the example after this list
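As an illustration, both options can be set directly in the SAS session (in practice they are more often defined in sasv9_local.cfg or as environment variables); the paths below are placeholders for the folders that contain your collected files:

options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/jars";
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";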

In the recent SAS 9.4 releases, you can configure SAS/ACCESS to HADOOP using the SAS Deployment Manager (SDM). The objective of this SDM task is to collect the Hadoop JAR files and Hadoop configuration files required by SAS/ACCESS to HADOOP.

 

In some situations it is not possible to meet the requirements to use the SDM. However, if you have shell access to the Hive node, it is still possible to run the main script called by the SDM from the console. It will automatically collect the proper JAR and *-site.xml files.

 

The script is called hadooptracer and is part of your SAS/ACCESS to Hadoop deployment (usually available in /SASHadoopConfigurationLibraries/2.1/data).

 

 

See below the instructions to run it outside of the SDM in order to collect the proper JAR and "*-site.xml" files for your specific Hadoop cluster.

 

Copy the hadooptracer.py script from the SAS Server's SASHome directory to /tmp on the Hive node. Check that strace, wget, and python are installed on the Hive node. If not, install them.

 

[sasinst@sashdp01 tmp]$ which strace python wget
/usr/bin/strace
/usr/bin/python
/usr/bin/wget

 

Make hadooptracer.py executable:

[sasinst@sashdp01 tmp]$ chmod a+x hadooptracer.py

 

Run it as the HDFS superuser in your Hadoop environment, usually 'hdfs' or 'hadoop':

[root@sashdp01 ~]# su - hdfs
[hdfs@sashdp01 ~]$ cd /tmp
[hdfs@sashdp01 tmp]$ ./hadooptracer.py

 

Note: In a Kerberized Hadoop environment, make sure to kinit as the hdfs user (Hadoop superuser) before running hadooptracer.py.
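For example, assuming an hdfs headless keytab is available on the node (the keytab path and principal below are only illustrative and vary by distribution), you can list the principals it contains, obtain a ticket, and verify the cache:

[hdfs@sashdp01 tmp]$ klist -kt /etc/security/keytabs/hdfs.headless.keytab
[hdfs@sashdp01 tmp]$ kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
[hdfs@sashdp01 tmp]$ klist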

 

[hdfs@sashdp01 tmp]$ ./hadooptracer.py
2015-11-13 12:15:58,215 hadooptracer [INFO] Hadoop Tracer started
2015-11-13 12:15:58,215 hadooptracer [INFO] Temporary File Directory: /tmp/hadooptracer.FnO4MP
2015-11-13 12:15:58,216 hadooptracer [INFO] Starting parallel tracing ...

 

When it completes, check that the /tmp/confs and /tmp/jars folders contain the files required for SAS_HADOOP_CONFIG_PATH and SAS_HADOOP_JAR_PATH. Note: even if you see some errors in the script output, the required files might still have been collected.

 

UPDATE: The hadooptracer tool is also available for download from the SAS Support FTP server:

ftp.sas.com/techsup/download/blind/access/hadooptracer.zip

So, even before installing SAS, you can grab it and use it to collect the JAR and configuration files, then run the "Hive HDFS Connectivity test tool" (discussed in the previous tip) to ensure that your client machine can connect to your Hadoop cluster and access the data.

 

4) Check JCE files

A common pitfall is the fact that, by default, Java is only able to process AES-128 encryption in Kerberos. If your Kerberos communications use AES-256 encryption, then the Java Cryptography Extension (JCE) unlimited-strength policy files are required to allow the connection. The instructions below are an extract from the Grid for Hadoop Configuration Guide but are very relevant to any SAS with "Kerberized" Hadoop integration work (an example of the klist check is shown after the quote).

 

“If your Kerberos implementation is using keys that are encrypted with 256-bit AES, Java will require updated security policy jars to work properly with Kerberos. To determine what keys your implementation is using, initialize your credential cache using kinit and then run the klist command with the “-e” option to list the encryption type. If the encryption type has “aes256” in the string, you need to update the policy files. The policy files live in /lib/security and have the names local_policy.jar and US_export_policy.jar. Oracle provides unlimited security policy jars for each version of Java, which can be downloaded from the Oracle website.
To determine what version of Java you are using, run the “java -version” command. Version 1.6.xxxx is Java 6, 1.7.xxxx is Java 7, and 1.8.xxxx is Java 8.
By default, SAS will use the SAS Private JRE unless told to use some other Java JRE during installation. If you are using the SAS Private JRE, the Java 7 policy files need to be placed in /SASPrivateJavaRuntimeEnvironment/9.4/jre/lib/security.”
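For example, the encryption check described in the quote looks like this; if the Etype values reported by klist contain "aes256", you need the unlimited-strength JCE policy files:

[sasdemo@gatehdp01 ~]$ kinit
[sasdemo@gatehdp01 ~]$ klist -e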

 

 

5) Check your ticket

 

When we start the SAS session to open our Hadoop libname, SAS needs to know where it can find a Kerberos TGT (Ticket-Granting Ticket) for the user. SAS relies on an environment variable for that: KRB5CCNAME, which points to the correct Kerberos ticket cache. Depending on your system authentication configuration (usually PAM), this variable might or might not be set automatically. The instructions below can be used to check the existence of the TGT cache and to validate the HADOOP libname from the command line on the SAS Server.
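Before going through them, a quick way to check from the shell whether the variable is set at all (the cache file name will differ on your system; if nothing is printed, PAM is not exporting the variable):

[sasdemo@gatehdp01 ~]$ echo $KRB5CCNAME
FILE:/tmp/krb5cc_100003_8ggYYm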

 

  • Create a sample SAS program containing only your Hadoop libname and, optionally, some debugging options:
/* debug: check if the KRB5CCNAME variable is picked up in the SAS session */
%let krb5env=%sysget(KRB5CCNAME);
%put &KRB5ENV;
LIBNAME HIVELIB HADOOP PORT=10000 SERVER="gatehdp01.gatehadoop.com";

 

  • Make sure a Kerberos TGT (Ticket-Granting Ticket) has been obtained and is present in the file cache:
[sasdemo@gatehdp01 ~]$ klist
Ticket cache: FILE:/tmp/krb5cc_100003_8ggYYm
Default principal: sasdemo@GATEHADOOP.COM

Valid starting     Expires            Service principal
04/11/16 09:20:24  04/12/16 09:20:24  krbtgt/GATEHADOOP.COM@GATEHADOOP.COM
        renew until 04/18/16 09:20:24

Note: If you don't see a ticket cache, you can use the "kinit" command to request a TGT.
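For example, using the principal from the listing above (the -f flag explicitly requests a forwardable ticket, which matters for the next check; you will be prompted for the user's password):

[sasdemo@gatehdp01 ~]$ kinit -f sasdemo@GATEHADOOP.COM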

 

  • Make sure the ticket cache is "forwardable":
[sasdemo@gatehdp01 ~]$ klist -f /tmp/krb5cc_100003_8ggYYm
Ticket cache: FILE:/tmp/krb5cc_100003_8ggYYm
Default principal: sasdemo@GATEHADOOP.COM

Valid starting     Expires            Service principal
04/11/16 09:20:24  04/12/16 09:20:24  krbtgt/GATEHADOOP.COM@GATEHADOOP.COM
        renew until 04/18/16 09:20:24, Flags: FRI

 

Check that the flags include "F" (forwardable).

 

  • Test your Hadoop libname in SAS batch mode

 

[sasdemo@gatehdp01 ~]$ /opt/sas/SASHome/SASFoundation/9.4/sas libnametest.sas -log libnametest.log

Normally you should see something like:

1    %let krb5env=%sysget(KRB5CCNAME);
2    %put &KRB5ENV;
FILE:/tmp/krb5cc_100003_8ggYYm
3    LIBNAME HIVELIB HADOOP PORT=10000 SERVER="gatehdp01.gatehadoop.com" ;
NOTE: Libref HIVELIB was successfully assigned as follows:
      Engine:        HADOOP
      Physical Name: jdbc:hive2://gatehdp01.gatehadoop.com:10000/default

 

  • Test your Hadoop libname with the same code in a Workspace Server session.

Conclusion

 

This article is far from an exhaustive list of all potential problems that you might encounter in the field. It does not cover any subtle Kerberos (or network/system/PAM) configuration issue that could explain a connectivity failure. However, if you are on a customer site, it will hopefully help you check several basic things that, in most cases, explain why the SAS/ACCESS to HADOOP connection is failing. Thanks for reading!

Comments

Hi Nico,

Sorry I forgot to remove this link to an internal resource from my blog.

I'll check if this article has been published externally and come back to you.

Thanks

Raphael

Hi Nico

Although this specific blog is not public (I removed the link from my blog), you can find the same information covered in the two "Hadoop with Kerberos considerations" papers here: http://support.sas.com/resources/papers/tnote/hadoop.html

 

Hope that helps.

Thanks

Raphael 

Great, thanks Raphael.
