Hi team,
I'm connecting to Hortonworks from SAS 9.4 M3 using the LIBNAME method.
Kerberos is used.
The logins are all successful, but it takes 90-100 seconds to log in, i.e. to assign the libname in Enterprise Guide.
This sounds a bit excessive. Any ideas why?
Thanks,
Will you be able to reproduce the slowness using a standalone Java program?
ftp://ftp.sas.com/techsup/download/blind/access/Hive.HDFS.newstandalone.zip
ftp://ftp.sas.com/techsup/download/blind/access/Hive.HDFS.newstandalone.readme.txt
Thanks @alexal
I managed to get the script working. I had to specify the below parameters as well since we are using Kerberos:
KRB5CCNAME
-Djavax.security.auth.useSubjectCredsOnly=false
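In case it helps anyone else, this is roughly how those two settings fit into the command line (everything in angle brackets is a placeholder, and the classpath with the Hadoop JARs and client config still has to be set as usual):
export KRB5CCNAME=<REPLACE_WITH_PATH_TO_TICKET_CACHE>
java -Djavax.security.auth.useSubjectCredsOnly=false HiveCheck Kerberos <FQDN_OF_HIVE_SERVER> 10000 <REPLACE_WITH_HIVE_PRINCIPAL>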
Does the script have the ability to take a schema name as a parameter? It defaults to the 'default' schema, to which I don't have USE access, but I do have access to other schemas.
The problem doesn't seem to be authentication, since it fails almost immediately if the wrong credentials are provided. The lag seems to be between successful authentication and the point where it runs the 'show tables' command.
Btw...I can't access the README file for this script.
Yes, it looks like the readme file has been removed. Basically, you have to do the following:
export SASJAVA=/<REPLACE_WITH_PATH_TO_SAS_HOME>/SASPrivateJavaRuntimeEnvironment/9.4/jre/bin/java
export CLASSPATH="/<REPLACE_WITH_PATH_TO_HADOOP_JARS>/*:/<REPLACE_WITH_PATH_TO_HADOOP_CONFIG>:$CLASSPATH"
$SASJAVA HiveCheck Kerberos <FQDN_OF_HIVE_SERVER> 10000 <REPLACE_WITH_HIVE_PRINCIPAL>
The Hive schema can be changed in file HiveCheck.java on line 196:
connection_string = "jdbc:hive2://" + args[1] + ":" + args[2] + "/default";
Do not forget to recompile HiveCheck.java using javac.
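If you would rather not hard-code the schema, one possible tweak (untested, and it assumes the existing argument order shown above, so a schema passed after the principal would arrive as args[4]) is to change that line to:
// hypothetical variant: take the schema from a fifth command-line argument, falling back to "default"
connection_string = "jdbc:hive2://" + args[1] + ":" + args[2] + "/" + (args.length > 4 ? args[4] : "default");
and then run it as:
$SASJAVA HiveCheck Kerberos <FQDN_OF_HIVE_SERVER> 10000 <REPLACE_WITH_HIVE_PRINCIPAL> <REPLACE_WITH_SCHEMA>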
Thanks @alexal, it worked perfectly! I recompiled the code and can now test the connections successfully.
To get back to the original question....
I found that the connection to Hadoop takes 94 seconds with Java 7 (SAS supplied) and only 34 seconds with Java 8 (Red Hat supplied). Although we cannot change the SAS version, at least I have a reasonable explanation now. I guess some of the 34 seconds are taken up by Kerberos when the tickets are obtained initially, so overall it's manageable. The key is to ensure the login timeout is set on the libname statement, since the login time is still longer than the default 30 seconds.
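For reference, this is the sort of libname assignment I mean, with the login timeout raised well above the 30-second default. The values in angle brackets are placeholders, and you should confirm that the LOGIN_TIMEOUT= option is available in your SAS/ACCESS to Hadoop release:
libname hdp hadoop
   server="<FQDN_OF_HIVE_SERVER>" port=10000 schema="<REPLACE_WITH_SCHEMA>"
   hive_principal="<REPLACE_WITH_HIVE_PRINCIPAL>"
   login_timeout=120;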
Thanks for the help.
You are welcome.
I found that the connection to Hadoop takes 94 seconds with Java 7
WOW, that's a lot of time. Would you like to debug the problem further?
Thanks @alexal
Yes, if you have any suggestions about how to troubleshoot, I'd appreciate it.
Regards,
First off, run the JAR with the option -Dsun.security.krb5.debug=true. I would like to see what is going on with the Kerberos authentication.
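With the standalone checker from earlier in the thread, that would be something along these lines (placeholders as before):
$SASJAVA -Dsun.security.krb5.debug=true -Djavax.security.auth.useSubjectCredsOnly=false HiveCheck Kerberos <FQDN_OF_HIVE_SERVER> 10000 <REPLACE_WITH_HIVE_PRINCIPAL>
The Kerberos trace goes to standard output, so redirect it to a file if you want to post it here.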
Absolute brilliance, @alexal!!
This is exactly what I was looking for. Long story short, the trace clearly shows a timeout against the first KDC listed in krb5.conf. I'll chat to the AD team to see why that KDC is offline; this is a production environment, so all services should be available. So 90 of the 94 seconds are spent waiting for an offline KDC. Once the request is sent to the second KDC, the Hadoop connection is instant.
Interestingly, although Java 8 also says retries=3, it doesn't actually try 3 times: after the first try fails (30 seconds) it goes to the second KDC. So again, 30 of the 34 seconds are spent waiting for an offline KDC. But that's just a side note. I'm confident that if the first KDC is online, we'll have an almost immediate connection.
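In case anyone else lands here with the same symptom and cannot get the dead KDC fixed straight away, a possible mitigation (I have not tried it, and the realm and host names are placeholders) is to list a working KDC first in krb5.conf and lower the per-KDC timeout and retry settings that the Java Kerberos code reads; as far as I know the JDK takes kdc_timeout in milliseconds:
[libdefaults]
    kdc_timeout = 5000
    max_retries = 1

[realms]
    <REPLACE_WITH_REALM> = {
        kdc = <FQDN_OF_WORKING_KDC>
        kdc = <FQDN_OF_SECOND_KDC>
    }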
Thanks a mile for the help!