nhvdwalt
Barite | Level 11

Hi team,

I'm connecting to Hortonworks from SAS 9.4 M3 using the LIBNAME method. Kerberos is used.

The logins are all successful, but it takes 90-100 seconds to log in, i.e. to assign the libname in Enterprise Guide.

This seems excessive. Any ideas why?

Thanks,

nhvdwalt
Barite | Level 11

Thanks @alexal

 

I managed to get the script working. I had to specify the following as well, since we are using Kerberos:

KRB5CCNAME (environment variable)

-Djavax.security.auth.useSubjectCredsOnly=false (JVM option)

 

Does the script have the ability to take a schema name as a parameter? It defaults to the 'default' schema, on which I don't have USE access, although I do have access to other schemas.

The problem doesn't seem to be authentication, since it fails almost immediately if the wrong credentials are provided. The lag seems to be between successful authentication and the point where it runs the 'show tables' command.

Btw, I can't access the README file for this script.

alexal
SAS Employee

@nhvdwalt,

 

Yes, it looks like the README file has been removed. Basically, you have to do the following:

 

export SASJAVA=/<REPLACE_WITH_PATH_TO_SAS_HOME>/SASPrivateJavaRuntimeEnvironment/9.4/jre/bin/java
export CLASSPATH="/<REPLACE_WITH_PATH_TO_HADOOP_JARS>/*:/<REPLACE_WITH_PATH_TO_HADOOP_CONFIG>:$CLASSPATH"
$SASJAVA HiveCheck Kerberos <FQDN_OF_HIVE_SERVER> 10000 <REPLACE_WITH_HIVE_PRINCIPAL>


The Hive schema can be changed in file HiveCheck.java on line 196:

 

connection_string = "jdbc:hive2://" + args[1] + ":" + args[2] + "/default";

Do not forget to recompile HiveCheck.java using javac.
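
As an alternative to hard-coding a different schema, the connection string could take the schema from an extra command-line argument. The following is only a sketch: HiveCheck.java itself is not shown in this thread, so the class name `HiveUrlSketch`, the method `buildUrl`, and the idea of an optional fifth argument are assumptions. Only the string concatenation is taken from the line quoted above.

```java
// Hypothetical helper; not part of the real HiveCheck.java.
class HiveUrlSketch {

    /**
     * Builds the same URL as the line quoted above, but lets the caller
     * supply the schema instead of always using "default".
     */
    static String buildUrl(String[] args) {
        // args[1] = Hive server host and args[2] = port, as in the original line.
        // Assumption: an optional fifth argument (args[4]) names the schema.
        String schema = (args.length > 4 && !args[4].isEmpty()) ? args[4] : "default";
        return "jdbc:hive2://" + args[1] + ":" + args[2] + "/" + schema;
    }

    public static void main(String[] argv) {
        // Placeholder values for illustration only.
        String[] args = {"Kerberos", "hive.example.com", "10000",
                         "hive/_HOST@EXAMPLE.COM", "sales"};
        System.out.println(buildUrl(args));
    }
}
```

With a change along these lines the schema could be passed after the Hive principal on the command line, and omitting it would keep the current behaviour.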

nhvdwalt
Barite | Level 11

Thanks @alexal, it worked perfectly! I recompiled the code and can now test the connections successfully.

 

To get back to the original question....

 

I found that the connection to Hadoop takes 94 seconds with Java 7 (SAS supplied) and only 34 seconds with Java 8 (Red Hat supplied). Although we cannot change the SAS version, at least I have a reasonable explanation now. I guess some of the 34 seconds are taken up by Kerberos when the tickets are obtained initially. So overall, it's manageable. The key is to ensure the login timeout is set on the libname statement, since the login time still exceeds the default of 30 seconds.

 

Thanks for the help.

 

 

 

alexal
SAS Employee

@nhvdwalt,

 

You are welcome.

"I found that the connection to Hadoop takes 94 seconds with Java 7"

Wow, that's a lot of time. Would you like to debug the problem further?

nhvdwalt
Barite | Level 11

Thanks @alexal

 

Yes, if you have any suggestions about how to troubleshoot, I'd appreciate it.

 

Regards,

alexal
SAS Employee

@nhvdwalt,

 

First off, run the check again with the JVM option -Dsun.security.krb5.debug=true. I would like to see what is going on with the Kerberos authentication.
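
If changing the command line is awkward, the same properties can in principle be set from code before any Kerberos class is used. This is a minimal sketch, not something taken from HiveCheck.java: the class and method names are made up for illustration, and it assumes the properties are set early enough, since the JDK may read the debug flag only once when its Kerberos classes are first loaded.

```java
// Hypothetical helper; names are illustrative only.
class KerberosDebugSketch {

    /** Sets the two system properties mentioned in this thread. */
    static void enableKerberosDebug() {
        // Equivalent to -Dsun.security.krb5.debug=true on the command line,
        // provided it runs before the first Kerberos call.
        System.setProperty("sun.security.krb5.debug", "true");
        // Equivalent to -Djavax.security.auth.useSubjectCredsOnly=false.
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
    }

    public static void main(String[] args) {
        enableKerberosDebug();
        // KRB5CCNAME is an environment variable, so it can only be read here;
        // it has to be exported in the shell that starts the JVM.
        System.out.println("KRB5CCNAME=" + System.getenv("KRB5CCNAME"));
        System.out.println("krb5 debug=" + System.getProperty("sun.security.krb5.debug"));
    }
}
```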

nhvdwalt
Barite | Level 11

Absolute brilliance @alexal !!

 

This is exactly what I was looking for. Long story short, the trace clearly shows a timeout against the first KDC listed in krb5.conf. I'll chat to the AD team to see why that KDC is offline. This is a production environment, so all services should be available. So 90 of the 94 seconds is spent waiting for an offline KDC. Once the request is sent to the second KDC, the Hadoop connection is instant.
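
A quick way to spot an offline KDC without reading a full Kerberos trace is to list the `kdc` entries in krb5.conf and try a short TCP connect to each. The sketch below makes several assumptions: the file lives at /etc/krb5.conf unless a path is given, entries use the plain `kdc = host` or `kdc = host:port` form, and the KDC accepts TCP on port 88. A successful connect is only a rough liveness signal, not proof that Kerberos requests succeed. All class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper; not part of SAS or HiveCheck.
class KdcProbeSketch {

    // Matches lines such as "kdc = kdc1.example.com" or "kdc = kdc1.example.com:88".
    private static final Pattern KDC_LINE =
            Pattern.compile("^\\s*kdc\\s*=\\s*([^\\s:#]+)(?::(\\d+))?", Pattern.MULTILINE);

    /** Returns the KDCs in file order as "host:port", defaulting the port to 88. */
    static List<String> kdcHosts(String krb5ConfText) {
        List<String> result = new ArrayList<>();
        Matcher m = KDC_LINE.matcher(krb5ConfText);
        while (m.find()) {
            String port = (m.group(2) != null) ? m.group(2) : "88";
            result.add(m.group(1) + ":" + port);
        }
        return result;
    }

    /** Tries a TCP connect and gives up after timeoutMillis. */
    static boolean reachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers timeouts, refused connections, and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        String path = (args.length > 0) ? args[0] : "/etc/krb5.conf";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String kdc : kdcHosts(text)) {
            int colon = kdc.lastIndexOf(':');
            String host = kdc.substring(0, colon);
            int port = Integer.parseInt(kdc.substring(colon + 1));
            System.out.println(kdc + " reachable=" + reachable(host, port, 3000));
        }
    }
}
```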

 

Interestingly, although Java 8 also reports retries=3, it doesn't actually try three times. After the first try fails (30 seconds), it goes to the second KDC. So again, 30 of the 34 seconds are spent waiting for an offline KDC. But that's just a side note. I'm confident that if the first KDC is online, we'll have an almost immediate connection.
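
The numbers in this post fit a simple model in which the time lost equals the per-attempt timeout multiplied by the number of attempts against the dead KDC. The sketch below just works through that arithmetic. The 30-second timeout per attempt under Java 7 is an inference from the 90 seconds observed with retries=3, not something shown in the trace, and the class and method names are hypothetical.

```java
// Hypothetical worked example of the timings reported in this thread.
class KdcWaitSketch {

    /** Seconds spent waiting on a KDC that never answers. */
    static int secondsLostToDeadKdc(int timeoutSeconds, int attempts) {
        return timeoutSeconds * attempts;
    }

    public static void main(String[] args) {
        // Java 7: assumed 30 s timeout, three attempts before moving on.
        int java7 = secondsLostToDeadKdc(30, 3);
        // Java 8: one 30 s attempt, then it moves to the second KDC.
        int java8 = secondsLostToDeadKdc(30, 1);
        System.out.println("Java 7 wait: " + java7 + " s of 94 s total");
        System.out.println("Java 8 wait: " + java8 + " s of 34 s total");
    }
}
```

In both cases roughly 4 seconds remain for the actual login once a live KDC answers.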

 

Thanks a mile for the help !

alexal
SAS Employee

@nhvdwalt,

 

You are welcome! I'm glad that the problem has been identified and it's not SAS 😄

 
