Architecting, installing and maintaining your SAS environment

Slow logins to Hadoop

Super Contributor
Posts: 277
Slow logins to Hadoop

Hi team,

 

I'm connecting to Hortonworks from SAS 9.4 M3 using the LIBNAME method.

 

Kerberos is used.

 

The logins are all successful, but it takes 90-100 seconds to log in, i.e. to assign the libname in Enterprise Guide.

 

This seems excessive. Any ideas why?

 

Thanks,


Accepted Solutions
Solution
‎01-09-2018 11:09 PM
SAS Employee
Posts: 413

Re: Slow logins to Hadoop

@nhvdwalt,

 

Yes, it looks like the README file has been removed. Basically, you have to do the following:

 

export SASJAVA=/<REPLACE_WITH_PATH_TO_SAS_HOME>/SASPrivateJavaRuntimeEnvironment/9.4/jre/bin/java
export CLASSPATH="/<REPLACE_WITH_PATH_TO_HADOOP_JARS>/*:/<REPLACE_WITH_PATH_TO_HADOOP_CONFIG>:$CLASSPATH"
$SASJAVA HiveCheck Kerberos <FQDN_OF_HIVE_SERVER> 10000 <REPLACE_WITH_HIVE_PRINCIPAL>


The Hive schema can be changed in file HiveCheck.java on line 196:

 

connection_string = "jdbc:hive2://" + args[1] + ":" + args[2] + "/default";

Do not forget to recompile HiveCheck.java using javac.
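
If editing and recompiling for every schema is inconvenient, the schema could instead be read from an extra command-line argument. The following is only a minimal sketch of that idea: the class name HiveUrlSketch and the optional fifth argument are hypothetical and are not part of the HiveCheck.java that SAS distributes. It assumes the argument order shown in the command above (mode, host, port, principal).

```java
// Sketch only: HiveUrlSketch and the optional fifth argument are hypothetical,
// not part of the HiveCheck.java distributed by SAS.
class HiveUrlSketch {
    // Assumed argument order, taken from the command line shown above:
    // args[0] = auth mode, args[1] = host, args[2] = port, args[3] = principal,
    // args[4] = schema (optional, hypothetical addition).
    static String buildConnectionString(String[] args) {
        // Fall back to the original hard-coded schema when none is supplied.
        String schema = (args.length > 4 && !args[4].isEmpty()) ? args[4] : "default";
        return "jdbc:hive2://" + args[1] + ":" + args[2] + "/" + schema;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        String[] demo = {"Kerberos", "hive.example.com", "10000", "hive/_HOST@EXAMPLE.COM", "sales"};
        System.out.println(buildConnectionString(demo));
    }
}
```

With this change, omitting the fifth argument keeps the existing behaviour, so existing invocations would not need to change.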


All Replies
SAS Employee
Posts: 413

Re: Slow logins to Hadoop

Super Contributor
Posts: 277

Re: Slow logins to Hadoop

Thanks @alexal

 

I managed to get the script working. I had to specify the below parameters as well since we are using Kerberos:

 

KRB5CCNAME

-Djavax.security.auth.useSubjectCredsOnly=false

 

Can the script take a schema name as a parameter? It uses the 'default' schema, on which I don't have USE access, but I do have access to other schemas.

 

The problem doesn't seem to be authentication, since it fails almost immediately if the wrong credentials are provided. The lag seems to be between successful authentication and the point where it runs the 'show tables' command.

 

By the way, I can't access the README file for this script.


Super Contributor
Posts: 277

Re: Slow logins to Hadoop

Thanks @alexal, it worked perfectly! I recompiled the code and can now test the connections successfully.

 

To get back to the original question:

 

I found that the connection to Hadoop takes 94 seconds with Java 7 (SAS supplied) and only 34 seconds with Java 8 (Red Hat supplied). Although we cannot change the SAS version, at least I have a reasonable explanation now. I guess some of the 34 seconds is taken up by Kerberos when the tickets are initially obtained. So overall, it's manageable. The key is to make sure the login timeout is set on the libname statement, since the login time still exceeds the default of 30 seconds.

 

Thanks for the help.

 

 

 

SAS Employee
Posts: 413

Re: Slow logins to Hadoop

@nhvdwalt,

 

You are welcome.

"I found that the connection to Hadoop takes 94 seconds with Java 7"

 

Wow, that's a lot of time. Would you like to debug the problem further?

Super Contributor
Posts: 277

Re: Slow logins to Hadoop

Thanks @alexal

 

Yes, if you have any suggestions on how to troubleshoot this, I'd appreciate it.

 

Regards,

SAS Employee
Posts: 413

Re: Slow logins to Hadoop

@nhvdwalt,

 

First off, run HiveCheck with the Java option -Dsun.security.krb5.debug=true. I would like to see what is going on with the Kerberos authentication.
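
For anyone who cannot easily change the command line, the same JVM properties can in principle be set from code. The sketch below is an assumption-laden illustration, not part of HiveCheck: the class KerberosDebugSketch and its method are hypothetical. It sets the debug property requested here together with the useSubjectCredsOnly property mentioned earlier in the thread. As far as I know, both need to be set before any Kerberos or JDBC classes are loaded, which is why passing them as -D options is usually the simpler route.

```java
// Sketch only: KerberosDebugSketch is a hypothetical helper, not part of HiveCheck.
class KerberosDebugSketch {
    // Sets the same system properties that the -D options would set.
    // Call this first thing in main(), before any Kerberos or JDBC class is used.
    static void enableKerberosDebug() {
        System.setProperty("sun.security.krb5.debug", "true");
        System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");
    }

    public static void main(String[] args) {
        enableKerberosDebug();
        System.out.println("sun.security.krb5.debug = "
                + System.getProperty("sun.security.krb5.debug"));
    }
}
```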

Super Contributor
Posts: 277

Re: Slow logins to Hadoop

Absolute brilliance @alexal !!

 

This is exactly what I was looking for. Long story short, the trace clearly shows a timeout against the first KDC listed in krb5.conf. I'll talk to the AD team to find out why that KDC is offline. This is a production environment, so all services should be available. So 90 of the 94 seconds are spent waiting for an offline KDC. Once the request is sent to the second KDC, the Hadoop connection is instant.

 

Interestingly, although Java 8 also reports retries=3, it doesn't actually try three times. After the first try fails (30 seconds), it moves on to the second KDC. So again, 30 of the 34 seconds are spent waiting for an offline KDC. But that's just a side note. I'm confident that once the first KDC is back online, we'll have an almost immediate connection.

 

Thanks a mile for the help !
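
The timings reported in this thread fit a simple model: time lost on the offline KDC is the per-attempt timeout multiplied by the number of attempts made before moving to the next KDC. The sketch below works through that arithmetic. The class and method names are hypothetical, and the assumption that Java 7 made three 30-second attempts is inferred from the 90-second figure and retries=3 above rather than read from the trace itself.

```java
// Sketch only: a worked version of the arithmetic, with hypothetical names.
class KdcWaitSketch {
    // Seconds spent on an unreachable KDC before falling through to the next one.
    static int offlineKdcWaitSeconds(int timeoutSeconds, int attempts) {
        return timeoutSeconds * attempts;
    }

    public static void main(String[] args) {
        int timeout = 30; // seconds per attempt, as reported for the failed try

        int java7Wait = offlineKdcWaitSeconds(timeout, 3); // assumed: three attempts
        int java8Wait = offlineKdcWaitSeconds(timeout, 1); // reported: one attempt

        // Total login times (94 s and 34 s) are the figures quoted in the thread.
        System.out.println("Java 7: " + java7Wait + " s waiting, "
                + (94 - java7Wait) + " s for everything else");
        System.out.println("Java 8: " + java8Wait + " s waiting, "
                + (34 - java8Wait) + " s for everything else");
    }
}
```

In both cases only about 4 seconds remain once the wait on the offline KDC is subtracted, which is consistent with the expectation that logins will be close to immediate once the first KDC is reachable again.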

SAS Employee
Posts: 413

Re: Slow logins to Hadoop

@nhvdwalt,

 

You are welcome! I'm glad that the problem has been identified and it's not SAS :-D

 

☑ This topic is solved.
