simoncole
Fluorite | Level 6

Hi 

 

Does anyone know where the best place to get documentation around this is? I'm a bit lost as to where to look!

 

For explanation, our configuration:

 

SAS 9.4 ML2 on Windows 2012 Server 64 bit

SAS/ACCESS for Hadoop

Cloudera CDH 5.8 (Hive and HDFS data sources) on Linux (Centos 7)

CDH is (Active Directory) Kerberized (2008R2 AD), and accessed via a kinit-ed user. SAS Server is also part of the same AD domain (and hence the same Kerberos realm). No cross realm configuration is required.

 

We've successfully managed to get SAS talking to Cloudera Hadoop via the SAS/ACCESS connector for Hadoop: logging in to the SAS base server, manually kinit-ing as an AD user with access to Cloudera Hadoop, then starting the SAS client and running the relevant LIBNAME and PROC SQL statements to issue queries against Hive. Results are returned into SAS once the Hive job has completed on the cluster - so it operates exactly how we'd expect.
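
For concreteness, the single-user pattern that works for us looks roughly like this (a minimal sketch - the server name, schema and table are placeholders for our real values):

  /* After a manual kinit on the SAS server, in the SAS session: */
  libname hdp hadoop server="hive.example.com" port=10000 schema=default;

  proc sql;
    select count(*) from hdp.some_table;  /* executes as a Hive job on the cluster */
  quit;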

 

However, we really need to set this up so that a client (e.g., SAS Enterprise Guide), also authenticated against the same AD, can talk to the SAS server, and then onward to Cloudera, using the same Kerberos credentials (credential cache / TGT) available on the Enterprise Guide client.

 

I understand this is done by issuing a kinit on the SAS Server using the SAS ObjectSpawner process, which appears to run a batch script - WorkspaceServer.bat. That makes sense, although I can't really find any workable examples of this!

 

So, my question: is it possible (or is there any documentation anyone knows about that explains this!) to issue the kinit via the ObjectSpawner, using the user credentials passed from the client running Enterprise Guide?

 

The article at http://blogs.sas.com/content/sgf/2014/10/15/sas-high-performance-analytics-connecting-to-secure-hado... in the 'Making connections in a standard SAS session' section seems to suggest this is all possible, but it's missing a lot of detail on how we actually do it!

 

So, the flow through the transaction would typically be:

 

1. User S logs into Windows, on a client with Enterprise Guide installed.

2. User S starts Enterprise Guide and logs into the SAS Server.

3. The SAS Server (via the ObjectSpawner?) runs a kinit for the user S that is running Enterprise Guide.

4. The SAS Server can then successfully use Hive / HDFS as the kinit'd user S. Data / requests / etc. are passed from Enterprise Guide through the SAS server, and on to Hive / HDFS. Results flow back the other way.

 

Many thanks

Simon


10 REPLIES
anja
SAS Employee

Hi Simon,

 

Here is some good documentation about Kerberos & kinit implementation / authentication.

 

SAS and secure Hadoop: 3 deployment requirements:

http://blogs.sas.com/content/sgf/2014/10/09/sas-and-secure-hadoop-3-deployment-requirements/

 

Hadoop with Kerberos – Deployment Considerations:

http://support.sas.com/resources/papers/Hadoop_Deployment.pdf

 

Mid-tier Admin Guide:

http://support.sas.com/documentation/cdl/en/bimtag/69826/HTML/default/viewer.htm#p1871e69gmwdr0n1o18...

 

Hope this helps.

 

Best,

Anja

 

simoncole
Fluorite | Level 6

Thank you, Anja. Most of the documentation we'd already seen, but the middle-tier document looks really useful.

 

 

We've learnt quite a lot from this, so I'll update in due course with our findings, as hopefully they'll be of some use elsewhere.

 

Simon

JuanS_OCS
Amethyst | Level 16

Looking forward to reading your findings! 🙂

vinod369
Calcite | Level 5

Hi Simon,

Hope you found some workaround to this issue.

I am facing the exact same problem with SAS Enterprise Guide + Cloudera 5.9 + Kerberos.

Please share your findings, if any, to resolve this.

 

Thank you.

simoncole
Fluorite | Level 6
Hi

So in the end this was pretty straightforward. I'm assuming you've got SAS/ACCESS for Hadoop installed. SAS should be at least 9.4 ML2 for Kerberos compatibility. We are using SAS for Windows, so firstly make sure the correct registry changes are made as per the SAS installation guide. This enables SAS to use the protected, memory-based credential cache that Windows uses. Once that is set, we found it necessary to remove all the Kerberos modifications we'd made - e.g., no need for c:\windows\krb5.ini, no JAAS modifications.
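
For reference, the registry change in question is normally the AllowTgtSessionKey setting, which lets MIT-style Kerberos clients (including the Java kinit) read the TGT session key from the Windows LSA credential cache - double-check against the SAS installation guide for your release, but it looks like this:

  Windows Registry Editor Version 5.00

  [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters]
  "allowtgtsessionkey"=dword:00000001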

To test, log in to the SAS server and try a Hadoop-based LIBNAME statement - e.g., pointing at Hive. You should get a list of available tables / schemas in the library browser.

For EG it was necessary to create an ObjectSpawner script so that, when an incoming connection to the SAS server is made, this script gets run. I'll get you the details of the script's contents once I'm back in the office later this week.
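
In the meantime, a very rough sketch of the idea - and I should stress the specifics here are illustrative, not a confirmed SAS mechanism: the usermods file name follows the usual SAS convention, and the per-user keytab is purely an assumption about how credentials might be supplied at your site:

  REM WorkspaceServer_usermods.bat - runs each time the spawner launches a workspace server
  REM ASSUMPTION: a per-user keytab exists; adapt to however your site supplies credentials
  "C:\Program Files\SASHome\SASPrivateJavaRuntimeEnvironment\9.4\jre\bin\kinit.exe" ^
      -k -t "D:\keytabs\%USERNAME%.keytab" %USERNAME%@EXAMPLE.COM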

Cheers
Simon
simoncole
Fluorite | Level 6

Hi again

 

So now I'm not trying to type on a phone, I'll expand a bit.

 

Initially we set up using the SAS JRE-based kinit utilities (the MIT Kerberos utilities for Windows are included in the SAS JRE), using kinit to directly request a ticket-granting ticket (TGT) from our AD domain. This worked, but didn't feel right - users were having to kinit before they could use SAS. We felt we should be able to use Windows 'domain' Kerberos, where you get a TGT on login.
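
For reference, this is the kind of thing users were having to run (the SAS Private JRE path is from our install and may differ on yours):

  "C:\Program Files\SASHome\SASPrivateJavaRuntimeEnvironment\9.4\jre\bin\kinit.exe" someuser@EXAMPLE.COM
  "C:\Program Files\SASHome\SASPrivateJavaRuntimeEnvironment\9.4\jre\bin\klist.exe"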

 

 

To fix this, we ended up removing the c:\windows\krb5.ini file that we'd created - it was unnecessary in the end. Now "domain" Kerberos is working for CDH access: we log in at the usual Windows screen and get a TGT issued at that point. We can also use Hive / Impala over ODBC with Kerberos as a test of whether connectivity is good.

 

In terms of process, we first need to get the SAS Server connected to CDH, and then connect SAS EG to that. I'll cover the EG -> Server details soon.

 

The key is getting the right Hadoop client libraries onto the Windows machine (we had to do this manually, as the SAS configuration auto-download wizard wouldn't work over TLS - it worked fine on a non-TLS Cloudera system). SAS support can give you a script that will create an archive of the required library files. You run this - e.g., on a cluster node - and it picks up the client config as well. You need the config if you're running HA NameNodes / YARN RM, as otherwise failovers need a manual update on the client.
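
Once the archive is unpacked, you point SAS at it with the standard SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH settings - e.g., in sasv9.cfg (the paths below are illustrative):

  -SET SAS_HADOOP_JAR_PATH "C:\hadoop\cdh58\jars"
  -SET SAS_HADOOP_CONFIG_PATH "C:\hadoop\cdh58\conf"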

 

As our cluster is configured with TLS throughout, we also found we needed to copy the cluster truststore.jks into the SAS installation - using the setup utility - as well as the usual root and intermediate certificates. I believe this really just adds the certs to the usual cacerts / jssecacerts files(?) We may be able to limit the cert list to the master nodes only? Not sure - didn't have time to find out!
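
If you'd rather do it by hand than via the setup utility, the JDK keytool can merge the cluster truststore into the JRE's jssecacerts - a hedged sketch, with illustrative paths and the default 'changeit' store password:

  keytool -importkeystore -srckeystore truststore.jks -srcstorepass changeit ^
      -destkeystore "C:\Program Files\SASHome\SASPrivateJavaRuntimeEnvironment\9.4\jre\lib\security\jssecacerts" ^
      -deststorepass changeit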

 

Once transferred over to the Windows SAS server, and assuming SAS/ACCESS for Hadoop is installed, SAS pretty much "just works". Once you've used an appropriate LIBNAME statement (our CDH uses TLS throughout, so ours is a little more complex) you'll get a list of tables in Hive for the user you're logged in as (as long as your CDH side is set up!).
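
For what it's worth, our TLS-flavoured LIBNAME looks something like the sketch below. The PROPERTIES= option passes HiveServer2 connection properties through, but exact option names vary by SAS release, so treat this as illustrative and check the SAS/ACCESS documentation (host, schema and truststore path are placeholders):

  libname hdp hadoop server="hive.example.com" port=10000 schema=default
      properties="ssl=true;sslTrustStore=C:\hadoop\cdh58\truststore.jks";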

 

 

Once you're in and using things, we had a bit of a problem around querying some tables and CTAS (CREATE TABLE AS SELECT) operations (hazy details - sorry!).

 

We're using Sentry, so we did have to put a URI grant on the SAS TMPDIR directory for the appropriate roles. I found that SAS error messages are sometimes truncated, so they don't show the full error - which actually contains the missing-privilege information - so I generally go to the CDH HiveServer2 logs directly on the HS2 server node.
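
The grant itself is the standard Sentry URI grant, run through beeline as a Sentry admin - the HDFS path and role name below are illustrative:

  GRANT ALL ON URI 'hdfs://nameservice1/tmp/sasdata' TO ROLE sas_users;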

 

Cheers

Simon

vinod369
Calcite | Level 5

Hi Simon,

Thank you very much for the detailed information.

This will certainly help to understand SAS connections better.

 

One thing I forgot to mention in my post is that our SAS runs on Linux servers.

I assume the details you provided are for SAS on Windows - correct me if I am wrong.

 

I'm not sure if I can configure a Linux server the same way as you explained for Windows.

 

Thanks,

Vinod

dave_foster
Fluorite | Level 6

Hello Simon,

 

I was tracking right along with you. Are you able to provide the script/string you used to pass the Kerberos ticket from EG to the server?

That is where I got stuck.

 

Thanks!

advoss
Quartz | Level 8

I'm a little late to this discussion, but I'm also curious about the process of passing the Kerberos ticket from EG to the server.

 

Thank you.

ShelleySessoms
Community Manager

Hi @advoss,

 

Since this question has been solved, it's best to start a new discussion to attract more eyes. Feel free to reference this question if it relates to yours. Be sure to give as much detail as possible so the experts can provide assistance.

 

Thanks,

Shelley

