Hi
Does anyone know where the best place to get documentation around this is? I'm a bit lost as to where to look!
For context, our configuration:
SAS 9.4 ML2 on Windows 2012 Server 64 bit
SAS/ACCESS for Hadoop
Cloudera CDH 5.8 (Hive and HDFS data sources) on Linux (Centos 7)
CDH is (Active Directory) Kerberized (2008R2 AD), and accessed via a kinit-ed user. SAS Server is also part of the same AD domain (and hence the same Kerberos realm). No cross realm configuration is required.
We've successfully managed to get SAS talking to Cloudera Hadoop via the SAS/ACCESS connector for Hadoop: log in to the SAS server, manually kinit as an AD user with access to Cloudera Hadoop, start the SAS client, then run the relevant LIBNAME and PROC SQL statements to issue queries against Hive. Results are returned into SAS once the Hive job has completed on the cluster - so it's operating exactly how we'd expect.
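For reference, the manual flow is roughly of this shape (a sketch only - host, port, and schema are placeholders rather than our real values, and the Kerberos/Hive specifics come from the client configuration files rather than the LIBNAME itself):

/* Assumes a TGT is already in the credentials cache from the manual kinit */
libname hdp hadoop server="hiveserver2.example.com" port=10000 schema=default;

proc sql;
   select count(*) from hdp.some_table;   /* runs as a Hive job on the cluster */
quit;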
However, we really need to set this up so that a client (e.g., SAS Enterprise Guide), also authenticated against the same AD, can talk to the SAS server and then onward to Cloudera using the same Kerberos credentials (credentials cache / TGT) available on the Enterprise Guide client.
I understand this is done by issuing a kinit on the SAS server via the SAS ObjectSpawner process, which looks to run a batch script - WorkspaceServer.bat. That makes sense, although I can't really find any workable examples of it!
So, my question; is it possible (or is there any documentation anyone knows about that explains this!) to issue the kinit via ObjectSpawner, using the user credentials passed from the client running Enterprise Guide?
The article at http://blogs.sas.com/content/sgf/2014/10/15/sas-high-performance-analytics-connecting-to-secure-hado... , in the 'Making connections in a standard SAS session' section, seems to suggest this is all possible, but it's missing a lot of detail on how we actually do it!
So, the flow through the transaction would typically be:
User S logs into Windows, on a client with Enterprise Guide installed.
User S starts Enterprise Guide, and logs into the SAS Server.
The SAS Server (via ObjectSpawner?) runs a kinit for the user 'S' that is running Enterprise Guide.
Then the SAS server can successfully use Hive / HDFS as the kinit'd user 'S'. Data / requests / etc. are passed from Enterprise Guide through the SAS server and on to Hive / HDFS. Results flow back the other way.
Many thanks
Simon
Hi Simon,
Here is some good documentation about Kerberos & kinit implementation / authentication.
SAS and secure Hadoop: 3 deployment requirements:
http://blogs.sas.com/content/sgf/2014/10/09/sas-and-secure-hadoop-3-deployment-requirements/
Hadoop with Kerberos – Deployment Considerations:
http://support.sas.com/resources/papers/Hadoop_Deployment.pdf
Mid-tier Admin Guide:
Hope this helps.
Best,
Anja
Thank you, Anja. Most of the documentation we'd already seen, but the mid-tier document looks really useful.
We've learnt quite a lot from this, so I'll update in due course with our findings, as hopefully they'll be of some use elsewhere.
Simon
Looking forward to reading your findings! 🙂
Hi Simon,
Hope you found some workaround to this issue.
I am facing the exact same problem with SAS Enterprise Guide + Cloudera 5.9 + Kerberos.
Please share your findings, if any, to resolve this.
Thank you.
Hi again
So now I'm not trying to type on a phone, I'll expand a bit.
Initially we set up using the SAS JRE-based kinit utilities (the MIT Kerberos utilities for Windows are included in the SAS JRE), using kinit to request a ticket-granting ticket directly from our AD domain. This worked, but didn't feel right - users were having to kinit before they could use SAS. We felt we should be able to use Windows 'domain' Kerberos, where you get a TGT at login.
To fix this, we ended up removing the c:\windows\krb5.ini file that we'd created - it was unnecessary in the end. Now "domain" Kerberos is working for CDH access: we log in at the usual Windows screen and get a TGT issued at that point. We can also use Hive / Impala over ODBC with Kerberos as a test of whether connectivity is good.
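As another quick sanity check, you can confirm the SAS session itself can see the cached TGT - a rough sketch, assuming XCMD is allowed so the session can shell out to klist:

filename tkt pipe 'klist';   /* klist prints the principal and cached tickets */
data _null_;
   infile tkt;
   input;
   put _infile_;
run;
filename tkt clear;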
In terms of process, we first need to get the SAS server connected to CDH, and then connect SAS EG to the SAS server. What follows covers the server-to-CDH side; I'll cover the EG -> Server details soon.
The key is getting the right Hadoop client libraries onto the Windows machine (we had to do this manually, as the SAS configuration auto-download wizard wouldn't work over TLS - it worked fine on a non-TLS Cloudera system). SAS support can give you a script that creates an archive of the required library files. You run it - e.g., on a cluster node - and it picks up the client configuration as well. You need that config if you're running HA NameNodes / YARN ResourceManagers, as otherwise failovers need a manual update on the client.
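Once the archive is unpacked on the SAS server, SAS needs pointing at the JARs and the config files. In practice we set these as system environment variables rather than in code, and the paths below are placeholders, but the OPTIONS SET= form shows the idea:

options set=SAS_HADOOP_JAR_PATH="C:\hadoop\lib";     /* the Cloudera client JARs from the archive */
options set=SAS_HADOOP_CONFIG_PATH="C:\hadoop\conf"; /* core-site.xml, hdfs-site.xml, hive-site.xml, etc. */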
As our cluster is configured with TLS throughout, we also found we needed to copy the cluster truststore.jks into the SAS installation - using the setup utility - as well as the usual root and intermediate certificates. I believe this really just adds the certs to the usual cacerts / jssecacerts files(?). We may be able to limit the cert list to the master nodes only - not sure, didn't have time to find out!
Once everything is transferred over to the Windows SAS server, and assuming SAS/ACCESS for Hadoop is installed, SAS pretty much "just works". Once you've submitted an appropriate LIBNAME statement (ours is a little more complex because our CDH uses TLS throughout), you'll see the list of Hive tables for the user you're logged in as (as long as your CDH side is set up!).
Once you're in and using things, we hit a bit of a problem querying some tables and with CTAS (CREATE TABLE AS SELECT) operations (hazy on the details - sorry!).
We're using Sentry, so we did have to put a URI grant on the SAS temporary (TMPDIR) directory for the appropriate roles. I found that SAS error messages are sometimes truncated, so you don't get the full error text - which actually contains the missing-privilege information - so generally go straight to the HiveServer2 logs on the HS2 server node.
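For anyone hitting the same thing, the grant is roughly of this shape - the path, nameservice, and role name are placeholders for illustration, and in practice a Sentry admin would normally run it from beeline or Hue rather than through SAS pass-through:

proc sql;
   connect to hadoop (server="hiveserver2.example.com" port=10000);
   /* substitute your SAS temporary HDFS directory and the role your SAS users hold */
   execute (
      grant all on uri 'hdfs://nameservice1/tmp/sasdata' to role sas_users
   ) by hadoop;
   disconnect from hadoop;
quit;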
Cheers
Simon
Hi Simon,
Thank you very much for the detailed information.
This will certainly help to understand SAS connections better.
One thing I forgot to mention in my post is that our SAS runs on Linux servers.
I assume the details you provided are for SAS on Windows - correct me if I am wrong.
I'm not sure if I can configure it on a Linux server the same way as you explained for Windows.
Thanks,
Vinod
Hello Simon,
I was tracking right along with you - are you able to provide the script/string you used to pass the Kerberos ticket from EG to the server?
That is where I got stuck.
Thanks!
I'm a little late to this discussion, but I'm also curious about the process of passing the Kerberos ticket from EG to the server.
Thank you.
Hi @advoss,
Since this question has been solved, it's best to start a new discussion to attract more eyes. Feel free to reference this question if it relates to yours. Be sure to give as much detail as possible so the experts can provide assistance.
Thanks,
Shelley