06-18-2014 03:21 AM
I have an issue while executing a LIBNAME statement in batch mode on a UNIX machine to connect to HDFS through SPDS.
The LIBNAME statement: LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT;
We are getting a security error: Can't get Kerberos configuration.
The primary path (~/SAS/sangramjit/data) in the LIBNAME statement above is on a Linux machine, and we are executing the SAS script containing the LIBNAME statement on a UNIX machine.
To be more precise: Hadoop is configured on the Linux machine and SAS is configured on the UNIX machine, and we now want to connect from SPDS on the UNIX machine to HDFS on the Linux machine.
So my question is whether the LIBNAME statement above is correct, or whether we need to specify more options, since it throws an error when we invoke SAS:
nohup /sast/SAS9.4/Install/SASFoundation/9.4/sas -set SAS_HADOOP_JAR_PATH "/sast/data8/Hadoop/jars" SAS_HADOOP_CONFIG_PATH "/sast/data8/hadoop_config/config" test_hadoop.sas &
We are getting the following issue:
ERROR: Could not connect to HDFS.
ERROR: Libref HDPLIB is not assigned.
ERROR: Error in the LIBNAME statement.
ERROR: Call to method org.apache.hadoop.fs.FileSystem::get(URI, Configuration) failed.
java.lang.IllegalArgumentException: Can't get Kerberos configuration
Steps taken to place the jar files and configuration files on the UNIX machine and set the environment variables:
1. We listed the Hadoop jar files and Hadoop cluster configuration files and placed them on our UNIX machine:
Hadoop jar files
Hadoop cluster configuration files
2. We pointed SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH at those locations on our UNIX machine.
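For reference, here is a minimal shell sketch to sanity-check those two locations before invoking SAS (the paths are the ones from our invocation above; adjust for your site):

```shell
# Paths taken from the SAS invocation above; adjust for your site.
jar_path=/sast/data8/Hadoop/jars
cfg_path=/sast/data8/hadoop_config/config

# The SPD engine needs the cluster's client jars and its *-site.xml
# configuration files; an empty or wrong directory only fails later,
# at LIBNAME time, with a much less obvious error.
ls "$jar_path"/*.jar      >/dev/null 2>&1 && jar_ok=yes || jar_ok=no
ls "$cfg_path"/*-site.xml >/dev/null 2>&1 && cfg_ok=yes || cfg_ok=no
echo "jars present: $jar_ok, config files present: $cfg_ok"

# Note: each environment variable takes its own -set option on the SAS
# command line, e.g.
#   sas -set SAS_HADOOP_JAR_PATH "$jar_path" \
#       -set SAS_HADOOP_CONFIG_PATH "$cfg_path" test_hadoop.sas
```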
Please have a look at the issue above and suggest what we need to do to resolve it. The SAS script is attached below.
Looking forward to a reply.
06-18-2014 05:36 AM
Sounds like an issue best resolved by SAS tech support.
Provide the name of your distribution and your SAS version (with maintenance level).
Can you ping Hadoop on the appropriate port?
Also, it's a bit confusing that you mention SPDS, since to my knowledge SPDS does not (yet) support HDFS.
06-18-2014 06:46 AM
We have configured SAS(R) Proprietary Software 9.4 (TS1M1 MBCS3170) on AIX 7.1, and we have configured Hortonworks 1.2 on our Linux machine.
LinusH, I have followed SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System, and that doc says we can interface with Hadoop, connect to a specific Hadoop cluster, and store data in HDFS using the SPD Engine.
So, LinusH, are you sure that we cannot use SPDS to connect to HDFS? Please let me know whether we can connect or not, and whether there is anything wrong in the script attached in the previous discussion.
06-18-2014 07:02 AM
I don't think I can help you out with detailed problem solving of this nature, perhaps SAS tech support can.
And no, I don't think that you can use SPDS, but you should be able to use SPDE. Perhaps you could talk to a SAS representative and ask about the future plans for SPDS and Hadoop.
06-18-2014 07:12 AM
LinusH, can we do it through SPDE? Can you explain a bit more how?
When I try to run the SAS script it shows an error.
Can you find the cause of the error in the script attached above?
06-18-2014 07:29 AM
LinusH, it would be great if you could tell me how to connect to HDFS through SPDE.
I followed SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System and wrote a SAS script: LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT;
When I executed the script, it threw an error about the Kerberos configuration. Can you please guide me on how to make the connection?
06-18-2014 07:59 AM
Sangramjit, you are using HDFS (Hadoop) on one side and SAS on the other.
From Hadoop I know the concepts: it is a distributed file system that is more or less fault tolerant and offers high performance for retrieving data.
The SAS SPD engine/server also targets high-performance data retrieval, but adds indexing options.
Security for files in Hadoop is handled in a similar way to UNIX, but it is not UNIX doing the handling.
The Kerberos message indicates you may have an issue at exactly this point.
I do not trust the use of ~. Does it refer to the home directory of the SAS user on the Linux machine running Hadoop, to something in the Hadoop configuration, or to an internal Hadoop home directory? I did not know Hadoop had a personal home folder location and internal user registration like that.
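The first thing I would check, as a minimal sketch (assuming standard MIT Kerberos client tools on the SAS machine; the principal name is hypothetical): the Java client that SPDE starts must be able to find a readable krb5.conf.

```shell
# "Can't get Kerberos configuration" usually means the Java client cannot
# locate a krb5.conf. Check the usual places on the SAS (client) machine:
krb5="${KRB5_CONFIG:-/etc/krb5.conf}"
if [ -r "$krb5" ]; then
    krb5_state=found
else
    krb5_state=missing
fi
echo "krb5.conf: $krb5_state ($krb5)"

# If the cluster is NOT kerberized, core-site.xml on the client should say:
#   <property>
#     <name>hadoop.security.authentication</name>
#     <value>simple</value>
#   </property>
# If it IS kerberized, obtain a ticket before assigning the libref
# (hypothetical principal, use your own realm):
#   kinit sangramjit@YOUR.REALM
#   klist
```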
06-18-2014 08:19 AM
Jaap, our Hortonworks 1.2 (Hadoop) is configured on the Linux machine, and we have created directories in HDFS (~/SAS/sangramjit/data).
Inside the data folder we have some files.
We have configured SAS 9.4 on our UNIX machine. We then wrote a SAS script (a simple LIBNAME statement pointing at that directory) after following the doc below.
When we execute the script, it throws an error about the Kerberos configuration.
According to the doc we can use this type of LIBNAME statement, but we are not able to identify the issue.
06-19-2014 01:59 AM
Seen this? http://wiki.apache.org/hadoop/HadoopIsNot
Did you test your hadoop installation without SAS?
The start is found here http://wiki.apache.org/hadoop/QuickStart
Using HDFS is similar to using UNIX files, but it is a different file system.
The requirement is to use fully qualified names.
I expect the Hadoop config files to define the Hadoop environment; the key used for installing Hadoop (your personal key) will get the root key role within Hadoop.
You will have to format HDFS with Hadoop commands, not OS commands, before it can be used.
I have not easily found the HDFS setup explained in a simple step-by-step fashion.
In the end you should have HDFS files that are like UNIX files but are maintained by the HDFS service.
This is why the ~ looks weird to me.
It looks like an HDFS data node is being addressed directly instead of going through the Hadoop HDFS service.
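To test the Hadoop installation without SAS, a sketch like this (run it on the Linux machine where the Hadoop client is installed; the hadoop command is only available there):

```shell
# Check the HDFS service from the command line before involving SAS at all.
if command -v hadoop >/dev/null 2>&1; then
    have_client=yes
    hadoop fs -ls /        # does the namenode answer at all?
    hadoop fs -ls /user    # per-user home directories normally live here
else
    have_client=no
    echo "no hadoop client on PATH; run this on the cluster node"
fi
```

If these listings fail, the problem is in the Hadoop setup itself, not in SAS.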
06-19-2014 09:56 AM
You probably won't like this answer, but...
The current version of SPDE only supports CDH 4.3.1 (and later, I think). HDP is not a supported platform at this time. The upcoming SAS 9.4M2 release will add support for Hortonworks Data Platform 2.0 and later (later means 2.x). The supported platforms are detailed here (Hadoop Version section):
LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT;
Your LIBNAME statement has at least one problem. Jaap was suspicious of '~', and he was right. This is a directory in HDFS, and HDFS has no concept of a present working directory, so things like '.', '..', and '~' do not work. You will need to fully expand the path. There is no way for me to know what the directory should be; typical values look similar to this: /user/someusername.
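A quick way to see the problem, as a shell sketch (the path is the one from your LIBNAME statement): any path that does not start with '/' cannot name an HDFS directory.

```shell
# HDFS has no working directory, so '~', '.' and '..' have no meaning;
# every path in the LIBNAME statement must be absolute.
path='~/SAS/sangramjit/data'   # the path from the LIBNAME statement
case "$path" in
    /*) path_kind=absolute ;;
    *)  path_kind=relative ;;
esac
echo "$path is $path_kind"     # here: relative, so it must be expanded
                               # to something like /user/sangramjit/data
```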
06-19-2014 11:18 AM
On the other hand, what are you planning to do with this? What kind of application are you planning?
As I may have mentioned in some other thread, I can't really see the benefit of the SPDE Hadoop LIBNAME until it fully supports the existing set of functionality (available for standard "local" file systems), meaning that WHERE-clause evaluation, implicit sorting, and parallel index creation are executed in HDFS.
06-20-2014 02:08 AM
Please confirm whether the current version of SPDE can connect to Hortonworks 1.2, as we have Hortonworks 1.2 configured on our Linux machine
and SAS 9.4 configured on our AIX 7.1 machine.
In the LIBNAME statement LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT; the path (~/SAS/sangramjit/data) is a relative path, not an absolute one. There are many folders before the SAS folder,
so to avoid writing out the full path I denoted it like that.
The real statement looks like: LIBNAME hdplib SPDE '/data8/TEST/SAS/sangramjit/data' HDFSHOST=DEFAULT;
Please tell us whether the LIBNAME statement above is correct or not.
We would be grateful if you could confirm whether SPDE can connect to Hortonworks 1.2.
Thank you, looking forward to a reply.
06-23-2014 03:34 PM
Unfortunately, I don't know whether SPDE can connect to Hortonworks 1.2. It is an unsupported configuration; it has not been tested. Unsupported means that if you call SAS tech support they can't assist you. I do know that support for Hortonworks is coming with the next release. I think it is for HDP 2.x, but I am not entirely sure.
I do think your LIBNAME statement has a problem. HDFS does not support relative paths. You must specify the full HDFS directory name. If you look at the available HDFS commands it will make sense. For example, there is no 'cd' (change directory) or 'pwd' (present working directory) command in HDFS. The directory you have specified does not appear to be an HDFS directory. They usually will start with /user. I think your LIBNAME statement should look something like this:
LIBNAME hdplib SPDE '/user/sangramjit/data' HDFSHOST=DEFAULT;
Hope this helps,
06-25-2014 12:54 AM
Thanks for your helpful post. I checked with SAS Technical Support, and they said that SPDE does not support Hortonworks 1.2.