SPDEngine:Storing Data in the Hadoop Distributed File System

Occasional Contributor
Posts: 11

Hi,

I have an issue while executing a LIBNAME statement in batch mode on a Unix machine to connect to HDFS through SPDS.

The LIBNAME statement: LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT;

We are getting a security error: Can't get Kerberos configuration


The primary path (~/SAS/sangramjit/data) in the above LIBNAME statement is on the Linux machine, and we are executing the SAS script containing the LIBNAME statement on the Unix machine.

To be more precise: Hadoop is configured on the Linux machine and SAS is configured on the Unix machine, and we want to connect to HDFS on the Linux machine through SPDS, which is on the Unix machine.

So my question is whether the above LIBNAME statement is correct or whether we need to give some more options, as it throws an error when we invoke SAS like this:

nohup /sast/SAS9.4/Install/SASFoundation/9.4/sas -set SAS_HADOOP_JAR_PATH "/sast/data8/Hadoop/jars" SAS_HADOOP_CONFIG_PATH "/sast/data8/hadoop_config/config" test_hadoop.sas &


We are getting the following issue:

ERROR: Could not connect to HDFS.

ERROR: Libref HDPLIB is not assigned.

ERROR: Error in the LIBNAME statement.

.

.

ERROR: Call to method org.apache.hadoop.fs.FileSystem::get(URI, Configuration) failed.

ERROR: java.lang.ExceptionInInitializerError

.

.

java.lang.IllegalArgumentException: Can't get Kerberos configuration


Steps done to place the jar files and configuration files on the Unix machine and set the environment variables:

1. We have listed the Hadoop jar files and Hadoop cluster configuration files and placed them on our Unix machine:

Hadoop jar files

guava-11.0.2.jar                   

hadoop-core-2.0.0-mr1-cdh4.3.1.jar

hive-exec-0.10.0-cdh4.3.1.jar

hive-jdbc-0.10.0-cdh4.3.1.jar

hive-metastore-0.10.0-cdh4.3.1.jar

hive-service-0.10.0-cdh4.3.1.jar

libfb303-0.9.0.jar

pig-0.11.0-cdh4.3.1.jar

protobuf-java-2.4.0a.jar

Hadoop cluster configuration files

core-site.xml

hdfs-site.xml

mapred-site.xml


And we have pointed SAS_HADOOP_JAR_PATH and SAS_HADOOP_CONFIG_PATH at these locations on our Unix machine.
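For reference, here is a minimal shell sketch of setting the two environment variables before SAS is started and sanity-checking what they point at. The paths are the ones from the invocation quoted above; `has_files` is a hypothetical helper written only for this illustration, not something shipped with SAS or Hadoop.

```shell
# Paths as quoted in the SAS invocation in this post.
SAS_HADOOP_JAR_PATH="/sast/data8/Hadoop/jars"
SAS_HADOOP_CONFIG_PATH="/sast/data8/hadoop_config/config"
export SAS_HADOOP_JAR_PATH SAS_HADOOP_CONFIG_PATH

# Hypothetical helper: succeeds only if the directory exists and
# contains at least one regular file matching the given pattern.
has_files() {
    dir=$1
    pattern=$2
    [ -d "$dir" ] || return 1
    for f in "$dir"/$pattern; do
        [ -f "$f" ] && return 0
    done
    return 1
}

# Warn early instead of waiting for the LIBNAME statement to fail.
has_files "$SAS_HADOOP_JAR_PATH" '*.jar' ||
    echo "no jar files found in $SAS_HADOOP_JAR_PATH" >&2
has_files "$SAS_HADOOP_CONFIG_PATH" '*-site.xml' ||
    echo "no *-site.xml files found in $SAS_HADOOP_CONFIG_PATH" >&2
```

A check like this only confirms that the files are where the variables say they are. It says nothing about whether the jar versions match the cluster.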


Please have a look at the above issue and suggest what we need to do to solve it. Attaching the SAS script below:

Looking forward to a reply.


Thanks,

Sangramjit



Attachment
Super User
Posts: 5,260

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Sounds like an issue best resolved by SAS tech support.

Provide the name of your distribution and the SAS version (with maintenance level).

Can you ping Hadoop on the appropriate port?

It's also a bit confusing that you mention SPDS since, to my knowledge, SPDS does not (yet) support HDFS.

Data never sleeps
Occasional Contributor
Posts: 11

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Hi LinusH,

We have configured SAS (r) Proprietary Software 9.4 (TS1M1 MBCS3170) on AIX 7.1, and we have configured Hortonworks 1.2 on our Linux machine.

LinusH, I have followed SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System, and that doc mentions that we can interface with Hadoop, connect to a specific Hadoop cluster, and also store data in HDFS using the SPD Engine.

So, LinusH, are you sure that we cannot use SPDS to connect to HDFS? Please let me know whether we can connect or not, and also whether there is anything wrong in the script attached in the previous post.

Thanks,

Sangramjit

Super User
Posts: 5,260

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

I don't think I can help you out with detailed problem solving of this nature, perhaps SAS tech support can.

And no, I don't think that you can use SPDS, but you should be able to use SPDE. Perhaps you could talk to a SAS representative and ask about the future plans for SPDS and Hadoop.

Data never sleeps
Occasional Contributor
Posts: 11

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

LinusH, can we do it through SPDE? Can you explain a bit more how we can do it through SPDE?

When I try to run the SAS script it shows an error.

Can you find the cause of the error in the script attached above?

Occasional Contributor
Posts: 11

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

LinusH, it would be great if you could tell me how to connect to HDFS through SPDE.

I followed the SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System guide and then wrote a SAS script: LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT;

When I executed the script it threw an error regarding Kerberos configuration. Can you please guide me on how to make the connection?

Valued Guide
Posts: 3,208

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Sangramjit, you are using HDFS (Hadoop) and SPDE.

Of Hadoop I know the concepts: it is a distributed file system that is more or less fault tolerant and offers high performance for retrieving data.

The SAS SPD Engine/Server is also meant for high-performance data retrieval, but it adds indexing options.

File security in Hadoop works much like it does in Unix, but it is not Unix that handles it.

The Kerberos message indicates that you possibly have an issue at this point.

I do not trust the ~ usage. Is it the home of the Linux user running Hadoop, of the SAS user, of the Hadoop or SAS config, or an internal Hadoop personal home dir? I did not know Hadoop had a personal home folder location with its own internal user registration.

---->-- ja karman --<-----
Occasional Contributor
Posts: 11

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Jaap, our Hortonworks 1.2 (Hadoop) is configured on a Linux machine and we have created directories in HDFS (~/SAS/sangramjit/data).

Inside the data folder we have some files.

We have configured SAS 9.4 on our Unix machine. We then wrote a SAS script (a simple LIBNAME statement pointing at that directory) after following this doc:

SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System Guide.

When we execute the script it throws an error about the Kerberos configuration.

According to the doc we can use this type of LIBNAME statement, but we are not able to identify the issue.

Valued Guide
Posts: 3,208

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Seen this?  http://wiki.apache.org/hadoop/HadoopIsNot

Did you test your hadoop installation without SAS?

The start is found here  http://wiki.apache.org/hadoop/QuickStart
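One way to test the Hadoop side on its own is from the command line of a machine that has the Hadoop client installed and configured. This is only a sketch: it assumes the `hadoop` command is on the PATH, `hdfs_can_list` is a hypothetical helper name, and `/user/sangramjit/data` is just an example path.

```shell
# Hypothetical helper: succeeds only if the Hadoop client can list the
# given HDFS path. A failure here points at the Hadoop client setup
# (connectivity, security, or a wrong path) rather than at SAS.
hdfs_can_list() {
    hadoop fs -ls "$1" >/dev/null 2>&1
}

# Example use (the path is an assumption):
#   if hdfs_can_list /user/sangramjit/data; then
#       echo "HDFS path is reachable without SAS"
#   else
#       echo "HDFS path is not reachable; fix Hadoop access first" >&2
#   fi
```

If the listing fails outside SAS, the LIBNAME statement cannot be expected to work either.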


Using HDFS is similar to using Unix files, but it is a different file system.

It requires fully qualified names.

http://itm-vm.shidler.hawaii.edu/HDFS/ArchDocOverview.html#_Toc291720190


I expect that the Hadoop config files define the Hadoop environment, and that the key used for installing Hadoop (your personal key) gets the root key role within Hadoop.

You will have to format HDFS before it is used, and that is done with Hadoop commands, not OS commands.

I am  not easily finding the hdfs set up like posit using child like approach.

In the end you should have HDFS files that look like Unix files but are maintained by the HDFS service.

This is why I see the ~ as weird.

It looks as if an HDFS data cluster is being addressed directly instead of through the Hadoop HDFS service.

---->-- ja karman --<-----
SAS Employee
Posts: 203

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Hi Sangramjit,


You probably won't like this answer, but...

The current version of SPDE only supports CDH 4.3.1 (and later, I think). HDP is not a supported platform at this time. The upcoming SAS 9.4M2 release will add support for Hortonworks Data Platform 2.0 and later (later means 2.x). The supported platforms are detailed here (Hadoop Version section):

SAS(R) 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System

LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT;

Your LIBNAME statement has at least one problem. Jaap was suspicious of '~' and he was right. This is a directory in HDFS. HDFS has no concept of a present working directory, so things like '.', '..', and '~' do not work. You will need to fully expand it. There is no way for me to know what this directory should be. Typical values look similar to this: /user/someusername.
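To make the rule concrete, here is a small shell sketch that checks whether a path is fully qualified in the sense described above: it must start at the root and must not contain '.' or '..' components. `is_absolute_hdfs_path` is a hypothetical helper for illustration only; it does not talk to HDFS and cannot tell whether the directory actually exists.

```shell
# Hypothetical helper: returns 0 if the argument looks like a fully
# qualified HDFS directory path, 1 otherwise.
is_absolute_hdfs_path() {
    case "$1" in
        /*) ;;             # starts at the root, keep checking
        *)  return 1 ;;    # '~/...', './data', 'data', or empty
    esac
    case "/$1/" in
        */./*|*/../*) return 1 ;;   # contains a '.' or '..' component
    esac
    return 0
}

# Example use:
#   is_absolute_hdfs_path '~/SAS/sangramjit/data'   # fails
#   is_absolute_hdfs_path '/user/someusername'      # succeeds
```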

Super User
Posts: 5,260

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Just to raise some hope, I got this to work with IBM BigInsights, which is not a supported platform.

On the other hand, what are you planning to do with this? What kind of application are you planning?

As I may have mentioned in some other thread, I can't really see the benefit of the SPDE Hadoop libname until it fully supports the existing set of functionality (for standard "local" file systems), meaning that WHERE-clause evaluation, implicit sort, and parallel index creation are executed in HDFS.

Data never sleeps
SAS Employee
Posts: 203

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Hi LinusH,

I have to toe the company line with regards to supported distributions ;)

Occasional Contributor
Posts: 11

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Hi JBailey,

Please confirm whether the current version of SPDE can connect to Hortonworks 1.2, as we have Hortonworks 1.2 configured on the Linux machine and SAS 9.4 configured on our AIX 7.1 machine.

In this LIBNAME statement: LIBNAME hdplib SPDE '~/SAS/sangramjit/data' HDFSHOST=DEFAULT; the path (~/SAS/sangramjit/data) is a shortened form, not the absolute path. There are a lot of folders before the SAS folder, so I wrote it like that to avoid giving the full path.

But the real statement looks like: LIBNAME hdplib SPDE '/data8/TEST/SAS/sangramjit/data' HDFSHOST=DEFAULT;

Please tell me now whether the LIBNAME statement above is correct or not.

We would be grateful if you could confirm whether SPDE can connect to Hortonworks 1.2 or not.

Thank you. Looking forward to a reply.

Sangramjit

SAS Employee
Posts: 203

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Hi Sangramjit,

Unfortunately, I don't know whether SPDE can connect to Hortonworks 1.2. It is an unsupported configuration; it has not been tested. Unsupported means that if you call SAS tech support, they can't assist you. I do know that support for Hortonworks is coming with the next release. I think it is for HDP 2.x, but I am not entirely sure.

I do think your LIBNAME statement has a problem. HDFS does not support relative paths. You must specify the full HDFS directory name. If you look at the available HDFS commands it will make sense. For example, there is no 'cd' (change directory) or 'pwd' (present working directory) command in HDFS. The directory you have specified does not appear to be an HDFS directory. They usually will start with /user. I think your LIBNAME statement should look something like this:

LIBNAME hdplib SPDE '/user/sangramjit/data' HDFSHOST=DEFAULT;

Hope this helps,

Jeff

Occasional Contributor
Posts: 11

Re: SPDEngine:Storing Data in the Hadoop Distributed File System

Hi JBailey,

Thanks for your positive post. I followed up with SAS Technical Support and they said that SPDE does not support Hortonworks 1.2.

Thanks,

Sangramjit
