SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Issue in SAS integration with Hadoop

Contributor
Posts: 30

Issue in SAS integration with Hadoop

I am trying to integrate SAS with Hadoop. The details are below: SAS 9.4 interface to Hive, running on the Linux 2.6.32-358.2.1.el6.x86_64 (LIN X64) platform. When I try to create a Hive table using SAS, I get the error below.

libname hdplib hadoop server="XXXXXX" user="XXX" password="$XXXX" port=10001;

data hdplib.class;

set sashelp.class(obs=10);

run;

HADOOP_10: Prepared: on connection 2 37 1387861396 no_name 0 DATASTEP

SHOW TABLE EXTENDED LIKE `CLASS` 38 1387861396 no_name 0 DATASTEP

39 1387861396 no_name 0 DATASTEP

40 1387861397 no_name 0 DATASTEP

HADOOP_11: Executed: on connection 2 41 1387861397 no_name 0 DATASTEP

CREATE TABLE `CLASS` (`Name` STRING,`Sex` STRING,`Age` DOUBLE,`Height` DOUBLE,`Weight` DOUBLE) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE TBLPROPERTIES ('SAS OS Name'='Linux','SAS Version'='9.04.01M0P06192013','SASFMT:Name'='CHAR(8)','SASFMT:Sex'='CHAR(1)') 42 1387861397 no_name 0 DATASTEP

43 1387861397 no_name 0 DATASTEP

ERROR: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.DistributedFileSystem

NOTE: Validate the contents of the Hadoop configuration file and ensure user permissions are correct.

ERROR: Unable to create stream service from /tmp/sasdata-2013-12-24-00-03-17-513-e-00001.dlv. Use the debug option for more information.

ERROR: Unable to create stream service from /tmp/sasdata-2013-12-24-00-03-17-513-e-00001.dlv. Use the debug option for more information.


Can someone assist in resolving this issue?




All Replies
Occasional Contributor
Posts: 18

Re: Issue in SAS integration with Hadoop

Hi,

It looks like an installation and configuration issue. I would check with the local administrator first. You might get more and better responses if you run:

proc javainfo picklist 'hadoop/hdoopsasjars.txt';

run;

and post the results to identify if there are any missing JAR files.

Thanks,

Naveen

Contributor
Posts: 30

Re: Issue in SAS integration with Hadoop

Naveen,

Here is the result.

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/sas.hadoop.hivehelper_904000.0.0.20130522190000_v940/sas.hadoop.hivehelper.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/Log4J_1.2.15.0_SAS_20121211183158/log4j.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/commons_beanutils_1.8.2.0_SAS_20121211183319/commons-beanutils.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/commons_collections_3.2.1.0_SAS_20121211183225/commons-collections.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/commons_logging_1.1.1.0_SAS_20121211183202/commons-logging.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/jackson_1.9.7.0_SAS_20121211183158/jackson.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/slf4j_1.5.10.0_SAS_20121211183229/slf4j-api.jar

file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/slf4j_1.5.10.0_SAS_20121211183229/slf4j-log4j12.jar

Total URLs: 8

Is there anything missing?

Contributor
Posts: 30

Re: Issue in SAS integration with Hadoop

Also, the JAR files being used are:

guava.jar

hadoop-auth-0.23.1.jar

hadoop-common.jar

hadoop-core.jar

hadoop-hdfs-0.23.5.jar

hadoop-streaming-2.0.0-mr1-cdh4.3.1.jar

hive-exec-0.10.0.jar

hive-jdbc-0.10.0.jar

hive-metastore-0.10.0.jar

hive-service-0.8.1.jar

libfb303-0.7.0.jar

pig.jar

protobuf-java-2.4.1.jar

Regular Contributor
Posts: 213

Re: Issue in SAS integration with Hadoop

Hashim,

While SAS Tech Support is away on holiday, I would suggest you review the following webcasts; they may shed some light on how to get around your issue.

- Getting Started with SAS® and Hadoop
  In this live webinar, SAS technical expert Jeff Bailey covers the basics of SAS and Hadoop, including a section on configuring SAS/ACCESS to Hadoop.

- SAS® Integration with Hadoop: Part II
  Tune in for part two of our series on Hadoop, covering more about SAS integration with Hadoop.

Hope this helps,

Ahmed

Contributor
Posts: 30

Re: Issue in SAS integration with Hadoop

Thanks Ahmed.

We have the connection established between SAS and Hadoop.

libname hdplib hadoop server="CSDFFGF" user="basheerh" password=XXXXXXXXXXXXX port=10001;

NOTE: Libref HDPLIB was successfully assigned as follows:


It seems it's an issue with the configuration.

RMP
SAS Employee
Posts: 52

Re: Issue in SAS integration with Hadoop

Well, actually, what you have done is make a connection to the Hive server, not to HDFS. I think the problem lies in the fact that SAS cannot connect to HDFS, hence the original error. When you run the original code

libname hdplib hadoop server="XXXXXX" user="XXX" password="$XXXX" port=10001;

data hdplib.class;

set sashelp.class(obs=10);

run;


I think you will see that the temp file is being written to the local file system and not to HDFS. When Hive then attempts to load the temp file from HDFS into the Hive table, it fails, since the file is not available on the HDFS file system.


I see you are not pointing to a configuration file. This file tells SAS where to look for the HDFS and MapReduce components. Perhaps this is the issue.
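As a rough sketch of what such a configuration file contains (the host name and port here are placeholders, not values from this thread), it merges the cluster's core-site/hdfs-site style properties into one XML file that SAS can read:

```xml
<!-- Minimal sketch of a Hadoop client configuration file.
     fs.defaultFS tells the client where the HDFS NameNode lives;
     the host and port below are illustrative placeholders. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>
```

Your Hadoop administrator can supply the actual property values for your cluster.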


You will probably face the same issue if you use the FILENAME statement to Hadoop, which emphasizes the fact that it is an HDFS connectivity issue and not a Hive one:


filename out hadoop '/tmp/' user='sasdemo' pass='Orion123' recfm=v lrecl=32167 dir ;
data _null_;
file out(shoes) ;
put 'write data to shoes file';
run;

438        filename out hadoop '/tmp' cfg='/tmp/richard.cfg'
439        user='sasinst' pass=XXXXXXXXXX recfm=v lrecl=32167 dir debug;
440        data _null_;
441        file out(shoes4) ;
442        put 'write data to shoes file';
443        run;

ERROR: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.DistributedFileSystem


Sorry I can't help further; I am having the exact same issue and have decided to reinstall Hadoop to check whether this is the cause.



Contributor
Posts: 30

Re: Issue in SAS integration with Hadoop

RMP thanks for your response.

Please correct me if I am wrong: in the LIBNAME approach, do we need to point to a config file?

Solution
‎01-02-2014 11:36 AM
SAS Employee
Posts: 203

Re: Issue in SAS integration with Hadoop

Hi Hashim,

The LIBNAME statement doesn't require that you provide a configuration file. You will need to know whether the cluster is running Hive or HiveServer2. If it is running HiveServer2, you will need to include SUBPROTOCOL=hive2 on your LIBNAME statement. Getting this wrong results in the LIBNAME statement hanging. You aren't getting that far yet.
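As a sketch of what a HiveServer2 connection might look like (the server name and credentials below are placeholders, and 10000 is the usual HiveServer2 default port; adjust to your cluster):

```sas
/* HiveServer2 connection: note SUBPROTOCOL=hive2.
   Server, user, and password are placeholders. */
libname hdplib hadoop subprotocol=hive2
    server="myhiveserver" user="myuser" password="XXXXXXXX" port=10000;
```
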

Your JAR files do not match the ones that I use when accessing CDH 4.3.1. The error message you get from the LIBNAME statement leads me to believe this is the issue. Here is my list of JAR files (SAS_HADOOP_JAR_PATH points to the directory containing these JARs):

guava-11.0.2.jar

hadoop-auth-2.0.0-cdh4.3.1.jar

hadoop-common-2.0.0-cdh4.3.1.jar

hadoop-core-2.0.0-mr1-cdh4.3.1.jar

hadoop-hdfs-2.0.0-cdh4.3.1.jar

hive-exec-0.10.0-cdh4.3.1.jar

hive-jdbc-0.10.0-cdh4.3.1.jar

hive-metastore-0.10.0-cdh4.3.1.jar

hive-service-0.10.0-cdh4.3.1.jar

libfb303-0.9.0.jar

pig-0.11.0-cdh4.3.1.jar

protobuf-java-2.4.0a.jar

Most of the connection issues we see involve incorrect JAR files, Kerberos security (supported by SAS 9.4 only), and MapReduce 2 running on the cluster. We also periodically see folks having configuration issues when running HDFS on a separate machine.
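As a sketch, SAS_HADOOP_JAR_PATH is typically set in the shell environment before SAS starts (the directory path below is a placeholder):

```shell
# Point SAS at the directory holding the Hadoop client JARs (placeholder path)
export SAS_HADOOP_JAR_PATH=/opt/hadoopjars

# Sanity check before starting SAS: the variable should echo back the directory
echo "$SAS_HADOOP_JAR_PATH"
```

If the variable is unset or points to the wrong directory, SAS cannot load the Hadoop classes, which is consistent with the NoClassDefFoundError above.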

Contributor
Posts: 30

Re: Issue in SAS integration with Hadoop

The issue is fixed. It was caused by the wrong JAR files. We used the JAR files below and everything is working fine:

guava-11.0.2.jar

hadoop-auth-2.0.0-cdh4.3.1.jar

hadoop-common-2.0.0-cdh4.3.1.jar

hadoop-core-2.0.0-mr1-cdh4.3.1.jar

hadoop-hdfs-2.0.0-cdh4.3.1.jar

hive-exec-0.10.0-cdh4.3.1.jar

hive-jdbc-0.10.0-cdh4.3.1.jar

hive-metastore-0.10.0-cdh4.3.1.jar

hive-service-0.10.0-cdh4.3.1.jar

libfb303-0.9.0.jar

pig-0.11.0-cdh4.3.1.jar

protobuf-java-2.4.0a.jar

SAS Employee
Posts: 203

Re: Issue in SAS integration with Hadoop

Hi Hashim,

I am very happy you have this sorted out. Have fun!

Best wishes,

Jeff

Occasional Contributor
Posts: 11

Re: Issue in SAS integration with Hadoop

Hi,

I am also getting a similar issue. The detailed issue is tracked in:

ERROR: Error moving data from Hadoop to Hive (LOAD DATA failed).

Please help me out in sorting out this issue.

Thanks,

Sangramjit

Contributor
Posts: 41

Re: Issue in SAS integration with Hadoop

Hi, I am trying to connect to Hortonworks using SAS/ACCESS Interface to Hadoop. I compared the JAR files and they look good (versions slightly different, as they were provided by the Hadoop admin). I am able to get a klist from the SAS client machine (server) and even SSH to the nodes. But when I run a LIBNAME from SAS EG, I am getting this:

 

libname zhdplib hadoop subprotocol='hive2' server='myserver' schema=myschema user='user123' pwd='test123';

 

ERROR: Unable to connect to the Hive server.
ERROR: Error trying to establish connection.
ERROR: Error in the LIBNAME statement.

 

 

thanks, 

Alex
