I am trying to integrate SAS with Hadoop. The details: SAS Interface to Hive, SAS 9.4, on the Linux 2.6.32-358.2.1.el6.x86_64 (LIN X64) platform. When I try to create a Hive table using SAS, I get the error below:
libname hdplib hadoop server="XXXXXX" user="XXX" password="$XXXX" port=10001;
data hdplib.class;
set sashelp.class(obs=10);
run;
HADOOP_10: Prepared: on connection 2
SHOW TABLE EXTENDED LIKE `CLASS`

HADOOP_11: Executed: on connection 2
CREATE TABLE `CLASS` (`Name` STRING,`Sex` STRING,`Age` DOUBLE,`Height` DOUBLE,`Weight` DOUBLE) ROW FORMAT DELIMITED FIELDS
TERMINATED BY '\001' LINES TERMINATED BY '\012' STORED AS TEXTFILE TBLPROPERTIES ('SAS OS Name'='Linux','SAS
Version'='9.04.01M0P06192013','SASFMT:Name'='CHAR(8)','SASFMT:Sex'='CHAR(1)')
ERROR: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.DistributedFileSystem
NOTE: Validate the contents of the Hadoop configuration file and ensure user permissions are correct.
ERROR: Unable to create stream service from /tmp/sasdata-2013-12-24-00-03-17-513-e-00001.dlv. Use the debug option for more information.
Can someone assist in resolving this issue?
Hi,
It looks like an installation and configuration issue; I would check with the local administrator first. You might get more and better responses if you run:
proc javainfo picklist 'hadoop/hdoopsasjars.txt';
run;
and post the results, so we can identify whether any JAR files are missing.
Thanks,
Naveen
Naveen,
Here is the result.
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/sas.hadoop.hivehelper_904000.0.0.20130522190000_v940/sas.hadoop.hivehelper.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/Log4J_1.2.15.0_SAS_20121211183158/log4j.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/commons_beanutils_1.8.2.0_SAS_20121211183319/commons-beanutils.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/commons_collections_3.2.1.0_SAS_20121211183225/commons-collections.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/commons_logging_1.1.1.0_SAS_20121211183202/commons-logging.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/jackson_1.9.7.0_SAS_20121211183158/jackson.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/slf4j_1.5.10.0_SAS_20121211183229/slf4j-api.jar
file:/home/SASHome/SASVersionedJarRepository/eclipse/plugins/slf4j_1.5.10.0_SAS_20121211183229/slf4j-log4j12.jar
Total URLs: 8
Is there anything missing?
Also, the Hadoop JAR files being used are:
guava.jar
hadoop-auth-0.23.1.jar
hadoop-common.jar
hadoop-core.jar
hadoop-hdfs-0.23.5.jar
hadoop-streaming-2.0.0-mr1-cdh4.3.1.jar
hive-exec-0.10.0.jar
hive-jdbc-0.10.0.jar
hive-metastore-0.10.0.jar
hive-service-0.8.1.jar
libfb303-0.7.0.jar
pig.jar
protobuf-java-2.4.1.jar
Hashim,
While SAS Technical Support is away on holiday, I would suggest you review the following webcasts; they may shed some light on how to get around your issue.
- Getting Started with SAS® and Hadoop
In this live webinar, SAS technical expert Jeff Bailey covers the basics of SAS and Hadoop. It has a section on configuring SAS/ACCESS to Hadoop.
- SAS® Integration with Hadoop: Part II
Tune in for part two of our series on Hadoop to learn more about SAS integration with Hadoop.
Hope this helps,
Ahmed
Thanks Ahmed.
We have the connection established between SAS and Hadoop.
libname hdplib hadoop server="CSDFFGF" user="basheerh" password=XXXXXXXXXXXXX port=10001;
NOTE: Libref HDPLIB was successfully assigned as follows:
It seems it's an issue with the configuration.
Actually, what you have done is connect to the Hive server, not to HDFS. I think the problem is that SAS cannot connect to HDFS, hence the original error. When you run the original code
libname hdplib hadoop server="XXXXXX" user="XXX" password="$XXXX" port=10001;
data hdplib.class;
set sashelp.class(obs=10);
run;
I think you will see that the temp file is being written to the local file system and not to HDFS. When Hive then attempts to load the temp file into the table, it fails because the file is not available on the HDFS file system.
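One way to confirm where the temp file is going is to turn on SAS trace output before running the DATA step; these are standard SASTRACE options, and the trace fields in your original log suggest something similar was already enabled:
/* Write DBMS trace output to the SAS log so the prepared/executed
   Hive statements and the temp-file path are visible */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;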
I see you are not pointing to a configuration file. This file tells SAS where to look for the HDFS and MapReduce components. Perhaps this is the issue.
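As a side note, here is a minimal sketch of pointing SAS at the cluster configuration before assigning the libref; it assumes a SAS 9.4 release that supports the SAS_HADOOP_CONFIG_PATH environment variable, and the directory path is hypothetical:
/* Hypothetical directory containing the cluster's core-site.xml,
   hdfs-site.xml, and mapred-site.xml */
options set=SAS_HADOOP_CONFIG_PATH="/opt/hadoop/conf";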
You will probably face the same issue if you use the FILENAME statement to Hadoop, which emphasizes that it is an HDFS connectivity issue and not a Hive one:
filename out hadoop '/tmp' cfg='/tmp/richard.cfg'
   user='sasinst' pass=XXXXXXXXXX recfm=v lrecl=32167 dir debug;
data _null_;
   file out(shoes4);
   put 'write data to shoes file';
run;
ERROR: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.DistributedFileSystem
Sorry I can't help further; I am having the exact same issue and have decided to reinstall Hadoop to check whether that is the cause.
RMP, thanks for your response.
Please correct me if I am wrong: with the LIBNAME approach, do we need to point to a config file?
Hi Hashim,
The LIBNAME statement doesn't require that you provide a configuration file. You will need to know whether the cluster is running Hive or HiveServer2. If it is running HiveServer2, you will need to include SUBPROTOCOL=hive2 on your LIBNAME statement (see the sketch below). Getting this wrong results in the LIBNAME statement hanging. You aren't getting that far, yet.
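For example, a minimal sketch of a HiveServer2 connection (the server, credentials, and port are placeholders; 10000 is a common HiveServer2 default, so verify yours):
libname hdplib hadoop subprotocol=hive2 server="myserver"
   user="myuser" password="XXXXXXXX" port=10000;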
Your JAR files do not match the ones that I use when accessing CDH 4.3.1. The error message you get from the LIBNAME statement leads me to believe this is the issue. Here is my list of JAR files (SAS_HADOOP_JAR_PATH points to a directory containing these JARs; a sketch of setting it follows the list).
guava-11.0.2.jar
hadoop-auth-2.0.0-cdh4.3.1.jar
hadoop-common-2.0.0-cdh4.3.1.jar
hadoop-core-2.0.0-mr1-cdh4.3.1.jar
hadoop-hdfs-2.0.0-cdh4.3.1.jar
hive-exec-0.10.0-cdh4.3.1.jar
hive-jdbc-0.10.0-cdh4.3.1.jar
hive-metastore-0.10.0-cdh4.3.1.jar
hive-service-0.10.0-cdh4.3.1.jar
libfb303-0.9.0.jar
pig-0.11.0-cdh4.3.1.jar
protobuf-java-2.4.0a.jar
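For reference, a minimal sketch of how SAS_HADOOP_JAR_PATH can be set before assigning the libref (the directory path is hypothetical; the variable can also be set in the shell or in sasv9.cfg):
/* Hypothetical directory holding the Hadoop and Hive client JARs listed above */
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoopjars";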
Most of the connection issues we see involve incorrect JAR files, Kerberos security (supported by SAS 9.4 only), and MapReduce2 running on the cluster. We periodically see folks have configuration issues when running HDFS on a separate machine.
The issue is fixed. It was caused by the wrong JAR files. We used the JAR files below, and everything is working fine:
guava-11.0.2.jar
hadoop-auth-2.0.0-cdh4.3.1.jar
hadoop-common-2.0.0-cdh4.3.1.jar
hadoop-core-2.0.0-mr1-cdh4.3.1.jar
hadoop-hdfs-2.0.0-cdh4.3.1.jar
hive-exec-0.10.0-cdh4.3.1.jar
hive-jdbc-0.10.0-cdh4.3.1.jar
hive-metastore-0.10.0-cdh4.3.1.jar
hive-service-0.10.0-cdh4.3.1.jar
libfb303-0.9.0.jar
pig-0.11.0-cdh4.3.1.jar
protobuf-java-2.4.0a.jar
Hi Hashim,
I am very happy you have this sorted out. Have fun!
Best wishes,
Jeff
Hi,
I am also getting a similar issue.
The detailed issue is tracked in
Please help me sort out the issue.
Thanks,
Sangramjit
Hi, I am trying to connect to Hortonworks using SAS/ACCESS to Hadoop. I compared the JAR files and they look good (the versions are slightly different, as they were provided by the Hadoop admin). I am able to get a klist from the SAS client machine (server) and even ssh to the nodes. But when I run a LIBNAME from SAS EG, I am getting this:
libname zhdplib hadoop subprotocol='hive2' server='myserver' schema=myschema user='user123' pwd='test123';
ERROR: Unable to connect to the Hive server.
ERROR: Error trying to establish connection.
ERROR: Error in the LIBNAME statement.
thanks,
Alex