MALIGHT
Fluorite | Level 6

We are using SAS 9.4 TS1M2 with a Hortonworks 2.5 Hadoop cluster. Our LIBNAME statement works:

 

options set=SAS_HADOOP_JAR_PATH ="/sas/sashome/SAS_HADOOP_JAR_PATH";

libname hdp hadoop server='hwnode-05.hdp.ateb.com' port=10000
        schema=patient user=hive password=XXXX SUBPROTOCOL=hive2;
NOTE: Libref HDP was successfully assigned as follows:
      Engine:        HADOOP
      Physical Name: jdbc:hive2://hwnode-05.hdp.ateb.com:10000/patient

 

But Proc SQL is not reading from Hadoop:

 

proc sql;
  select * from hdp.atebpatient where clientid=4 and
         atebpatientid=xxxxxxxxxxx;

 

One user gets these errors:

ERROR: javax.xml.parsers.FactoryConfigurationError: Provider org.apache.xerces.jaxp.DocumentBuilderFactoryImpl not found
ERROR:  at javax.xml.parsers.DocumentBuilderFactory.newInstance(DocumentBuilderFactory.java:127)
ERROR:  at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2555)
ERROR:  at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2532)
ERROR:  at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2424)
ERROR:  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1149)
ERROR:  at org.apache.hadoop.conf.Configuration.set(Configuration.java:1121)
ERROR:  at org.apache.hadoop.conf.Configuration.setBoolean(Configuration.java:1457)
ERROR:  at com.dataflux.hadoop.DFConfiguration.<init>(DFConfiguration.java:95)
ERROR:  at com.dataflux.hadoop.DFConfiguration.<init>(DFConfiguration.java:72)
ERROR: Caused by: java.lang.ClassNotFoundException: org.apache.xerces.jaxp.DocumentBuilderFactoryImpl

 

while another user receives:
ERROR: java.lang.NoClassDefFoundError: org/w3c/dom/ElementTraversal
ERROR:  at java.lang.ClassLoader.defineClass1(Native Method)
ERROR:  at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
ERROR:  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
ERROR:  at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
ERROR:  at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
ERROR:  at java.net.URLClassLoader$1.run(URLClassLoader.java:361)

 

We suspect that some .jar files are missing or incorrect. The list of .jar files we are using is attached.

 

 


Jar Files for Hadoop.PNG
1 ACCEPTED SOLUTION

Accepted Solutions
JBailey
Barite | Level 11

Hi @MALIGHT,

 

READ_METHOD=jdbc bypasses HDFS processing. The fact that your query works only with that option is an indication that you are missing some JARs.

 

You may want to try setting the SAS_HADOOP_RESTFUL=1 environment variable. I think this option first appeared in SAS 9.4M3, but I could be wrong. It makes HDFS processing use the REST interface, which also reduces the number of JARs required. If you set it and don't see a change in behavior, it is probably because it wasn't implemented at M2.

 

I think your best bet is to go through the process of getting the correct set of JARs.
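
As a rough sketch of the SAS_HADOOP_RESTFUL suggestion above, the variable could be set from within the SAS session before the libref is assigned. The server, schema, and credentials are copied from the original post; placing the OPTIONS statement ahead of the LIBNAME statement is an assumption, and on a release where the variable is not implemented the setting would simply have no effect.

```sas
/* Assumption: set the variable before the Hadoop libref is assigned. */
options set=SAS_HADOOP_RESTFUL 1;

/* Same connection as in the original post; password masked as in the source. */
libname hdp hadoop server='hwnode-05.hdp.ateb.com' port=10000
        schema=patient user=hive password=XXXX subprotocol=hive2;
```

If the PROC SQL step behaves the same way afterwards, that would be consistent with the option not being available at M2.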


3 REPLIES 3
JBailey
Barite | Level 11

Hi @MALIGHT

 

Was the hadooptracer program used to pull the JAR files? I don't think there are nearly enough JARs listed in the PNG file to connect to the Hadoop cluster.
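
One way to see what SAS is actually picking up is to list the JAR files from inside the session. The sketch below assumes SAS_HADOOP_JAR_PATH names a single directory, as in the original post; the data set name work.hadoop_jars and the fileref jardir are made up for illustration. The two classes named in the errors normally ship in the xercesImpl and xml-apis JARs, so those names are flagged.

```sas
/* List the .jar files in the directory named by SAS_HADOOP_JAR_PATH. */
data work.hadoop_jars;
  length jarpath $ 1024 fname $ 256 note $ 40;
  jarpath = sysget('SAS_HADOOP_JAR_PATH');   /* read the environment variable */
  rc  = filename('jardir', jarpath);         /* hypothetical fileref name */
  did = dopen('jardir');                     /* 0 means the directory could not be opened */
  if did > 0 then do;
    do i = 1 to dnum(did);
      fname = dread(did, i);
      if lowcase(scan(fname, -1, '.')) = 'jar' then do;
        /* Flag JARs that usually contain the classes from the error messages. */
        if index(lowcase(fname), 'xerces') or index(lowcase(fname), 'xml-apis')
          then note = 'XML parser JAR';
          else note = ' ';
        output;
      end;
    end;
    rc = dclose(did);
  end;
  else put 'NOTE: Could not open directory ' jarpath=;
  rc = filename('jardir');                   /* clear the fileref */
  keep fname note;
run;

proc print data=work.hadoop_jars noobs;
run;
```

If no row is flagged, that would fit the ClassNotFoundException and NoClassDefFoundError shown in the original post, though the listing only shows file names and does not prove what each JAR contains.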

 

Is Apache Knox being used in this HDP cluster?

 

Best wishes,

Jeff

MALIGHT
Fluorite | Level 6

Hi Jeff Bailey (from many moons ago testing SAS/ACCESS)!

 

I'm tracking down how we got hold of the JAR files we point to, since I wasn't involved in this end of the process. Knox was not installed as part of our HDP cluster.

After an adjustment on the Hive server node by our IT department this morning, my original PROC SQL step still failed. I decided to add the read_method=jdbc option to the LIBNAME statement, and the PROC SQL step now works. According to the SAS/ACCESS documentation, there is no default for the read_method= option, but if I remove it completely or assign read_method=hdfs, the step fails with the errors listed in the original post.

 

Does read_method=jdbc bypass the need to use JAR files?
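
For reference, the working setup described here would look roughly like the sketch below. It reuses the statements from the original post and only adds read_method=jdbc to the LIBNAME statement; the patient ID stays masked as in the original, so it has to be replaced with a real value before the query can run.

```sas
options set=SAS_HADOOP_JAR_PATH="/sas/sashome/SAS_HADOOP_JAR_PATH";

/* read_method=jdbc returns results through the Hive JDBC connection */
/* instead of reading them from HDFS.                                */
libname hdp hadoop server='hwnode-05.hdp.ateb.com' port=10000
        schema=patient user=hive password=XXXX subprotocol=hive2
        read_method=jdbc;

proc sql;
  /* xxxxxxxxxxx is the masked ID from the original post, not a real value */
  select * from hdp.atebpatient
    where clientid=4 and atebpatientid=xxxxxxxxxxx;
quit;
```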
