SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Copying SQL Server Table To Hadoop Fails

Contributor
Posts: 22

Copying SQL Server Table To Hadoop Fails

Here is the log when trying to execute generated code:

Log

Starting execution of Oozie workflow.

Oozie workflow job configuration xml:

<configuration>

   <property><name>oozie.wf.application.path</name><value>/user/sasdemo/SASDataLoader/CopyDatatoHadoopf3b595c6fa45455fb6c2a50ff37633f7</value></property>

   <property><name>jobTracker</name><value>quickstart.cloudera:8032</value></property>

   <property><name>nameNode</name><value>quickstart.cloudera:8020</value></property>

   <property><name>user.name</name><value>sasdemo</value></property>

   <property><name>queueName</name><value>default</value></property>

   <property><name>oozie.use.system.libpath</name><value>true</value></property>

</configuration>

Oozie workflow job start url: 'http://192.168.88.130:11000/oozie/v2/jobs?action=start'

Sending request to start Oozie workflow job.

Error 'org.springframework.web.client.HttpClientErrorException: 404 Not Found' trying to start Oozie Job using using URL 'http://192.168.88.130:11000/oozie/v2/jobs?action=start' and configuration '<configuration>

   <property><name>oozie.wf.application.path</name><value>/user/sasdemo/SASDataLoader/CopyDatatoHadoopf3b595c6fa45455fb6c2a50ff37633f7</value></property>

   <property><name>jobTracker</name><value>quickstart.cloudera:8032</value></property>

   <property><name>nameNode</name><value>quickstart.cloudera:8020</value></property>

   <property><name>user.name</name><value>sasdemo</value></property>

   <property><name>queueName</name><value>default</value></property>

   <property><name>oozie.use.system.libpath</name><value>true</value></property>

</configuration>'.

Execution failed: Error 'org.springframework.web.client.HttpClientErrorException: 404 Not Found' trying to start Oozie Job using using URL 'http://192.168.88.130:11000/oozie/v2/jobs?action=start' and configuration '<configuration>

   <property><name>oozie.wf.application.path</name><value>/user/sasdemo/SASDataLoader/CopyDatatoHadoopf3b595c6fa45455fb6c2a50ff37633f7</value></property>

   <property><name>jobTracker</name><value>quickstart.cloudera:8032</value></property>

   <property><name>nameNode</name><value>quickstart.cloudera:8020</value></property>

   <property><name>user.name</name><value>sasdemo</value></property>

   <property><name>queueName</name><value>default</value></property>

   <property><name>oozie.use.system.libpath</name><value>true</value></property>

</configuration>'. org.springframework.web.client.HttpClientErrorException: 404 Not Found


What might be causing this problem?  I can view a SQL Server table in the SAS Table Viewer, but copy to Hadoop fails.
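For context on what the generated code is doing when it fails: below is a rough sketch of how a client builds the job configuration XML and start URL for Oozie's REST API. The helper names (`build_oozie_config`, `start_job_url`) are hypothetical, not SAS Data Loader internals; only the endpoint shape (`/v2/jobs?action=start`) and the property names come from the log above.

```python
# Hypothetical sketch of an Oozie REST job submission, mirroring the log above.
from xml.sax.saxutils import escape

def build_oozie_config(props: dict) -> str:
    """Render job properties as the Oozie configuration XML seen in the log."""
    entries = "".join(
        f"<property><name>{escape(k)}</name><value>{escape(v)}</value></property>"
        for k, v in props.items()
    )
    return f"<configuration>{entries}</configuration>"

def start_job_url(oozie_base: str) -> str:
    """The v2 endpoint; a 404 response means this path is not being served."""
    return f"{oozie_base.rstrip('/')}/v2/jobs?action=start"

config_xml = build_oozie_config({
    "jobTracker": "quickstart.cloudera:8032",
    "nameNode": "quickstart.cloudera:8020",
    "user.name": "sasdemo",
    "oozie.use.system.libpath": "true",
})
url = start_job_url("http://192.168.88.130:11000/oozie")
# A real submission would POST config_xml to url with Content-Type:
# application/xml. A 404 here indicates the Oozie web application is not
# reachable at that address/path, not a problem with the payload itself.
```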


Accepted Solutions
Solution
‎07-10-2015 03:00 PM
SAS Employee
Posts: 6

Re: Copying SQL Server Table To Hadoop Fails

In a nutshell: when Cloudera Manager is enabled, the Cloudera cluster hands SAS Data Loader an unusable HDFS datanode address, and there is an HDFS configuration option we could activate to remedy the problem.  A more detailed explanation appears below.



After the initial issue with the Oozie service not responding (which went away after reinstallation of the VMs), the primary issue we struggled with was that Copy to Hadoop failed during execution with both SAS datasets and SQL Server tables as sources.

When copying a SAS dataset, the operation would create the Hive target table with the correct metadata but no data (i.e., an empty table).  We confirmed that the HDFS file backing the Hive table was created, but its contents were empty.  No errors appeared in any of the logs we examined.

When copying from a SQL Server table to Hadoop, we would encounter an error very early in the execution stating “There are 0 datanode(s) running…”.  No Hive table or HDFS file was created.

We discovered that the issue occurs only when Cloudera Manager is enabled.  In the default Quickstart VM configuration, Cloudera Manager is disabled.  The difference in behavior arises because the Quickstart VM uses two different sets of Hadoop site XML configuration files, depending on whether the default configuration or the Cloudera Manager configuration is active.

In the site XML files for the default configuration, hdfs-site.xml contains explicit entries with the datanode IP address.  The DataLoader client uses these entries (picked up by the DataLoader VM during its configuration phase) to locate and communicate with the HDFS datanode.

In the site XML files for the Cloudera Manager configuration, hdfs-site.xml lacks explicit entries for the datanode.  The DataLoader client instead queries the Quickstart VM namenode for the datanode's address.  Unfortunately, the namenode reports back 127.0.0.1 as the datanode address.  The DataLoader client (VM) then attempts to connect to HDFS at 127.0.0.1 and cannot, because 127.0.0.1 resolves to the DataLoader VM's own localhost, not to the datanode.  Unfortunately, the logs do not readily indicate this connection failure.

The fix is to go into Cloudera Manager and enable the HDFS configuration option "Use Datanode Hostname" for external clients.  This change causes the namenode to report back the address "quickstart.cloudera" rather than 127.0.0.1 for the datanode.  We have verified on both the customer's system and our own that this resolves the issue: we were able to copy both SAS datasets and SQL Server tables into the Quickstart VM.
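For reference, the Cloudera Manager option described above maps to the standard HDFS client property `dfs.client.use.datanode.hostname`; in a plain hdfs-site.xml it would look roughly like the fragment below (whether Cloudera Manager writes exactly this form is an assumption):

```xml
<!-- hdfs-site.xml: tell HDFS clients to connect to datanodes by hostname
     rather than the (possibly loopback) IP address the namenode reports -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```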


We will work on updating the SAS Communities documents that we have for the configuration changes necessary for the Copy to Hadoop operation in Data Loader.


Thank you,

Ben Ryan



All Replies
SAS Employee
Posts: 6

Re: Copying SQL Server Table To Hadoop Fails

It appears the Oozie service is not running.  Can you bring up Cloudera Manager and check on the status of Oozie?

Thanks,
Ben Ryan

Contributor
Posts: 22

Re: Copying SQL Server Table To Hadoop Fails

Ben,

Here is everything in the Cloudera terminal window after issuing sudo service oozie start :

[root@quickstart tmp]#  sudo service oozie start 


Setting OOZIE_HOME:          /usr/lib/oozie

Sourcing:                    /usr/lib/oozie/bin/oozie-env.sh

  setting JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native"

  setting OOZIE_DATA=/var/lib/oozie

  setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat

  setting CATALINA_TMPDIR=/var/lib/oozie

  setting CATALINA_PID=/var/run/oozie/oozie.pid

  setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment

  setting OOZIE_HTTPS_PORT=11443

  setting OOZIE_HTTPS_KEYSTORE_PASS=password

  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"

  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"

  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

  setting OOZIE_CONFIG=/etc/oozie/conf

  setting OOZIE_LOG=/var/log/oozie

Using   OOZIE_CONFIG:        /etc/oozie/conf

Sourcing:                    /etc/oozie/conf/oozie-env.sh

  setting JAVA_LIBRARY_PATH="$JAVA_LIBRARY_PATH:/usr/lib/hadoop/lib/native"

  setting OOZIE_DATA=/var/lib/oozie

  setting OOZIE_CATALINA_HOME=/usr/lib/bigtop-tomcat

  setting CATALINA_TMPDIR=/var/lib/oozie

  setting CATALINA_PID=/var/run/oozie/oozie.pid

  setting CATALINA_BASE=/var/lib/oozie/tomcat-deployment

  setting OOZIE_HTTPS_PORT=11443

  setting OOZIE_HTTPS_KEYSTORE_PASS=password

  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.port=${OOZIE_HTTPS_PORT}"

  setting CATALINA_OPTS="$CATALINA_OPTS -Doozie.https.keystore.pass=${OOZIE_HTTPS_KEYSTORE_PASS}"

  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

  setting OOZIE_CONFIG=/etc/oozie/conf

  setting OOZIE_LOG=/var/log/oozie

Setting OOZIE_CONFIG_FILE:   oozie-site.xml

Using   OOZIE_DATA:          /var/lib/oozie

Using   OOZIE_LOG:           /var/log/oozie

Setting OOZIE_LOG4J_FILE:    oozie-log4j.properties

Setting OOZIE_LOG4J_RELOAD:  10

Setting OOZIE_HTTP_HOSTNAME: quickstart.cloudera

Setting OOZIE_HTTP_PORT:     11000

Setting OOZIE_ADMIN_PORT:     11001

Using   OOZIE_HTTPS_PORT:     11443

Setting OOZIE_BASE_URL:      http://quickstart.cloudera:11000/oozie

Using   CATALINA_BASE:       /var/lib/oozie/tomcat-deployment

Setting OOZIE_HTTPS_KEYSTORE_FILE:     /var/lib/oozie/.keystore

Using   OOZIE_HTTPS_KEYSTORE_PASS:     password

Setting OOZIE_INSTANCE_ID:       quickstart.cloudera

Setting CATALINA_OUT:        /var/log/oozie/catalina.out

Using   CATALINA_PID:        /var/run/oozie/oozie.pid


Using   CATALINA_OPTS:        -Doozie.https.port=11443 -Doozie.https.keystore.pass=password -Xmx1024m -Doozie.https.port=11443 -Doozie.https.keystore.pass=password -Xmx1024m -Dderby.stream.error.file=/var/log/oozie/derby.log

Adding to CATALINA_OPTS:     -Doozie.home.dir=/usr/lib/oozie -Doozie.config.dir=/etc/oozie/conf -Doozie.log.dir=/var/log/oozie -Doozie.data.dir=/var/lib/oozie -Doozie.instance.id=quickstart.cloudera -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=quickstart.cloudera -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://quickstart.cloudera:11000/oozie -Doozie.https.keystore.file=/var/lib/oozie/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native


Using CATALINA_BASE:   /var/lib/oozie/tomcat-deployment

Using CATALINA_HOME:   /usr/lib/bigtop-tomcat

Using CATALINA_TMPDIR: /var/lib/oozie

Using JRE_HOME:        /usr/java/jdk1.7.0_67-cloudera

Using CLASSPATH:       /usr/lib/bigtop-tomcat/bin/bootstrap.jar

Using CATALINA_PID:    /var/run/oozie/oozie.pid

[root@quickstart tmp]#

Based on this, I think the Oozie service is running.

Joe

SAS Employee
Posts: 3

Re: Copying SQL Server Table To Hadoop Fails

Joe,

Can you try hitting http://192.168.88.130:11000/oozie/v2/jobs in a browser to see whether the url can be resolved?

Thanks,

Richard

SAS Employee
Posts: 6

Re: Copying SQL Server Table To Hadoop Fails

Also try hitting: http://quickstart.cloudera:11000/oozie/v2/jobs
If this works, then you might try updating the Oozie configuration in the DataLoader configuration screen to point to http://quickstart.cloudera:11000/oozie  instead of http://192.168.88.130:11000/oozie

SAS Employee
Posts: 6

Re: Copying SQL Server Table To Hadoop Fails

Ignore my last reply.  The oozie URL needs to be the numeric IP address for the DataLoader vApp to be able to resolve it.  Let us know if your browser can reach http://192.168.88.130:11000/oozie/v2/jobs as Richard asked.

Thanks,
Ben

Contributor
Posts: 22

Re: Copying SQL Server Table To Hadoop Fails

Ben,

Is it possible that my problem is caused by a firewall issue?

Joe

SAS Employee
Posts: 6

Re: Copying SQL Server Table To Hadoop Fails

That is always a possibility, but let's check a couple of things first.  I assume the "HTTP Status 404" from response #6 above is in response to trying http://192.168.88.130:11000/oozie/v2/jobs in a browser.

1) Do you get the same response if you try: http://quickstart.cloudera:11000/oozie/v2/jobs or http://192.168.88.130:11000/oozie/v2/jobs?action=start
    Ideally you should get back a JSON object bracketed with curly braces {}

2) If you go to the Cloudera terminal window and enter the following command:
    netstat -an | grep 11000

    what output do you get back?

Contributor
Posts: 22

Re: Copying SQL Server Table To Hadoop Fails

Ben,

Entering

http://quickstart.cloudera:11000/oozie/v2/jobs and http://192.168.88.130:11000/oozie/v2/jobs?action=start

timed out.

[root@quickstart tmp]# netstat -an | grep 11000

tcp        0      0 0.0.0.0:11000               0.0.0.0:*                   LISTEN     

Joe
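The `0.0.0.0:11000 LISTEN` line above means Oozie is accepting connections on every interface of the Quickstart VM, so the remaining question is whether the DataLoader vApp can reach that port at all. A generic client-side probe (a sketch, not a Data Loader tool) can separate a routing/firewall failure from an HTTP-level problem like the 404:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# e.g. tcp_reachable("192.168.88.130", 11000) run from the client machine:
# False points at routing or a firewall; True means the port is open and the
# failure (such as a 404) is happening at the HTTP layer instead.
```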

Contributor
Posts: 22

Re: Copying SQL Server Table To Hadoop Fails

Microsoft Windows [Version 6.2.9200]

(c) 2012 Microsoft Corporation. All rights reserved.

D:\Users\jwhitehurst>ping  http://192.168.88.130:11000

Ping request could not find host http://192.168.88.130:11000. Please check the name and try again.

D:\Users\jwhitehurst>
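The `ping` failure above is expected: `ping` takes a bare hostname or IP address, not a URL, so the scheme and port have to be stripped first (i.e., `ping 192.168.88.130`). A small sketch of extracting the host part, with a hypothetical helper name:

```python
from urllib.parse import urlparse

def host_for_ping(url: str) -> str:
    """Extract just the host part of a URL so it can be passed to ping."""
    parsed = urlparse(url)
    # urlparse only populates .hostname when a scheme like http:// is present;
    # fall back to the input for bare hostnames.
    return parsed.hostname or url

print(host_for_ping("http://192.168.88.130:11000/oozie"))  # 192.168.88.130
```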

SAS Employee
Posts: 6

Re: Copying SQL Server Table To Hadoop Fails


A Technical Support track has been opened for this issue and further communication is being routed through that mechanism.  I'll report back to this discussion when a resolution has been found.

Thank you,

Ben Ryan

Contributor
Posts: 22

Re: Copying SQL Server Table To Hadoop Fails

HTTP Status 404 -


type Status report

message

description The requested resource is not available.


Apache Tomcat/6.0.4

SAS Employee
Posts: 203

Re: Copying SQL Server Table To Hadoop Fails

Hi,

I don't know that this will help you, but it is worth a try. By default there are a lot of services running on the Cloudera Quickstart VM. You may want to start Cloudera Manager and stop some of them (HBase, Solr, ks_indexer, Spark, Sentry, and Impala). Cloudera says that their VM will run in 4GB of RAM, but I have had problems with it when everything is running. I always turn things off, increase the RAM to 8GB, and set the number of processors to 2.

Another thing to try is to restart the Quickstart VM at the end of the day and let it come up overnight. That should give everything time to start properly.

Best wishes,

Jeff

Contributor
Posts: 22

Re: Copying SQL Server Table To Hadoop Fails

Jeff,

Thanks for your reply.  I've allocated 16GB of memory and 2 processors.  I think I have a configuration issue that the development team is trying to help me resolve.  When it's working correctly, I'll post what we did in case it might help someone else.

Joe



Discussion stats
  • 15 replies
  • 813 views
  • 0 likes
  • 4 in conversation