
Part 3 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to add JDBC access to RDBMS data

by SAS Employee kehous on 02-15-2015 07:03 PM, edited on 10-05-2015 03:54 PM by Community Manager

Advanced topic: Use this article to configure SAS Data Loader for Hadoop (Trial Edition) to use the directives Copy Data to Hadoop and Copy Data from Hadoop. You need this information only if you want to access relational DBMS data; the configuration is not required when accessing SAS data. To complete the configuration process, see also the other articles in this series.

 

Before you start, make sure that SAS Data Loader (Trial Edition) and your Hadoop cluster are installed, configured, and functional.

 

The Hadoop cluster uses JDBC drivers to connect to databases. For each database, you need to install a driver both on the Hadoop cluster and in the vApp. Most DBMS vendors supply JDBC drivers for their databases that can be downloaded from the vendor’s website. Once you have downloaded the JDBC drivers for the databases that you want to use, make sure they are accessible from your host system. Then follow these steps to install the JDBC drivers on the Hadoop cluster and in the vApp.

Tip: You can install one or more JDBC drivers in a single pass. Simply place all of the JDBC drivers that you want to install in the same directory before you follow the steps. In the examples below, we copy the JDBC drivers into the /tmp directory on the cluster.

Special Note: Multinode databases, such as Oracle Exadata, require the cluster to communicate with multiple nodes. Sqoop fails on the cluster if it cannot find those nodes, so the Hadoop virtual machine must be configured to resolve all of them. This advanced configuration is not documented here. Consult the Hortonworks documentation for additional information.
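Resolving the additional database nodes from the Hadoop virtual machine is typically handled with /etc/hosts entries. The addresses and host names below are purely illustrative assumptions; consult the Hortonworks documentation for the supported procedure:

```
# Hypothetical /etc/hosts entries on the Hadoop VM; replace with the
# actual addresses and names of your database nodes.
10.0.0.21   exadata-node1.example.com   exadata-node1
10.0.0.22   exadata-node2.example.com   exadata-node2
```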

1.     Copy the JDBC drivers into your Hadoop cluster.

a.     On an Apple Mac:

                                          i.    At the Launchpad, in the Search entry window, enter terminal and press the Enter key.

                                         ii.    If you are using Cloudera, enter the following text into the terminal window:

scp /full-path-to-your-JDBC-jar-files/*.jar cloudera@your-cluster-ip-address:/tmp

Enter this password: cloudera

full-path-to-your-JDBC-jar-files is the full path to your driver files on your host system.

your-cluster-ip-address  is the numeric IP address of your Hadoop cluster.

                                        iii.    If you are using Hortonworks, enter the following text into the terminal window:

scp /full-path-to-your-JDBC-jar-files/*.jar root@your-cluster-ip-address:/tmp

Enter this password: hadoop

b.    On a Windows machine:

                                          i.    Launch a copy facility such as WinSCP (see http://winscp.net/).

                                         ii.    For Cloudera, enter the userid cloudera and the password cloudera.

                                        iii.    For Hortonworks, enter the userid root and the password hadoop.

                                        iv.    Locate the JDBC drivers on your host system and copy them to the /tmp directory on the cluster system.
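The copy step above amounts to a single transfer of the staged jar files. Here is a minimal POSIX sh sketch; the function name, paths, and IP address are illustrative assumptions, not part of the product:

```shell
# Copy all staged JDBC driver jars to the cluster's /tmp directory.
# jar_dir: local directory holding the downloaded drivers
# dest:    scp target, e.g. cloudera@192.168.57.130:/tmp (hypothetical IP)
stage_jdbc_jars() {
  jar_dir="$1"
  dest="$2"
  # Refuse to copy if no .jar files are present in the source directory.
  ls "$jar_dir"/*.jar >/dev/null 2>&1 || { echo "no JDBC jars in $jar_dir"; return 1; }
  scp "$jar_dir"/*.jar "$dest"
}

# Example (password is cloudera on the Cloudera VM, hadoop on Hortonworks):
# stage_jdbc_jars ~/Downloads/jdbc cloudera@192.168.57.130:/tmp
```

The guard against an empty source directory avoids a confusing scp error when the download step was skipped.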

2.     Log in to your cluster and identify the JDBC drivers to the Hadoop file system (HDFS).

a.     For the Cloudera QuickStart VM, without Cloudera Manager running:

                                          i.    Go to the QuickStart VM and select Applications -> System Tools -> Terminal to open a terminal window. Or you can use the terminal window that you opened earlier when you copied in the JDBC files.

                                         ii.    Click Open to open a terminal window.

                                        iii.    Enter the userid cloudera and the password cloudera.

                                         iv.    In the cluster terminal window, enter: cd /tmp

                                         v.    Now enter: ls to display a list of the JDBC driver files.

                                        vi.    To enable others to access the JDBC drivers, enter: chmod 777 *.jar

                                       vii.    Enter: export OOZIE_URL=http://localhost:11000/oozie

                                       viii.    Enter: oozie admin -shareliblist sqoop

                                        ix.    Make note of the path that is displayed in the console, as highlighted in this example:

hdfs://…/user/oozie/share/lib/lib_20141216191206/sqoop/xz.jar

                                         x.    Enter the following command:

hadoop fs -put /tmp/*.jar oozie-path-from-above

For example:

hadoop fs -put /tmp/*.jar /user/oozie/share/lib/lib_20141216191206/sqoop

                                         xi.    Enter: oozie admin -sharelibupdate

                                       xii.    The update is successful when the console returns the value null.

b.    For Cloudera with Cloudera Manager:

                                          i.    Go to the QuickStart virtual machine and select Applications -> System Tools -> Terminal to open a terminal window. Or you can use the terminal window that you opened earlier when you copied in the JDBC drivers.

                                         ii.    Click Open to open a terminal window.

                                        iii.    Enter the userid root and the password cloudera.

                                         iv.    If the user that displays in the terminal window is not root@quickstart, enter: su - root to switch to the root user.

                                          v.    In the cluster terminal window, enter: cd /tmp

                                        vi.    Enter: ls to display a list of the JDBC driver files.

                                       vii.    To make the JDBC drivers available to others, enter: chmod 777 *.jar

                                       viii.    Enter: su - hdfs

                                        ix.    Enter: hadoop fs -ls /user/oozie/share/lib

                                         x.    In the console, make note of the following highlighted path:

Found 1 items
drwxr-xr-x   - oozie oozie          0 2014-10-12 10:00
/user/oozie/share/lib/lib_20141012100037

                                        xi.    Enter the following command:

hadoop fs -put /tmp/*.jar oozie-path-from-above

For example:

hadoop fs -put /tmp/*.jar /user/oozie/share/lib/lib_20141012100037

                                       xii.    To pick up the new JDBC drivers, restart the oozie service. Enter: exit

                                      xiii.    Enter: sudo service oozie start  

                                     xiv.    If a message indicates that the service is already started, enter:

sudo service oozie restart

c.     For the Hortonworks virtual machine:

                                          i.    On an Apple Mac:

1.     In the terminal window that you opened earlier, log in to the cluster. Enter:

ssh root@your_hw_ip_address

your_hw_ip_address is the numeric IP address that was displayed when you launched the Hortonworks virtual machine.

2.     Enter the password: hadoop

                                         ii.    On a Windows machine:

1.     In a terminal application, in the Host Name or IP Address field, enter the IP address of your Hortonworks virtual machine.

2.     Click Open.

3.     Enter the userid root and password hadoop to log in to the Hortonworks cluster.

                                        iii.    In the cluster terminal window, enter: cd /tmp

                                        iv.    Now enter: ls to list the JDBC driver files.

                                         v.    To make the JDBC drivers available to others, enter: chmod 777 *.jar

                                         vi.    Enter: su - oozie

                                       vii.    Enter: export OOZIE_URL=http://localhost:11000/oozie

                                       viii.    Enter: oozie admin -shareliblist sqoop

                                        ix.    Make note of the path that appears on screen, as highlighted in this example:

hdfs://…/user/oozie/share/lib/lib_20141216191206/sqoop/xz.jar

                                         x.    Enter:

hadoop fs -put /tmp/*.jar oozie-path-from-above

For example:

hadoop fs -put /tmp/*.jar /user/oozie/share/lib/lib_20141216191206/sqoop/

                                         xi.    Enter: oozie admin -sharelibupdate

                                       xii.    When the console window returns null, the update is successfully completed.
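In each of the Oozie variants above, the target directory for hadoop fs -put is derived from the jar path that oozie admin -shareliblist sqoop prints. A small helper can do that derivation; this sketch assumes the output lines have the hdfs://host/path/to/jar form shown above:

```shell
# Given one line like hdfs://nn:8020/user/oozie/share/lib/lib_.../sqoop/xz.jar,
# return the HDFS directory to pass to `hadoop fs -put`.
sharelib_dir() {
  path="${1#hdfs://}"   # drop the hdfs:// scheme
  path="/${path#*/}"    # drop the namenode host:port, keep the absolute path
  dirname "$path"       # drop the jar file name
}

# Typical use on the cluster (commands as in the steps above; the grep/head
# filter for the first hdfs:// line is an assumption about the output):
#   export OOZIE_URL=http://localhost:11000/oozie
#   dir=$(sharelib_dir "$(oozie admin -shareliblist sqoop | grep '^hdfs://' | head -1)")
#   hadoop fs -put /tmp/*.jar "$dir"
#   oozie admin -sharelibupdate
```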

3.     Install JDBC drivers on the SAS Data Loader vApp:

a.     Locate the JDBCDrivers directory of the vApp. The path format is as follows:

sas-data-loader-home\shared-folder\JDBCDrivers

Example:

C:\Program Files\SAS Data Loader\version\SASWorkspace\JDBCDrivers

b.    Copy all of the JDBC drivers from your Hadoop cluster to the JDBCDrivers folder.

c.     Restart the vApp to pick up the JDBC drivers.

d.    In a browser, open SAS Data Loader using the URL that is displayed in the vApp:

http://ip-address-from-vapp/SASDataLoader

e.     Click the configuration icon to open the Configuration window.

OozieSite01ConfigIcon.png


f.     In the Configuration window, click Databases on the left.

g.    Click the plus icon to add a new database connection.

 

OozieSite02AddIcon.png

 

h.     In the Database Configuration window, add values to the fields. The following example applies to one type of database. For assistance, contact your database administrator.


OozieSite03DBConns.png

 

Note: In the Host field, enter the IP address of the database server.

i.      Click Test Connection to verify your entries.

j.      Click OK to save your entries and return to the Configuration window.

k.     Add more database connections as needed.
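The Host, Port, and Database fields in the configuration window correspond to the pieces of a JDBC URL. The values below are made up for illustration, and the MySQL URL form is just one example; your driver's URL format may differ, so check the vendor documentation:

```shell
# Hypothetical connection values; substitute your own server details.
DB_HOST=10.0.0.15   # IP address of the database server (Host field)
DB_PORT=3306        # MySQL default port; Oracle typically uses 1521
DB_NAME=sales       # database/schema name (Database field)
# The JDBC URL a MySQL driver would expect for these values:
echo "jdbc:mysql://$DB_HOST:$DB_PORT/$DB_NAME"
# prints jdbc:mysql://10.0.0.15:3306/sales
```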

 
