Advanced topic: Use this article to configure SAS Data Loader for Hadoop (Trial Edition) to use the directives Copy Data to Hadoop and Copy Data from Hadoop. You need this information only if you want to access relational DBMS data; the configuration is not required when accessing SAS data. To complete the configuration process, see also these articles:
Before you start, make sure that SAS Data Loader (Trial Edition) and your Hadoop cluster are installed, configured, and functional.
The Hadoop cluster uses JDBC drivers to connect to databases. You need to install one driver for each database on the Hadoop cluster and on the vApp. Most DBMS vendors supply JDBC drivers for their databases that can be downloaded from the vendor’s website. Once you have downloaded the JDBC drivers for the DBMS systems that you want to use, make sure they are accessible from your host system. Then follow these steps to install the JDBC drivers on the Hadoop cluster and on the vApp.
Tip: You can install one or more JDBC driver using these steps. Simply place all of the JDBC drivers that you want to install in the same directory when you follow the steps. In the examples below, we copied the JDBC drivers into the /tmp directory on the cluster.
Special Note: Multimode databases, such as Oracle Exadata, require Hortonworksto communicate with multiple nodes. SQOOP fails on the cluster because it cannot find those nodes. The Hadoop virtual machine must be configured to resolve all nodes. This advanced configuration is not documented here. Consult the Hortonworks documentation for additional information.
1. Copy the JDBC drivers into your Hadoop cluster.
a. On an Apple MAC:
i. At the Launchpad, in the Search entry window, enter terminaland press the Enter key.
ii. If you are using Cloudera, enter the following text into the terminal window:
scp /full-path-to-your-JDBC-jar-files/*.jar
cloudera@your-cluster-ip-address😕tmp
Enter this password: cloudera
full-path-to-your-JDBC-jar-files is the full path to your driver files on your host system.
your-cluster-ip-address is the numeric IP address of your Hadoop cluster.
iii. If you are using Hortonworks, enter the following text into the terminal window:
scp /full-path-to-your-JDBC-jar-files/*.jar
root@your-cluster-ip-address😕tmp
Enter this password: hadoop
b. On a Windows machine:
i. Launch a copy facility such as WinSCP (see http://winscp.net/ ).
ii. For Cloudera, enter the userid cloudera and the password cloudera,
iii. For Hortonworks, enter the userid root and the password hadoop,
iv. Locate the JDBC drivers on your host system and copy them to the /tmp directory on the cluster system.
2. Login to your cluster and identify the JDBC drivers to the Hadoop file system (HDFS).
a. For the Cloudera QuickStart VM, without Cloudera Manager running:
i. Go the QuickStart VM and select Applications -> System Tools -> Terminal to open a terminal window. Or you can use the terminal window that you opened earlier when you copied in the JDBC files.
ii. Click Open to open a terminal window.
iii. Enter the userid cloudera and the password cloudera,
iv. In the cluster terminal windows enter: cd /tmp
v. Now enter: ls to display a list of the JDBC driver files.
vi. To enable others to access the JDBC drivers, enter: chmod 777 *.jar
vii. Enter: export OOZIE_URL=http://localhost:11000/oozie
viii. Enter: oozie admin -shareliblistsqoop
ix. Make note of the path that is displayed in the console, as highlighted in this example:
hdfs://…/user/oozie/share/lib/lib_20141216191206/sqoop/xz.jar
x. Enter the following command:
hadoop fs -put /tmp/*.jar oozie-path-from-above
For example:
hadoop fs -put /tmp/*.jar
/user/oozie/share/lib/lib_20141216191206/sqoop
xi. Enter: oozie admin –sharelibupdate
xii. The update is successful when the console returns the value null.
b. For Cloudera with Cloudera Manager:
i. Go the QuickStart virtual machine and select Applications -> System Tools -> Terminal to open a terminal window. Or you can use the terminal window that you opened earlier when you copied in the JDBC drivers.
ii. Click Open to open a terminal window.
iii. Enter the userid root and the password cloudera,
iv. If the If the user that displays in the terminal window is not root@quickstart, enter: su – root to get to root user.
v. In the cluster terminal windows enter: cd /tmp
vi. Enter: ls to display a list of the JDBC driver files.
vii. To make the JDBC drivers available to others, enter: chmod 777 *.jar
viii. Enter: su – hdfs
ix. Enter: hadoop fs -ls /user/oozie/share/lib
x. In the console, make note of the following highlighted path:
Found 1 items
drwxr-xr-x - oozie oozie 0 2014-10-12 10:00
/user/oozie/share/lib/lib_20141012100037
xi. Enter the following command:
hadoop fs -put /tmp/*.jar oozie-path-from-above
For example:
hadoop fs -put /tmp/*.jar
/user/oozie/share/lib/lib_20141012100037
xii. To pick up the new JDBC drivers, restart the oozie service. Enter: exit
xiii. Enter: sudo service oozie start
xiv. If a message indicates that the service is already started, enter:
sudo service oozierestart
c. For the Hortonworks virtual machine:
i. On an Apple Mac:
1. In the terminal window that you opened earlier, login to the cluster. Enter:
ssh root@your_hw_ip_address
your_hw_ip_address is the numeric IP address that was displayed when you launched the Hortonworksvirtual machine.
2. Enter the password: hadoop
ii. On a Windows machine:
1. In a terminal application, in the Host Name or IP Address field, enter the IP address of your Hortonworks virtual machine.
2. Click Open.
3. Enter the userid root and password hadoop to login to the Hortonworkscluster.
iii. In the cluster terminal window, enter: cd /tmp
iv. Now enter: ls to list the JDBC driver files.
v. To make the JDBC drivers available to others, enter: chmod 777 *.jar
vi. Enter: su – oozie
vii. Enter: export OOZIE_URL=http://localhost:11000/oozie
viii. Enter: oozie admin -shareliblistsqoop
ix. Make note of the path that appears on screen, as highlighted in this example:
hdfs://…/user/oozie/share/lib/lib_20141216191206/sqoop/xz.jar
x. Enter:
hadoop fs -put /tmp/*.jar /user/oozie/oozie-path-from-above
For example:
hadoop fs -put /tmp/*.jar
/user/oozie/share/lib/ lib_20141216191206/sqoop/
xi. Enter: oozie admin –sharelibupdate
xii. When the console window returns null, the update is successfully completed.
3. 3. Install JDBC drivers on the SAS Data Loader vApp:
a. Locate the JDBCDrivers directory of the vApp. The path format is as follows:
sas-data-loader-home\shared-folder\JDBCDrivers
Example:
C:\Program Files\SAS Data Loader\version\SASWorkspace\JDBCDrivers
b. Copy all of the JDBC drivers from your Hadoop cluster to the JDBCDriversfolder.
c. Restart the vApp to pick up the JDBC drivers.
d. In a browser, open SAS Data Loader using the URL that is displayed in the vApp:
http://ip-address-from-vapp/SASDataLoader
e. Click the configuration icon to open the Configuration window.
f. In the Configuration window, click Databases on the left.
g. Click the plus icon to add a new database connection.
h. In the Database Configuration window, add values to the field. The following example applies to one type of database. For assistance, contact your database administrator.
Note: in the Host field, enter the IP address of the database server.
i. Click Test Connection to verify your entries.
j. Click OK to save your entries and return to the Configuration window.
k. Add more database connections as needed.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.