Part 2 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to update the file hive-site.xml

Advanced topic: Use this article to configure SAS Data Loader for Hadoop (Trial Edition) to use the directives Copy Data to Hadoop and Copy Data from Hadoop. You need this information only if you want to access relational DBMS data; the configuration is not required when accessing SAS data. To complete the configuration process, see also these articles:

Part 1 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to update the file oozie-sit...
Part 3 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to add JDBC access to RDBMS ...
Part 4 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to access data when using Cl...

Before you start, make sure that SAS Data Loader (Trial Edition) and your Hadoop cluster are installed, configured, and functional.

Beginning with Hive 13, when querying column names, the result set is prepended with the table name by default (table-name.column-name). This setting causes the Copy Data from Hadoop directive to fail as most DBMS systems do not accept dot characters ( . ) in column names. Follow these steps to address this issue:

1. For Hortonworks:

a. Use Ambari to update this setting. If you have not already enabled Ambarion your cluster, then follow these steps:

i. Locate the HW VM IP address. The address is displayed in the VM when you start the VM.

ii. Go to a browser window and type in:

http://your-hw-ip-address:8000

Replace your_hw_ip_address with the IP address of your Hortonworks virtual machine.

iii. Click Enable next to the Ambari entry in the Hortwonworks Sandbox page.

iv. Wait a few minutes for Ambari to start.

b. Go to a browser window and type in:

http://your-hw-ip-address:8080

Replace your_hw_ip_address with the IP address of your Hortonworks virtual machine.

c. Login to Ambari with the following credentials: user: admin, password: admin

d. Select the service Hive and click the Configs tab.

e. Expand the section Custom hive-site.

f. Click Add Property.

g. Fill in the property window with this information:

Key: hive.resultset.use.unique.column.names

Value: false

h. Click Add to display content that is similar to this example:

i. Click Save to save your changes and proceed through the various dialogs to save the change.

j. If the Restart button appears, click it. Otherwise, click Service Actions at the top of the Ambari window and select Restart All to pick up the change.

k. Click through the various save and restart windows to complete the action.

l. Wait until all background operations have completed and Hive has restarted.

2. 2. For Cloudera when using Cloudera Manager:

a. Start Cloudera Manager and login with the following credentials: user: cloudera, password: cloudera

b. Click the hive link on the left side of the screen to select the hive service.

c.     Click Configuration at the top of the screen.

d.    Select Actions -> Switch to the new layout.

e. Scroll down the Category list. Under GROUP, select Advanced.

f. In the text field Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, type the following:

<name>hive.resultset.use.unique.column.names</name>

<value>false</value>

</property>

Tip: Be sure to copy the text with exact spelling and punctuation.

g. Click Save Changes.

h. Click Status.

i. Select Actions -> Restart Service.

j. In the Restart window, click the Restart button.

k. After the completion of processing, close the Command Details windows.

3. 3, For Cloudera without using ClouderaManager:

a. In the Cloudera VM window click Applications -> System Tools -> Terminal to open a terminal window or use the one you have already opened.

b. In the terminal window, type:

cd /etc/hive/conf

c. To edit hive-site.xml, type:

sudo vi hive-site.xml

d. Use the arrow keys to locate this line:

e. Click at the end of the line to position the cursor, as shown.

<configuration>Cursor-Here

f. Type i to go into insert mode.

g. Type Return or Enter twice to add two new lines to the file.

h. Type the following lines. Click Return or Enter at the end of each line:

<name>hive.resultset.use.unique.column.names</name>

<value>false</value>

</property>

i. Type Return or Enter one more time.

j. To leave insert mode, type the Escape key Esc.

k. To restart the hive service to pick up your changes, enter: sudo service hive-server2 restart

Wait for the service to restart.

l. To ensure that the hive service restarted successfully, type: sudo service hive-server2 status

The response should be running. If you receive a different response, check your previous text entries, restart the hive service, and check the status of the server again.

4. 4. Validate the configuration of the cluster:

To validate that this change was applied correctly, follow these steps to view a table in Hadoop using SAS Data Loader for Hadoop:

a. Open SAS Data Loader in a browser.

b. Click Copy Data to Hadoop.

c. Click SAS Server.

d. Click Sample Data.

e. Click the CARS table.

f. Click the View Table icon to open the table in the SAS Table Viewer.

g. If you see column names such as CARS.MAKE in the table viewer, then the setting is not configured correctly. Follow the instructions above to check and restart the hive service to ensure that this setting is configured correctly.

Part 2 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to update the file hive-site.xml

Free course: Data Literacy Essentials

Get Started