BookmarkSubscribeRSS Feed

Part 2 of 4: Configuring Copy to Hadoop and Copy from Hadoop directives to update the file hive-site.xml

Started ‎02-15-2015 by
Modified ‎10-05-2015 by
Views 2,024

Advanced topic:  Use this article to configure SAS Data Loader for Hadoop (Trial Edition) to use the directives Copy Data to Hadoop and Copy Data from Hadoop. You need this information only if you want to access relational DBMS data; the configuration is not required when accessing SAS data. To complete the configuration process, see also these articles:

 

Before you start, make sure that SAS Data Loader (Trial Edition) and your Hadoop cluster are installed, configured, and functional.

Beginning with Hive 13, when querying column names, the result set is prepended with the table name by default (table-name.column-name). This setting causes the Copy Data from Hadoop directive to fail as most DBMS systems do not accept dot characters ( . ) in column names. Follow these steps to address this issue:

1.     For Hortonworks:

a.     Use Ambari to update this setting. If you have not already enabled Ambarion your cluster, then follow these steps:

                                          i.    Locate the HW VM IP address. The address is displayed in the VM when you start the VM. 

                                         ii.    Go to a browser window and type in:

http://your-hw-ip-address:8000

Replace your_hw_ip_address with the IP address of your Hortonworks virtual machine.

                                        iii.    Click Enable next to the Ambari entry in the Hortwonworks Sandbox page.   

                                        iv.    Wait a few minutes for Ambari to start.

b.    Go to a browser window and type in:

http://your-hw-ip-address:8080

Replace  your_hw_ip_address with the IP address of your Hortonworks virtual machine.

c.     Login to Ambari with the following credentials: user: admin, password: admin

d.    Select the service Hive and click the Configs tab.

e.     Expand the section Custom hive-site.

f.     Click Add Property.

g.    Fill in the property window with this information:

Key:     hive.resultset.use.unique.column.names

Value:   false

h.     Click Add to display content that is similar to this example:

             HiveSite01AmbariWindow.png

 

i.      Click Save to save your changes and proceed through the various dialogs to save the change.

j.      If the Restart button appears, click it. Otherwise, click Service Actions at the top of the Ambari window and select Restart All to pick up the change. 

k.     Click through the various save and restart windows to complete the action. 

l.      Wait until all background operations have completed and Hive has restarted. 

2.         2.  For Cloudera when using Cloudera Manager:

a.     Start Cloudera Manager and login with the following credentials: user: cloudera, password: cloudera

HiveSite02ClouderaWindow.png

 

b.    Click the hive link on the left side of the screen to select the hive service.

c.     Click Configuration at the top of the screen.
HiveSite03SmallClouderaWindow.png


d.    Select Actions -> Switch to the new layout.
     HiveSite04SwithLayout.png

e.     Scroll down the Category list. Under GROUP, select Advanced.

HiveSite05Advanced.png

f.     In the text field Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml, type the following:

<property>

      <name>hive.resultset.use.unique.column.names</name>

      <value>false</value>

</property>

Tip: Be sure to copy the text with exact spelling and punctuation.

g.    Click Save Changes.

           HiveSite06SaveChanges.png    

h.     Click Status.

i.      Select Actions -> Restart Service.

           HiveSite08Restart.png

j.      In the Restart window, click the Restart button. 

k.     After the completion of processing, close the Command Details windows.

3.          3,  For Cloudera without using ClouderaManager:

                    a.     In the Cloudera VM window click Applications -> System Tools -> Terminal to open a terminal window or use the one you have already opened.

b.    In the terminal window, type:

cd /etc/hive/conf 

c.     To edit hive-site.xml, type:

sudo vi hive-site.xml

d.    Use the arrow keys to locate this line:

<configuration>  

e.     Click at the end of the line to position the cursor, as shown.

<configuration>Cursor-Here      

f.     Type i to go into insert mode. 

g.    Type Return or Enter twice to add two new lines to the file.

h.     Type the following lines. Click Return or Enter at the end of each line:

<property>

      <name>hive.resultset.use.unique.column.names</name>

      <value>false</value>

</property>

i.      Type Return or Enter one more time.

j.      To leave insert mode, type the Escape key Esc.

k.     To restart the hive service to pick up your changes, enter: sudo service hive-server2 restart

 

                           Wait for the service to restart.

l.      To ensure that the hive service restarted successfully, type: sudo service hive-server2 status

The response should be running. If you receive a different response, check your previous text entries, restart the hive service, and check the status of the server again.

 

4.        4.  Validate the configuration of the cluster:

To validate that this change was applied correctly, follow these steps to view a table in Hadoop using SAS Data Loader for Hadoop:

a.     Open SAS Data Loader in a browser.

b.    Click Copy Data to Hadoop.

c.     Click SAS Server.

d.    Click Sample Data.

e.     Click the CARS table.

f.     Click the View Table icon to open the table in the SAS Table Viewer.

     HiveSite09ViewTable.png

     HiveSite10TableViewer.png

g.    If you see column names such as CARS.MAKE in the table viewer, then the setting is not configured correctly. Follow the instructions above to check and restart the hive service to ensure that this setting is configured correctly.

 

 

 

 

Version history
Last update:
‎10-05-2015 03:52 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels