Hi friends
Can anyone please advise me on the steps to install and configure SAS to Hadoop connectivity using SAS/ACCESS Interface to ODBC? We have SAS 9.4 M6 on Linux.
We have Integrated Windows Authentication with Kerberos.
Need your help here.
Thank you all in advance.
Typically one would need to install the ODBC client for your version of Hadoop on the SAS application server and configure the ODBC driver as per the Hadoop documentation.
Thereafter it will be possible to connect like any other ODBC database connection.
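For example, once the driver and a DSN are in place, a LIBNAME statement along these lines should work (just a sketch; the libref hdp, the DSN name hive_dsn and the schema are placeholders for whatever you define in odbc.ini):
/* Sketch only: assumes a DSN named hive_dsn exists in odbc.ini and a valid  */
/* Kerberos ticket is already available to the SAS session user.             */
libname hdp odbc datasrc=hive_dsn schema=default;
/* Quick check that the Hive tables are visible through the new libref */
proc datasets library=hdp;
quit;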
Have a look at the following. Your Hadoop vendor may also be of help:
https://www.progress.com/odbc/apache-hadoop-hive
https://www.cloudera.com/downloads/connectors/hive/odbc/2-6-1.html
https://www.microsoft.com/en-us/download/details.aspx?id=40886
https://hortonworks.com/wp-content/uploads/2013/04/Hortonworks-Hive-ODBC-Driver-User-Guide.pdf
Thanks for the links
Since we don't have a SAS/ACCESS Interface to Hadoop license, we will have to try with SAS/ACCESS Interface to ODBC.
Which version of the Hadoop ODBC driver do I need to install from the links you shared?
So the steps below would be what I need to follow to make the connection from SAS to Hadoop and vice versa, right?
> Install the Hadoop ODBC driver on the Unix compute server
> Configure the environment variables. Are there any other files, similar to the tnsnames.ora and odbc.ini files for Oracle and SQL Server, that I need to configure for Hadoop?
> Once the above steps are done, test with a LIBNAME statement. Do I need to register any Hadoop servers in SAS Management Console for this testing?
I would appreciate it if you could point me to a link with the steps for this setup via the ODBC driver.
A LIBNAME statement does not require any SMC server setup; that's only required if you are defining data connections in SAS metadata. Test first with LIBNAME, then implement any required ODBC data connections in SMC (including the ODBC server definition) to be shared with other users.
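For example, a quick test along these lines should confirm data access before anything is registered in metadata (a sketch only; your_dsn and some_table are placeholders for your own DSN and a table you know exists in Hive):
/* Explicit SQL pass-through sketch: the query in the inner parentheses is */
/* sent to Hive as-is through the ODBC driver.                             */
proc sql;
   connect to odbc (datasrc=your_dsn);
   select * from connection to odbc
      (select * from default.some_table limit 5);
   disconnect from odbc;
quit;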
It would be a good idea to raise a Tech Support track regarding the ODBC configuration so they can walk you through the required steps.
Installation of Hadoop ODBC drivers
This is something any administrator (Hadoop / Linux and possibly SAS) should be able to do.
I have provided some links in the earlier post. Have a look at them.
Your Hadoop vendor should be able to provide the ODBC drivers along with installation and configuration documentation.
After performing the steps below, everything went well, but I got stuck at the last step when connecting to the database.
I encountered the error below when testing connectivity to verify whether I am able to list the databases and access table data.
Can you advise me where I am going wrong?
[S1000][unixODBC][Hortonworks][Hardy] (76) Error connecting to ZooKeeper. Please ensure the ZooKeeper host(s) and other connection information are configured correctly.
[ISQL]ERROR: Could not SQLConnect
>Installed and configured the Hive driver and unixODBC with no issues
./configure --prefix=/opt/odbc/unixodbc --exec-prefix=/opt/odbc/unixodbc/oth --bindir=/opt/odbc/unixodbc/bin --sbindir=/opt/odbc/unixodbc/sbin --libexecdir=/opt/odbc/unixodbc/libexec --sysconfdir=/opt/odbc/unixodbc/etc --libdir=/opt/odbc/unixodbc/lib --includedir=/lib --includedir=/lib64
make
make install
>Ensure that the libraries installed (if UnixODBC was installed from source files) are included in LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/lib:/lib64:/opt/odbc/unixodbc/lib
>Copied the files from hive installation directory into this location
# cp /usr/lib/hive/lib/native/hiveodbc/Setup/odbc.ini /opt/odbc/unixodbc/etc/odbc.ini
# cp /usr/lib/hive/lib/native/hiveodbc/Setup/odbcinst.ini /opt/odbc/unixodbc/etc/odbcinst.ini
>Configured the odbc.ini file
[hivedb]
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
HOST=node1.hortonworks.com
PORT=10000
HiveServerType=2
AuthMech=1
UID=hive
KrbHostFQDN=kerb.hortonworks.com
KrbServiceName=hive
KrbRealm=HWX.COM
>Exported the variables to ensure that only our configuration files are being used
export ODBCINI=/opt/odbc/unixodbc/odbc.ini
export ODBCSYSINI=/opt/odbc/unixodbc
export HORTONWORKSHIVEINI=/usr/lib/hive/lib/native/Linux-amd64-64/hortonworks.hiveodbc.ini
>Tested the connectivity
Got the Kerberos ticket for the Hive service principal using the keytab file.
# kinit -k -t /tmp/hive.service.keytab hive/node1.hortonworks.com@HWX.COM
# klist
Once the ticket was granted, I executed the following command to test connectivity:
# /opt/odbc/unixodbc/bin/isql hivedb -v
But I encountered the error below when testing connectivity to verify whether I am able to list the databases and access table data:
[S1000][unixODBC][Hortonworks][Hardy] (76) Error connecting to ZooKeeper. Please ensure the ZooKeeper host(s) and other connection information are configured correctly.
[ISQL]ERROR: Could not SQLConnect
I was able to connect to Hive with the ./isql command from the SAS server.
What would be the next step to accomplish SAS to Hadoop connectivity and pull data from Hadoop into SAS?
Can I run the LIBNAME statements below from SAS EG to accomplish this?
For the Hadoop servers we have multiple servers, with high availability for HiveServer2,
so how would the multiple servers fit into the LIBNAME statement below for server= ?
# /opt/odbc/unixodbc/bin/isql hivedb
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL> show tables;
+---------------+
| tab_name      |
+---------------+
| csvinternal   |
| cvsexternal   |
| sample_07
Sorry, I didn't get you.
No, my question is:
how would the multiple servers fit into the LIBNAME statement below for server= ?
We have high availability for HiveServer2:
HOST=<pcnkadpnt01.abc-abc.com:3020>,<pcnkadpnt02.abc-abc.com:3020>,<pcnkadpnt03.abc-abc.com:3020>
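Just a sketch of one possibility, not something I have verified against your setup: with SAS/ACCESS to ODBC the host list normally lives in the DSN definition in odbc.ini rather than in the LIBNAME statement. For HiveServer2 high availability the Hortonworks driver's own documentation describes pointing the DSN at the ZooKeeper quorum (its service discovery settings) instead of a single host, so the LIBNAME itself would just keep referencing the same data source name:
/* Sketch: the HA host list stays inside the DSN (odbc.ini); the SAS side does not change */
libname hdp odbc datasrc=hivedb schema=default;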