I'm running SAS 9.4 M6 on a 64 bit Windows Server 2016 machine. I have both Enterprise Guide (8.1) and Display Manager available to me, but I typically use EG.
I'm connecting to a Hadoop database. I can create a Libname, or I can access the database via PROC SQL and explicit pass through. Both methods are using ODBC.
ISSUE: I can't get a list of Hadoop tables into my SAS session.
If I know the table name in advance, I can successfully run a query, but it's darned inconvenient not to be able to get a list of tables.
In past shops, I've been able to see a list of the Hadoop tables after assigning a Libname in the SAS EG "Servers" panel. In this case, I can see the Libname, but when I click on the drop down, there are no tables. Display Manager behaves similarly. See SAS code for Libname below.
I tried running a Proc SQL against DICTIONARY.TABLES, but I got no rows in return.
I also tried using PROC DATASETS. In my log, I get:
29 PROC DATASETS 30 LIBRARY=HIVEDB 31 ; ODBC: AUTOCOMMIT is YES for connection 1 ODBC: Called SQLTables with schema of NULL WARNING: No matching members in directory. 32 RUN;
I can do a SHOW TABLES via an explicit pass through, but nothing gets returned to the SAS log.
Question 1: I notice in my log from my Proc Datasets the message, "Called SQLTables with schema of NULL". Is there a way to tell Proc Datasets that I want a particular schema? I did specify a schema on my Libname, but apparently it's not coded correctly or isn't getting passed to Proc Datasets. Here's my Libname:
libname hiveDB odbc noprompt="uid=jbarbour; pwd=XXXXXXXXX; dsn=OPSI_HIVE_STG1;
host=dbms0502; port=10000;schema=edps; authmech=3";
Question 2: Is there a way I can redirect the output of SHOW TABLES to the SAS log? I have the following options in effect:
options SASTRACE=",,,ds" sastraceloc=saslog nostsuffix;
options source source2 mprint fullstimer notes fmterr;
options msglevel=i;
Thank you in advance,
Jim
Hello @jimbarbour
If your site has licence for SAS Access to Hadoop, try this instead of ODBC Connection :
proc sql; connect to hadoop (READ_METHOD=HDFS /* this attemps HDFS read which is faster than JDBC */ server="<Serve-Name>" /* the hiveserver we talk to */ LOGIN_TIMEOUT=300 schema=&hdp_schema. /* this is the schema where you want to read or write data to */ uri="<Server_uri>");
create table schema_hdp as select * from connection to hadoop (show tables); disconnect from hadoop; quit;
@Reeza ,
My query looks like:
PROC SQL;
CONNECT TO ODBC AS Hive_DB
(NOPROMPT = "uid=jbarbour; pwd=XXXXXXXX; dsn=OPSI_HIVE_STG1;
host=dbms0502; port=10000;schema=edps; authmech=3")
;
EXECUTE(Show Tables) BY Hive_DB;
DISCONNECT FROM Hive_DB;
QUIT;
My log looks like:
2 The SAS System 15:44 Thursday, March 5, 2020 29 PROC SQL; 30 CONNECT TO ODBC AS Hive_DB 31 (NOPROMPT = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 32 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX) 33 ; ODBC: AUTOCOMMIT is YES for connection 2 34 EXECUTE(Show Tables) BY Hive_DB; ODBC_1: Executed: on connection 2 Show Tables Summary Statistics for ODBC are: Total SQL execution seconds were: 0.158019 Total seconds used by the ODBC ACCESS engine were 0.159900 35 DISCONNECT FROM Hive_DB; 36 QUIT; NOTE: PROCEDURE SQL used (Total process time): real time 0.61 seconds user cpu time 0.01 seconds system cpu time 0.03 seconds memory 365.28k OS Memory 20828.00k Timestamp 03/05/2020 03:44:38 PM Step Count 35 Switch Count 0
Jim
@Reeza ,
Could "metastore" be "metastor" or something? I'm getting an 8 character message.
29 data test2; 30 set METASTORE.TBLS; ERROR: Libref 'METASTORE' exceeds 8 characters. ERROR: File WORK.TBLS.DATA does not exist. 31 RUN;
Jim
Hello @jimbarbour
If your site has licence for SAS Access to Hadoop, try this instead of ODBC Connection :
proc sql; connect to hadoop (READ_METHOD=HDFS /* this attemps HDFS read which is faster than JDBC */ server="<Serve-Name>" /* the hiveserver we talk to */ LOGIN_TIMEOUT=300 schema=&hdp_schema. /* this is the schema where you want to read or write data to */ uri="<Server_uri>");
create table schema_hdp as select * from connection to hadoop (show tables); disconnect from hadoop; quit;
We do have SAS/ACCESS for ODBC. Let me ask and see if I can get a URI.
---Base SAS Software ---SAS/STAT ---SAS/CONNECT ---SAS/Secure 168-bit ---SAS/Secure Windows ---SAS/ACCESS Interface to PC Files ---SAS/ACCESS Interface to ODBC ---SAS Workspace Server for Local Access ---High Performance Suite
Jim
Schema is usually called metastore?
libname hiveDB odbc noprompt="uid=jbarbour; pwd=XXXXXXXXX; dsn=OPSI_HIVE_STG1;
host=dbms0502; port=10000;schema=metastore; authmech=3";
data want;
set hivedb.tbls;
run;
Or something like this, not sure if the schema may affect things here:
PROC SQL;
CONNECT TO ODBC AS Hive_DB
(NOPROMPT = "uid=jbarbour; pwd=XXXXXXXX; dsn=OPSI_HIVE_STG1;
host=dbms0502; port=10000;schema=edps; authmech=3")
create table demo as select * from connection to HIVE_DB (select * from metastore.tbls)
DISCONNECT FROM Hive_DB;
QUIT;
These are just guesses based on how I know Hadoop functions, can't test unfortunately. Good Luck!
@Reeza ,
Thank you for following up.
I couldn't get it to work. It errors out. I ran the same code as I had before except that I changed the schema to metastore.
The error messages isn't particularly instructive, but I assume that it just can't find a schema named metastore.
ERROR: CLI error trying to establish connection: Unable to retrieve error message.
Jim
@Reeza ,
Well, the explicit pass through did in fact work, just not the schema "metastore". I'm thinking that, while "metastore" may be common as a schema, it is unfortunately not universal.
Jim
OK! That did the trick! Thank you so much!!
The key was "SELECT * FROM CONNECTION TO Hive_DB(Show Tables)". I had coded EXECUTE(Show Tables) BY Hive_DB which ran fine -- but returned nothing to SAS.
Here's my working code:
PROC SQL;
CONNECT TO ODBC AS Hive_DB
(NOPROMPT = "uid=jbarbour; pwd=XXXXXXXX; dsn=OPSI_HIVE_STG1;
host=dbms0502; port=10000;schema=charttarget; authmech=3")
;
CREATE TABLE WORK.Hadoop_Table_List AS
SELECT * FROM CONNECTION TO Hive_DB(SHOW TABLES);
DISCONNECT FROM Hive_DB;
QUIT;
I thank you again,
Jim
Actually, there's an even better way to do this: List the schema twice on a Libname. As counter-intuitive and kludgey as that sounds, it works. When done in this fashion, Proc Datasets works, SQL against DICTIONARY tables works, and, more importantly, the tables are listed in Display Manager or Enterprise Guide just as though they were SAS tables.
For example:
LIBNAME HiveDB ODBC SCHEMA = ChartTarget
NOPROMPT = "UID=jbarbour; PWD=&lPwd; HOST=DBMS0502;
DSN=OPSI_HIVE_STG1; SCHEMA=ChartTarget;
PORT=10000; AUTHMECH=3"
;
Notice in the above example that I specify the schema twice, once outside the NOPROMPT and once inside. The schema inside the NOPROMPT appears to be used for the actual connection to the databse. The schema outside the NOPROMPT appears to be used by SAS to populate Proc Datasets, the SAS GUI's (DMS and EG), and the DICTIONARY tables.
The direct connection with a "Select * from connection to" syntax is still a very useful construct, so the original answer from @r_behata still stands as the solution.
Jim
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.