Hi,
I'm trying to use SAS Studio for data preparation within hadoop. I'm using the libname statement or pass-through querries as 'proc sql'. From my understanding SAS generates hiveql code that does the work within hadoop and simply delivers the answer back to SAS Studio.
For 'smaller' tables it works just fine. For 'bigger' datasets I need to do the job directly in HIVE or use a work around. e.g.
libname myhive hadoop subprotocol=hive2 port=10000 host="myhost" schema=default user=&u_name. pw=&u_pass.;
data test;
set myhive.bigtable; *the table has around 4m rows and 40 columns, which isn't that big really;
run;
For this statement I get the following error:
Hi,
we found the solution.
I got 2 slightly different Usernames for HUE and SasStudio. In the libname statement to connect to "myhive" I was using the SAS Studio User and PW. This is of course wrong. You need to set it to the User and PW for HUE. Strangely, however, the connection to hive still works if I used a "wrong" User and PW combination for smaller datasets.
Sounds like from the eror message that you have a problem on the Hive side.
Use
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
to fins out more what SAS/ACCESS transforms to Hive, and how Hive reacts (try first with a small query so you get the error free version).
I take it that you use a commercial SAS license with no data volume restrictions?
Thanks for the quick answer.
I added your options statement and let this statement run again:
libname myhive hadoop subprotocol=hive2 port=10000 host="myhost" schema=myhive user=&u_name. pw=&u_pass.;
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;
data test; set myhive.bigtable; run;
Now I get the following messages:
HADOOP_1: Executed: on connection 3
USE `myhive`
HADOOP_2: Prepared: on connection 3
SHOW TABLES 'BIGTABLE'
HADOOP_3: Prepared: on connection 3
DESCRIBE FORMATTED BIGTABLE
HADOOP_4: Prepared: on connection 3
SELECT * FROM `BIGTABLE`
ERROR: Prepare error: Error while processing statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
SQL statement: SELECT * FROM `BIGTABLE'
What I was trying to say was have a successful execution first (obs=1000000), and see if you can identify any HiveQL issues.
Otherwise, I would try open a track to SAS tech support.
Looks like no HiveQL issues to me.
I guess I will open a track then.
Thanks for your help!
Hi,
we found the solution.
I got 2 slightly different Usernames for HUE and SasStudio. In the libname statement to connect to "myhive" I was using the SAS Studio User and PW. This is of course wrong. You need to set it to the User and PW for HUE. Strangely, however, the connection to hive still works if I used a "wrong" User and PW combination for smaller datasets.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.