Write and run SAS programs in your web browser

Error accessing 'big' tables using SAS Studio

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Error accessing 'big' tables using SAS Studio

Hi,

 

I'm trying to use SAS Studio for data preparation within hadoop. I'm using the libname statement or pass-through querries as 'proc sql'. From my understanding SAS generates hiveql code that does the work within hadoop and simply delivers the answer back to SAS Studio.

 

For 'smaller' tables it works just fine. For 'bigger' datasets I need to do the job directly in HIVE or use a work around. e.g.

 

libname myhive hadoop subprotocol=hive2 port=10000 host="myhost" schema=default user=&u_name. pw=&u_pass.;

data test;

    set myhive.bigtable;  *the table has around 4m rows and 40 columns, which isn't that big really;

run;

 

 

For this statement I get the following error:

ERROR: Prepare error: Error while processing statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
SQL statement: SELECT * FROM `bigtable`
 
I guess its something to do with the size of the table. If I use an obs=1000000 statement and read-in the table in 4 datasets á 1m each it works.
 
I get the same error if I use any proc-step or proc-sql step on that data table. If I e.g. want to get some simple proc freq counts I get the same error.
 
Any help, ideas, work arounds would be highly appreciated!
 

 

 

 

 


Accepted Solutions
Solution
‎06-06-2016 03:09 AM
Occasional Contributor
Posts: 9

Re: Error accessing 'big' tables using SAS Studio

[ Edited ]
Posted in reply to JanVanHeyden

Hi,

 

we found the solution.

 

I got 2 slightly different Usernames for HUE and SasStudio. In the libname statement to connect to "myhive" I was using the SAS Studio User and PW. This is of course wrong. You need to set it to the User and PW for HUE. Strangely, however, the connection to hive still works if I used a "wrong" User and PW combination for smaller datasets.

 

View solution in original post


All Replies
Super User
Posts: 5,437

Re: Error accessing 'big' tables using SAS Studio

Posted in reply to JanVanHeyden

Sounds like from the eror message that you have a problem on the Hive side.

Use

options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

to fins out more what SAS/ACCESS transforms to Hive, and how Hive reacts (try first with a small query so you get the error free version).

I take it that you use a commercial SAS license with no data volume restrictions?

Data never sleeps
Occasional Contributor
Posts: 9

Re: Error accessing 'big' tables using SAS Studio

[ Edited ]

Thanks for the quick answer.

 

I added your options statement and let this statement run again:

 

libname myhive hadoop subprotocol=hive2 port=10000 host="myhost" schema=myhive user=&u_name. pw=&u_pass.;
options msglevel=i sastrace=',,,d' sastraceloc=saslog nostsuffix;

data test; set myhive.bigtable; run;

 

Now I get the following messages:

 

HADOOP_1: Executed: on connection 3
USE `myhive`
 
HADOOP_2: Prepared: on connection 3
SHOW TABLES 'BIGTABLE'
 
HADOOP_3: Prepared: on connection 3
DESCRIBE FORMATTED BIGTABLE
 
HADOOP_4: Prepared: on connection 3
SELECT * FROM `BIGTABLE`
 
ERROR: Prepare error: Error while processing statement: FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
SQL statement: SELECT * FROM `BIGTABLE'
Super User
Posts: 5,437

Re: Error accessing 'big' tables using SAS Studio

Posted in reply to JanVanHeyden

What I was trying to say was have a successful execution first (obs=1000000), and see if you can identify any HiveQL issues.

Otherwise, I would try open a track to SAS tech support.

Data never sleeps
Occasional Contributor
Posts: 9

Re: Error accessing 'big' tables using SAS Studio

Looks like no HiveQL issues to me.

 

I guess I will open a track then.

 

Thanks for your help!

Solution
‎06-06-2016 03:09 AM
Occasional Contributor
Posts: 9

Re: Error accessing 'big' tables using SAS Studio

[ Edited ]
Posted in reply to JanVanHeyden

Hi,

 

we found the solution.

 

I got 2 slightly different Usernames for HUE and SasStudio. In the libname statement to connect to "myhive" I was using the SAS Studio User and PW. This is of course wrong. You need to set it to the User and PW for HUE. Strangely, however, the connection to hive still works if I used a "wrong" User and PW combination for smaller datasets.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 574 views
  • 1 like
  • 2 in conversation