Re: error in fetching data from Hive table

sivaranjani · Posted 11-12-2020 11:09 AM

Hello everyone,

I am trying to read into SAS from a Hadoop Apache Hive table and facing the following error.

ERROR: Fetch error: java.lang.RuntimeException: Error retrieving next row: This may be due to very long strings in the actual data
and the use of DBSASTYPE to limit string length. Consider using "SAS_HADOOP_HIVE1815_WORKAROUND=YES" to force Hive fetchSize
to 1. Contact SAS Technical Support for details.

Has anyone faced a similar error? The script and table definition is given below

The script I am running is this:

proc sql;
connect to hadoop (
uri='url'
server='server_name'
schema=schema_name
DBMAX_TEXT= 256
properties='hive.vectorized.execution.enabled=false'
user=&username
pw=&password
);

create table lib_name.table_1 (compress=yes) as
SELECT * FROM CONNECTION TO hadoop (
select * from table_name);
disconnect from hadoop;
quit;

The datatype of all the columns in the hive table is this:

request_id	varchar(50)
header_id	varchar(50)
header_name	varchar(50)
time_stamp	varchar(50)
event_type_id	varchar(50)
event_type_name	varchar(150)
event_type_desc	varchar(150)
business_name	varchar(150)
first_name	varchar(100)
last_name	varchar(100)
language_pref	varchar(50)
ban_id	varchar(50)
ban_type	varchar(50)
ban_sub_type	varchar(50)
alternative_contact_number	varchar(50)
product_billing_type	varchar(50)
ported_tn	varchar(50)
tn_status	varchar(50)
port_type	varchar(50)
port_status	varchar(50)
line_number	varchar(50)
desired_due_date_time	varchar(50)
npdi	varchar(50)
product_type	varchar(50)
auto_act	varchar(50)
old_network_sp	varchar(50)
old_network_sp_name	varchar(150)
new_network_sp	varchar(50)
new_network_sp_name	varchar(150)
npqty	varchar(50)
target_brand	varchar(50)
src_application_id	varchar(50)
src_agent_id	varchar(50)
month_num	varchar(6)

LinusH · Posted 11-13-2020 02:13 AM

I haven't seen this myself.

I wonder how to intgerpret the "in the actual data" part.

Can you confirm the datatypes by using a libref instead? (also, Iooking at your code, pass through is not required).

Data never sleeps

JBailey · Posted 11-13-2020 09:26 AM

Hi @sivaranjani

Hopefully, you are using a supported version of Hadoop (Cloudera, AWS EMR, Hortonworks, etc.). For more information on supported Hadoop distributions check out:

An Insider's Guide to SAS and Database Support Information

The first thing to try is setting the SAS_HADOOP_HIVE1815_WORKAROUND=YES environment variable.

You can set environment variables in the !SASROOT/bin/sasenv_local file, in the shell before you invoke SAS, or on the invocation by:
$> sas -set SAS_HADOOP_HIVE1815_WORKAROUND YES jobname.sas

Your SAS Administrator will likely need to do this, system wide.

Best wishes,

Jeff

sivaranjani · Posted 11-13-2020 08:50 PM

Thank you! I will reach out to the SAS admin about this

Grzegorz_klos1 · Posted 03-28-2022 10:16 AM

Have you solved this issue with proposed -set SAS_HADOOP_HIVE1815_WORKAROUND YES setting?

I have the same ERROR but when I set this variable it does not help me.

error in fetching data from Hive table