Hello everyone,
I am trying to read data into SAS from an Apache Hive table on Hadoop and am getting the following error.
ERROR: Fetch error: java.lang.RuntimeException: Error retrieving next row: This may be due to very long strings in the actual data
and the use of DBSASTYPE to limit string length. Consider using "SAS_HADOOP_HIVE1815_WORKAROUND=YES" to force Hive fetchSize
to 1. Contact SAS Technical Support for details.
Has anyone faced a similar error? The script and table definition are given below.
The script I am running is this:
proc sql;
    connect to hadoop (
        uri='url'
        server='server_name'
        schema=schema_name
        dbmax_text=256
        properties='hive.vectorized.execution.enabled=false'
        user=&username
        pw=&password
    );

    create table lib_name.table_1 (compress=yes) as
        select * from connection to hadoop (
            select * from table_name
        );

    disconnect from hadoop;
quit;
The data types of all the columns in the Hive table are as follows:
request_id | varchar(50) |
header_id | varchar(50) |
header_name | varchar(50) |
time_stamp | varchar(50) |
event_type_id | varchar(50) |
event_type_name | varchar(150) |
event_type_desc | varchar(150) |
business_name | varchar(150) |
first_name | varchar(100) |
last_name | varchar(100) |
language_pref | varchar(50) |
ban_id | varchar(50) |
ban_type | varchar(50) |
ban_sub_type | varchar(50) |
alternative_contact_number | varchar(50) |
product_billing_type | varchar(50) |
ported_tn | varchar(50) |
tn_status | varchar(50) |
port_type | varchar(50) |
port_status | varchar(50) |
line_number | varchar(50) |
desired_due_date_time | varchar(50) |
npdi | varchar(50) |
product_type | varchar(50) |
auto_act | varchar(50) |
old_network_sp | varchar(50) |
old_network_sp_name | varchar(150) |
new_network_sp | varchar(50) |
new_network_sp_name | varchar(150) |
npqty | varchar(50) |
target_brand | varchar(50) |
src_application_id | varchar(50) |
src_agent_id | varchar(50) |
month_num | varchar(6) |
I haven't seen this myself.
I wonder how to interpret the "in the actual data" part.
Can you confirm the data types by using a libref instead? (Also, looking at your code, pass-through is not required.)
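Something along these lines would show the column types and lengths SAS actually assigns. This is only a minimal sketch that mirrors the placeholders in your pass-through code (server name, schema, library, and the &username/&password macro variables); depending on their values you may need to quote the credentials.

libname hdp hadoop
    uri='url'
    server='server_name'
    schema=schema_name
    user=&username
    password=&password
    dbmax_text=256;

/* Check the column types and lengths SAS assigns to the Hive table */
proc contents data=hdp.table_name;
run;

/* The same extract without pass-through */
data lib_name.table_1 (compress=yes);
    set hdp.table_name;
run;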
Hi @sivaranjani
Hopefully, you are using a supported Hadoop distribution (Cloudera, AWS EMR, Hortonworks, etc.). For more information on supported Hadoop distributions, check out:
An Insider's Guide to SAS and Database Support Information
The first thing to try is setting the SAS_HADOOP_HIVE1815_WORKAROUND=YES environment variable.
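For reference, here is a minimal sketch of the two usual ways to set it (the configuration file path is just an example; if the in-session form does not take effect in your environment, set it in the configuration file or at SAS startup instead):

/* In the SAS configuration file (e.g. sasv9.cfg) or on the command line used to start SAS */
-set SAS_HADOOP_HIVE1815_WORKAROUND YES

/* Or inside the SAS session, submitted before connecting to Hive */
options set=SAS_HADOOP_HIVE1815_WORKAROUND "YES";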
Have you solved this issue with the proposed -set SAS_HADOOP_HIVE1815_WORKAROUND YES setting?
I have the same ERROR, but setting this variable does not help me.