Hello everyone,
I am trying to read a Hadoop Apache Hive table into SAS and am encountering the following error:
ERROR: Fetch error: java.lang.RuntimeException: Error retrieving next row: This may be due to very long strings in the actual data
and the use of DBSASTYPE to limit string length. Consider using "SAS_HADOOP_HIVE1815_WORKAROUND=YES" to force Hive fetchSize
to 1. Contact SAS Technical Support for details.
Has anyone faced a similar error? The script and the table definition are given below.
The script I am running is this:
proc sql;
  connect to hadoop (
    uri='url'
    server='server_name'
    schema=schema_name
    dbmax_text=256
    properties='hive.vectorized.execution.enabled=false'
    user=&username
    pw=&password
  );
  create table lib_name.table_1 (compress=yes) as
    select * from connection to hadoop (
      select * from table_name
    );
  disconnect from hadoop;
quit;
The data types of all the columns in the Hive table are:
| request_id | varchar(50) |
| header_id | varchar(50) |
| header_name | varchar(50) |
| time_stamp | varchar(50) |
| event_type_id | varchar(50) |
| event_type_name | varchar(150) |
| event_type_desc | varchar(150) |
| business_name | varchar(150) |
| first_name | varchar(100) |
| last_name | varchar(100) |
| language_pref | varchar(50) |
| ban_id | varchar(50) |
| ban_type | varchar(50) |
| ban_sub_type | varchar(50) |
| alternative_contact_number | varchar(50) |
| product_billing_type | varchar(50) |
| ported_tn | varchar(50) |
| tn_status | varchar(50) |
| port_type | varchar(50) |
| port_status | varchar(50) |
| line_number | varchar(50) |
| desired_due_date_time | varchar(50) |
| npdi | varchar(50) |
| product_type | varchar(50) |
| auto_act | varchar(50) |
| old_network_sp | varchar(50) |
| old_network_sp_name | varchar(150) |
| new_network_sp | varchar(50) |
| new_network_sp_name | varchar(150) |
| npqty | varchar(50) |
| target_brand | varchar(50) |
| src_application_id | varchar(50) |
| src_agent_id | varchar(50) |
| month_num | varchar(6) |
I haven't seen this myself.
I wonder how to interpret the "in the actual data" part.
Can you confirm the data types by using a libref instead? (Also, looking at your code, pass-through is not required.)
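Something like this might work as a libref-based check. This is only a sketch reusing the placeholder connection values from the original post (uri, server_name, schema_name, table_name are not real values), so substitute your actual settings:

```sas
/* Assign a Hadoop libref with the same connection options
   used in the pass-through code above (placeholder values). */
libname hdp hadoop
  uri='url'
  server='server_name'
  schema=schema_name
  dbmax_text=256
  user=&username
  pw=&password;

/* Show the column names, types, and lengths SAS assigns
   to the Hive table, to confirm the data types. */
proc contents data=hdp.table_name;
run;
```

If the lengths SAS reports differ from the Hive varchar widths, that may hint at where the "very long strings in the actual data" complaint comes from.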
Hi @sivaranjani
Hopefully, you are using a supported version of Hadoop (Cloudera, AWS EMR, Hortonworks, etc.). For more information on supported Hadoop distributions check out:
An Insider's Guide to SAS and Database Support Information
The first thing to try is setting the SAS_HADOOP_HIVE1815_WORKAROUND=YES environment variable.
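For reference, the environment variable can be set either at SAS invocation (in the command line or the sasv9.cfg configuration file) or within the session before connecting; the session form below is a sketch using the standard OPTIONS SET= statement:

```sas
/* At invocation / in the config file, the equivalent is:
     -set SAS_HADOOP_HIVE1815_WORKAROUND YES
   Within a SAS session, set it before connecting to Hive: */
options set=SAS_HADOOP_HIVE1815_WORKAROUND "YES";
```

It needs to be in effect before the Hadoop connection is made, so set it ahead of the CONNECT TO HADOOP or LIBNAME statement.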
Have you solved this issue with the proposed -set SAS_HADOOP_HIVE1815_WORKAROUND YES setting?
I have the same ERROR, but setting this variable does not help.