Hello everyone,
I am trying to read a Hadoop Apache Hive table into SAS and I am getting the following error:
ERROR: Fetch error: java.lang.RuntimeException: Error retrieving next row: This may be due to very long strings in the actual data
and the use of DBSASTYPE to limit string length. Consider using "SAS_HADOOP_HIVE1815_WORKAROUND=YES" to force Hive fetchSize
to 1. Contact SAS Technical Support for details.
Has anyone faced a similar error? The script and the table definition are given below.
The script I am running is this:
proc sql;
   connect to hadoop (
      uri='url'
      server='server_name'
      schema=schema_name
      dbmax_text=256
      properties='hive.vectorized.execution.enabled=false'
      user=&username
      pw=&password
   );
   create table lib_name.table_1 (compress=yes) as
   select * from connection to hadoop (
      select * from table_name
   );
   disconnect from hadoop;
quit;
The data types of the columns in the Hive table are as follows:
request_id | varchar(50) |
header_id | varchar(50) |
header_name | varchar(50) |
time_stamp | varchar(50) |
event_type_id | varchar(50) |
event_type_name | varchar(150) |
event_type_desc | varchar(150) |
business_name | varchar(150) |
first_name | varchar(100) |
last_name | varchar(100) |
language_pref | varchar(50) |
ban_id | varchar(50) |
ban_type | varchar(50) |
ban_sub_type | varchar(50) |
alternative_contact_number | varchar(50) |
product_billing_type | varchar(50) |
ported_tn | varchar(50) |
tn_status | varchar(50) |
port_type | varchar(50) |
port_status | varchar(50) |
line_number | varchar(50) |
desired_due_date_time | varchar(50) |
npdi | varchar(50) |
product_type | varchar(50) |
auto_act | varchar(50) |
old_network_sp | varchar(50) |
old_network_sp_name | varchar(150) |
new_network_sp | varchar(50) |
new_network_sp_name | varchar(150) |
npqty | varchar(50) |
target_brand | varchar(50) |
src_application_id | varchar(50) |
src_agent_id | varchar(50) |
month_num | varchar(6) |
I haven't seen this myself.
I wonder how to interpret the "in the actual data" part.
Can you confirm the data types by using a libref instead? (Also, looking at your code, pass-through is not required.)
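For example, a libref to the same schema lets you check the lengths and types SAS assigns to the columns, and read the table without explicit pass-through. A minimal sketch, reusing the placeholder connection values from your code (uri, server name, schema, and credentials are not real values):
libname hdp hadoop
   uri='url'
   server='server_name'
   schema=schema_name
   dbmax_text=256
   user=&username
   pw=&password;
/* Confirm the data types and lengths SAS assigns to the columns */
proc contents data=hdp.table_name;
run;
/* Read the table without pass-through */
proc sql;
   create table lib_name.table_1 (compress=yes) as
   select * from hdp.table_name;
quit;
libname hdp clear;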
Hi @sivaranjani
Hopefully, you are using a supported version of Hadoop (Cloudera, AWS EMR, Hortonworks, etc.). For more information on supported Hadoop distributions, check out:
An Insider's Guide to SAS and Database Support Information
The first thing to try is setting the SAS_HADOOP_HIVE1815_WORKAROUND=YES environment variable.
$> sas -set SAS_HADOOP_HIVE1815_WORKAROUND YES jobname.sas
Your SAS administrator will likely need to set this system-wide.
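If changing the invocation command is not an option, the same variable can also be added to the SAS configuration file, or you can try setting it from within the session with the SET= system option. A minimal sketch, assuming the Hadoop engine reads the variable at connect time (please verify with SAS Technical Support for your release):
/* In the SAS configuration file (for example sasv9_local.cfg), add:  */
/*   -set SAS_HADOOP_HIVE1815_WORKAROUND YES                          */
/* Or try setting it in the session before connecting to Hive: */
options set=SAS_HADOOP_HIVE1815_WORKAROUND="YES";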
Have you solved this issue with the proposed -set SAS_HADOOP_HIVE1815_WORKAROUND YES setting?
I have the same ERROR, but setting this variable does not help me.