We are in the process of converting SAS programs that source data from Netezza so that they access a Hive/HDFS database instead. I am finding that a few data types used in SAS are not compatible with Hive data types.
Please help by providing direction on how to proceed with such a conversion, and also let me know whether any SAS document or note is available to support the conversion.
Thanks in advance.
---
So your data was in Netezza and has been moved to HDFS via SAS and Hive?
SAS has only two variable types: numeric and fixed-length character.
Which Hive data types give you trouble? Dates?
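For illustration, a minimal sketch of those two base types (the data set and variable names here are made up, not from your conversion):

```sas
/* Illustrative only: SAS's two base variable types. Hive types such as
   INT, BIGINT, and DECIMAL all land in the 8-byte numeric; STRING and
   VARCHAR land in fixed-length character. */
data work.demo;
    length id 8 name $ 32;  /* numeric (8-byte double), fixed-length character */
    id = 42;
    name = "example";
run;
```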
---
We moved the Netezza data into HDFS, and we are encountering issues when we bring the data into SAS data sets because some of the data types are not compatible. One of them is the INT data type in Hive.
---
Type INT should not be a problem.
You might lose precision with BIGINT for numbers > ~1e16 (SAS stores numerics as 8-byte doubles), but INT should be fine.
What errors do you get?
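If BIGINT precision ever does become a concern, one possible workaround is to map the column to character with the DBSASTYPE= data set option. A sketch, where the server, schema, libref, and column names are all assumptions, not from your site:

```sas
/* Sketch: read a Hive table while mapping a BIGINT key column to
   character, so values above ~1e16 are not rounded into a double.
   "hiveserver", "hv", and value_id are placeholder names. */
libname hv hadoop server="hiveserver" schema="dev_hdw_sas_test";

data work.values1;
    set hv.values1 (dbsastype=(value_id='CHAR(20)'));
run;
```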
---
I am unable to open a few data sets that have an INT attribute.
---
What errors do you get?
Nothing in the SAS log?
Nothing in the HDFS or Hive logs?
---
Thanks. I was able to resolve the issue by including
COLUMN_DELIMITER = '|'
which I found on one of the SAS Support Communities data management pages.
Earlier I used to get the error shown in the log below; per the log, the issue is related to a Hive column that is more than 32 characters long.
Logs:
The SAS System    06:08 Thursday, October 18, 2018
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set X.VALUES1 may be incomplete. When this step was
         stopped there were 0 observations and 13 variables.
HADOOP_5: Executed: on connection 3
CREATE TABLE `dev_hdw_sas_test`.`VALUES1` (`value_id` BIGINT,
  `default_id` BIGINT, `stat_type_id` BIGINT,
  `stattype_desc` VARCHAR(255), `value_date` TIMESTAMP,
  `authorized` DECIMAL(14, 2), `outstanding` DECIMAL(14, 2),
  `issued` DECIMAL(14, 2), `source_system_ind` TINYINT,
  `create_date` TIMESTAMP, `update_date` TIMESTAMP,
  `update_user_id` VARCHAR(255), `idp_data_date` DATE)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001'
  LINES TERMINATED BY '\012' STORED AS TEXTFILE
  TBLPROPERTIES ('SAS OS Name'='Linux','SAS Version'='9.04.01M4P11092016')
ERROR: ROLLBACK issued due to errors for data set X.VALUES1.DATA.
NOTE: DATA statement used (Total process time):
      real time    2.59 seconds
      cpu time     0.01 seconds
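For anyone hitting the same error, a sketch of where the option goes. The server, schema, and libref are placeholders, and COLUMN_DELIMITER= is the data set option named above, applied when SAS creates the Hive table:

```sas
/* Sketch of the fix described above: write the SAS data set out to
   Hive with an explicit pipe column delimiter instead of the default.
   "hiveserver" and "hv" are placeholder names. */
libname hv hadoop server="hiveserver" schema="dev_hdw_sas_test";

data hv.values1 (column_delimiter='|');
    set work.values1;
run;
```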
---
1. "I am able to resolve the issue by including ..." So this is solved? If so, mark your update as the solution.
2. I don't see where INT is the culprit for the error.
3. It is odd that you need to set | as the delimiter when the HDFS file is using \001.