We are in the process of converting SAS programs, which source data from Netezza to access Hive/HDFS database. I am finding few data types used in SAS are not compatible with Hive data type.
Please help by providing direction how to proceed with such conversion and also let me know if any SAS document or note available to support the conversion.
Thanks in advance.
So your data was in Netezza and is moved to HDFS via SAS and Hive ?
SAS only has 2 variable types: numeric and fixed-length string.
What Hive data types give you trouble? Dates?
Type INT should not be a problem.
You might lose precision with BIGINT for numbers > ~1e16, but INT should be fine.
What errors do you get?
I am unable to open few data sets which has INT attribute.
What errors do you get?
Nothing in the SAS log?
Nothing in the HDFS or Hive logs?
Thanks , I am able to resolve the issue by including
COLUMN_DELIMITER = '|'
which I found in one of SAS support communities data management page.
. Earlier I used to get error as per below log As per log issue is related to hive column which has more than 32 char length.
Logs:
NOTE: The SAS System stopped processing this step because of
errors.
WARNING: The data set X.VALUES1 may be incomplete. When this
step was stopped there were 0 observations and 13
variables.
HADOOP_5: Executed: on connection 3
CREATE TABLE `dev_hdw_sas_test`.`VALUES1` (`value_id`
BIGINT,`default_id` BIGINT,`stat_type_id`
BIGINT,`stattype_desc` VARCHAR(255),`value_date`
TIMESTAMP,`authorized` DECIMAL(14, 2),`outstanding` DECIMAL(14,
2),`issued` DECIMAL(14, 2),`source_system_ind`
7 The SAS System
06:08 Thursday, October 18, 2018
TINYINT,`create_date` TIMESTAMP,`update_date`
TIMESTAMP,`update_user_id` VARCHAR(255),`idp_data_date` DATE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES
TERMINATED BY '\012' STORED AS TEXTFILE TBLPROPERTIES ('SAS OS
Name'='Linux','SAS Version'='9.04.01M4P11092016')
ERROR: ROLLBACK issued due to errors for data set
X.VALUES1.DATA.
NOTE: DATA statement used (Total process time):
real time 2.59 seconds
cpu time 0.01 seconds
8 The SAS System
06:08 Thursday, October 18, 2018
32 %LET _CLIENTTASKLABEL=;
33 %LET _CLIENTPROCESSFLOWNAME=;
34 %LET _CLIENTPROJECTPATH=;
35 %LET _CLIENTPROJECTPATHHOST=;
36 %LET _CLIENTPROJECTNAME=;
37 %LET _SASPROGRAMFILE=;
38 %LET _SASPROGRAMFILEHOST=;
39
40 ;*';*";*/;quit;run;
41 ODS _ALL_ CLOSE;
1.I am able to resolve the issue by including
So this is solved? If so mark your update as the solution.
2. I don't see where INT is the culprit for the error.
3. It is odd that you need set set | as the delimiter when the HDFS file is using \001
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.