BookmarkSubscribeRSS Feed
vorkady
Obsidian | Level 7

We are in the process of converting SAS programs, which source data from Netezza to access Hive/HDFS database. I am finding few  data types used in SAS  are not compatible with Hive data type.    

Please help by providing direction how to proceed with such conversion and also let me know if any SAS document or note  available   to support the conversion.

 

Thanks in advance.

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

So your data was in Netezza and is moved to HDFS via SAS and Hive ?

SAS only has 2 variable types: numeric and fixed-length string.

What Hive data types give you trouble? Dates?

 

vorkady
Obsidian | Level 7
‎Hi
We moved Netezza data into HDFS and we are encountering issues when we bring data into SAS datsets as some of datatypes are not compatabile. One of the data type int. In hive
ChrisNZ
Tourmaline | Level 20

Type INT should not be a problem.

You might lose precision with BIGINT for numbers > ~1e16, but INT should be fine.

What errors do you get?

vorkady
Obsidian | Level 7

I am unable to open few data sets which has INT attribute. 

 

ChrisNZ
Tourmaline | Level 20

What errors do you get?

Nothing in the SAS log?

Nothing in the HDFS or Hive logs?

vorkady
Obsidian | Level 7

Thanks , I am able to resolve the issue by including

COLUMN_DELIMITER = '|'

 

which I found in one of SAS support communities data management page.

. Earlier I used to get error as per below log As per log issue is related to hive column which has more than 32 char length. 

Logs:

 

NOTE: The SAS System stopped processing this step because of

errors.

WARNING: The data set X.VALUES1 may be incomplete. When this

step was stopped there were 0 observations and 13

variables.

 

HADOOP_5: Executed: on connection 3

CREATE TABLE `dev_hdw_sas_test`.`VALUES1` (`value_id`

BIGINT,`default_id` BIGINT,`stat_type_id`

BIGINT,`stattype_desc` VARCHAR(255),`value_date`

TIMESTAMP,`authorized` DECIMAL(14, 2),`outstanding` DECIMAL(14,

2),`issued` DECIMAL(14, 2),`source_system_ind`

7 The SAS System

06:08 Thursday, October 18, 2018

TINYINT,`create_date` TIMESTAMP,`update_date`

TIMESTAMP,`update_user_id` VARCHAR(255),`idp_data_date` DATE)

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\001' LINES

TERMINATED BY '\012' STORED AS TEXTFILE TBLPROPERTIES ('SAS OS

Name'='Linux','SAS Version'='9.04.01M4P11092016')

 

ERROR: ROLLBACK issued due to errors for data set

X.VALUES1.DATA.

NOTE: DATA statement used (Total process time):

real time 2.59 seconds

cpu time 0.01 seconds

 

8 The SAS System

06:08 Thursday, October 18, 2018

 

32 %LET _CLIENTTASKLABEL=;

33 %LET _CLIENTPROCESSFLOWNAME=;

34 %LET _CLIENTPROJECTPATH=;

35 %LET _CLIENTPROJECTPATHHOST=;

36 %LET _CLIENTPROJECTNAME=;

37 %LET _SASPROGRAMFILE=;

38 %LET _SASPROGRAMFILEHOST=;

39

40 ;*';*";*/;quit;run;

41 ODS _ALL_ CLOSE;

ChrisNZ
Tourmaline | Level 20

1.I am able to resolve the issue by including

So this is solved? If so mark your update as the solution.

 

2. I don't see where INT is the culprit for the error.

 

3. It is odd that you need set set   |   as the delimiter when the HDFS file is using \001

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2170 views
  • 0 likes
  • 2 in conversation