JayS
Obsidian | Level 7

I have been able to connect my SAS EG to an instance of Hadoop/Hive (just figuring out all this Hadoop stuff as I go).

 

However, when I read in the data, ALL the character variables have a length of 32,767 and carry formats.

I have a routine that checks the max length, and it shows that at most a value is maybe 250 characters long.

 

I assume this is a Hadoop thing, not a SAS thing?

 

I have coded up a manual workaround to assign much more realistic lengths, but it's manual and not a real solution.

 

My Questions

1) Did our Hadoop guys just not take the time to assign lengths?

2) Is this standard, and does everyone have to deal with it?

3) Is there any nice way to handle it?

 

TIA, Jay

1 ACCEPTED SOLUTION

Accepted Solutions
LinusH
Tourmaline | Level 20

1) Probably yes. Hadoop "guys" in my experience don't care about string lengths, perhaps embracing some kind of "schema on read" philosophy.

2) Don't know; it's up to each site to set its own standard, I guess.

3) Define "nice" ;-).

I can think of two ways:

  • Use the DBMAX_TEXT LIBNAME option. It's kind of blunt, but it's better to have columns of length 500 rather than 32K.
  • SAS/ACCESS to Hadoop (Hive) honors the SASFMT table attribute. If you can somehow generate a script that builds the code from any source schema, this can quite easily be adopted by the Hadoop people. If not, it can be quite cumbersome, depending on the number of columns you wish to specify. This can be done either through SQL explicit pass-through from SAS (https://communities.sas.com/t5/SAS-Data-Management/SAS-Access-Hadoop-String-Columns-Field-Length-Opt...) or directly in Hive.
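A minimal sketch of both approaches. The library reference, server, schema, table, and column names below are placeholders, not values from this thread:

```sas
/* Option 1: cap every Hive STRING column at 500 bytes
   when the library is assigned */
libname hdp hadoop server="myhive.example.com" schema=default
        dbmax_text=500;

/* Option 2: set a per-column length on the Hive side via the
   SASFMT table property, using explicit SQL pass-through */
proc sql;
   connect to hadoop (server="myhive.example.com" schema=default);
   execute (
      alter table mytable set tblproperties
         ('SASFMT:longdesc' = 'CHAR(150)')
   ) by hadoop;
   disconnect from hadoop;
quit;
```

Option 1 applies to every table read through the library; option 2 is per column, but the setting travels with the Hive table so every SAS user benefits.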
Data never sleeps

View solution in original post

5 REPLIES
ChrisNZ
Tourmaline | Level 20

How do you read the Hadoop data?

Hadoop storage is very basic in terms of metadata, but normally column length is known.

 

JayS
Obsidian | Level 7

Thanks, Chris, for the feedback.

To connect I use a LIBNAME statement with all the Hadoop URI, server, and schema string values...

 

Data LIBNAME.MyDay;

        Set LIBNAME.HADOOP_Data;

Run;

 

For now I have applied common sense to the LENGTH statement after taking a peek at the data...

 

Data LIBNAME.MyDay;

        Length String5Code $5
               LongDesc    $150;

        Set LIBNAME.HADOOP_Data;

Run;

 

One thing I have noticed is that the numerics are length 8, which is correct...

 

Examples of other variables when read in (name = type, length, format, informat):

Numeric variable = Num 8 11. 11.

Date variable = Num 8 DATETIME25.6 DATETIME25.6

 

Thanks, Jay

JayS
Obsidian | Level 7

Thanks, Linus,

 

I'll touch base with the Hadoop guys... Most likely this is the root cause...

The DBMAX_TEXT option is a good idea; I'll check it out.

500 is better than 32K for now, with the exception of a savings account!

 

Right now I have forced all the lengths after running a PROC CONTENTS to look at the string variables; see my other response in this thread.

 

I'm also working on a way to run the max-length code, take its output, load it into a SAS format, and use PUT() with that format in place of the traditional LENGTH statement.
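A related approach that stays with LENGTH statements: measure the longest actual value of each character variable, then generate the LENGTH statement automatically. This is a sketch under assumed names (library HDP, table MYTABLE are placeholders):

```sas
/* Collect the character variable names from the dictionary tables */
proc sql noprint;
   select name into :vars separated by ' '
      from dictionary.columns
      where libname='HDP' and memname='MYTABLE' and type='char';
quit;

/* Scan the table once, tracking the max observed length per variable,
   then build a LENGTH statement and store it in a macro variable */
data _null_;
   set hdp.mytable end=last;
   array c{*} &vars;
   array mx{&sqlobs} 8 _temporary_;
   do i=1 to dim(c);
      mx{i}=max(mx{i}, lengthn(c{i}));
   end;
   if last then do;
      length stmt $32767;
      stmt='length';
      do i=1 to dim(c);
         stmt=catx(' ', stmt, vname(c{i}), cats('$', max(mx{i},1)));
      end;
      call symputx('lenstmt', cats(stmt,';'));
   end;
run;

/* Apply the generated lengths; the LENGTH statement must precede SET */
data work.mytable_trim;
   &lenstmt
   set hdp.mytable;
run;
```

The full-table scan is the expensive part, so this fits best as a one-off profiling step rather than something run on every extract.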

 

I'll keep trying for a few more hours while I contact the Hadoop guys 😉

 

Thanks, I'll post my final solution as it may help others...

 

J

 

JayS
Obsidian | Level 7

Update:

Adding dbmax_text=175 to my LIBNAME statement worked fine for me.
For now I can live with this.
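For reference, a minimal sketch of where the option goes; the library reference and connection values are placeholders, not the actual ones from this site:

```sas
/* Cap all Hive STRING columns read through this library at 175 bytes */
libname hdp hadoop server="myhive.example.com" schema=default
        dbmax_text=175;
```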

Thanks to everyone who provided input.

J

