I have been able to connect my SAS EG to an instance of HADOOP / Hive (just figuring out all this HADOOP stuff as I go)
However when I read in the data, ALL the character variables are 32,767 in length, and have formats?
I have a routine that checks the max length and see that at MOST a length is maybe 250.
I assume this is a HADOOP thing not a SAS Thing?
I have coded up a manual workaround to assign much more realistic lengths, but it's manual and not a solution.
My Questions
1) Did our 'HADOOP guys' just not take the time to assign lengths?
2) Is this standard, and everyone has to deal with it
2) Any nice way to handle it?
TIA, Jay
1) Probably yes. Hadoop "guys" in my experience doesn't care about string lenghts, perhaps enforcing some kind of "schema on read" feeling.
2) Don't know, up to the site to its standard I guess.
3) Define nice ;-).
I can tink of two ways:
How do you read the Hadoop data?
Hadoop storage is very basic in terms of metadata, but normally column length is known.
Thanks Chris for the feedback..
To connect I use all the hadoop server=URI, Server, Schema string values....
Proc Data LIBNAME.MyDay;
Set LIBNAME.HADOOP_Data;
Run;
For now I have common sensed the Length Statement after take a peek into the data...
Proc Data LIBNAME.MyDay;
Length String5Code $5
Length LongDesc $150;
Set LIBNAME.HADOOP_Data;
Run;
One thing I have noticed is the numeric are 8. which is correct...
Example of other values when read in..
Numeric Variable = Num 8 11. 11.
Date Variables = Num 8 DATETIME25.6 DATETIME25.6
Thanks, Jay
1) Probably yes. Hadoop "guys" in my experience doesn't care about string lenghts, perhaps enforcing some kind of "schema on read" feeling.
2) Don't know, up to the site to its standard I guess.
3) Define nice ;-).
I can tink of two ways:
Thanks Linus,
I'll touch base with the HADOOP 'Guys'... Most likely this is the root cause...
The DBMAX_TEXT option, good idea I'll check it out.
500 is better than 32k for now, with the exception of a savings account!
Right now I have forced all values after creating a Proc Contents to look at the string values, See my other response on this thread
I'm also working on trying to figure out a way to run the MAX_Length code, take the output data load it into a SAS FMT and use the SAS Format Put() to replace the traditional Length Statement.
I'll keep trying for a few more hours while I contact the HADOOP guys 😉
Thanks, I'll post my final solution as it may help other...
J
Update:
This code dbmax_text=175 Update worked fine for me.
For now I can live with this.
Thanks to everyone who provided input.
J
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.