I am trying to create a hive table from SAS using the below code.
Have you guys come across any data compatibility / data type conversion issues?
Till now I have faced the below issues, can someone please help.
1. There is a NOTE saying SAS variable labels, formats, and lengths are not written to DBMS tables. This means the labels do not get stored.
2. Numeric column values that are long, get saved as exponential values in Hive.
LIBNAME orchive HADOOP
uri='jdbc:hive2://xxxxxxx.systems.uk.xxxx:1111,xxxxxx.systems.uk.xxxx:1111,xxxxxxxx.systems.uk.xxxx:1111/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;principal=hive/_HOST@HRES.ADROOT.XXXX?hive.execution.engine=tez;hive.tez.container.size=16384;tez.queue.name=risk'
PORT=10001 
SERVER="xxxxxxxxx.systems.uk.xxxx,xxxxxxxxx.systems.uk.xxxx,xxxxxxxx.systems.uk.xxxx"
schema=prodschema
dbmax_text=100
DBCREATE_TABLE_OPTS='STORED AS ORC'
; 
data orchive.test;
set test;
run;
> There is a NOTE saying SAS variable labels, formats, and lengths are not written to DBMS tables. This means the labels do not get stored.
That's right. There's no metadata in Hive. You can add comments though.
> Numeric column values that are long, get saved as exponential values in Hive.
That's weird. What's the data type? or do you have a short SAS format maybe?
Also, for speed, use proc append rather than a data step.
SAS - The account numbers in SAS are 15 digit long. As it is numeric, the length of course is 8. Format is 20. Informat is 20.
Hive - (See image ). the datatype is double. But when queried through hive prompt using select statement, it shows as exponential value. See image.
I do not open MS office documents from the web. Use the image icon.
If the data type is double, I suspect the issue is simply that SAS uses the default best. format to display the value, and that format cannot show 15 digits.
Oh the hive prompt shows the exponential notation. Mmm could it be setting on your hive viewer?
Here is the Hive prompt showing the value as exponential.
Please advise if this can be fixed.
How does it display the number when queried form sas with a 20. format?
This does not look like a SAS question. Your viewer may be the culprit, see here for example https://stackoverflow.com/questions/32576187/hive-converting-from-double-to-string-not-in-scientific
Thanks Chris. I tried casting it as big int and it displays correctly. So I think the better way to do it would be to create an empty hive table with the structure you need, and then use PROC APPEND.
Why would you store account number as anything other than a string? What is meaning of a MEAN account number?
If the values are already up to 15 digits long they will pretty soon exceed the limit of precision for storing as a floating point number.
2333  data _null_;
2334    x=constant('exactint');
2335    put x= comma32.;
2336  run;
x=9,007,199,254,740,992
That means that beyond 9,007,199,254,740,992 you can no longer be certain the integer can be stored precisely. Some numbers that are powers of 2 can be represented exactly but not the integers between them.
2. How did you verify that it's stored as exponential? Try to look at the values from a Hive query, not SAS to rula out any data type conversion.
Your coulumn should be stored as DOUBLE given the default data mapping rules.
If you have a display in conversion to sas, you can play with SAS properties on the Hive table:
ALTER TABLE passthrough_ex SET TBLPROPERTIES ('SASFMT:bgint'='CHAR(20)')If you don't thik DOUBLE itself works, you can use the DBSASTYPE option to use another data type in Hive.
Thanks Linush
> Numeric column values that are long, get saved as exponential values in Hive.
That's just the way the are displayed. Use a long format to see all the digits.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
