09-11-2017 11:15 AM
Impala stores dates as datetimes.
When you write a table to Impala through the Impala engine and that table contains a date column (e.g. one formatted DATE9.), Impala stores that date as a datetime, and when you then read the table back the value is returned as a DATETIME28.9. So it has not only ignored the format, it has completely changed the underlying value from a SAS date value to a SAS datetime value. This makes all existing date calculations fail to produce the expected results.
As you can imagine, if you are moving data and code from a SAS environment to an Impala/SAS environment, this 'feature' seems to require a huge amount of recoding: reading every date as the DATEPART of a datetime, or adding the SASDATEFMT= option to every table every time you read it.
Does anyone happen to know a simpler solution to this issue? SAS aren't being very forthcoming.
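For anyone hitting the same thing, the per-table workaround looks roughly like this. A minimal sketch, assuming a hypothetical Impala libref IMP and a table ORDERS whose ORDER_DT column was a SAS date before migration:

```sas
/* Sketch only: IMP, ORDERS and ORDER_DT are hypothetical names.
   The SASDATEFMT= data set option tells the SAS/ACCESS engine to
   surface the Impala TimeStamp column as a SAS date value. */
data work.orders;
  set imp.orders(sasdatefmt=(order_dt='DATE9.'));
run;
```

The pain point is exactly as described above: this option has to be repeated on every read of every affected table.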
09-11-2017 05:23 PM - edited 09-11-2017 05:30 PM
The only column type for storing dates in Impala appears to be TIMESTAMP. The SAS/ACCESS engine converts both SAS date and SAS datetime values to these TIMESTAMP columns. There is no other way, and I'd consider this behaviour correct.
Now the other way round:
Where would the SAS/ACCESS engine get the information to decide whether it needs to convert an Impala TIMESTAMP to a SAS date or a SAS datetime value? The SAS value type that can hold an Impala TIMESTAMP is a datetime, so converting from Impala TIMESTAMP to SAS datetime appears to be the correct approach. Conversion to a SAS date value could truncate the time portion of the source value.
If you need the SAS/ACCESS engine to convert an Impala TIMESTAMP value to a SAS date instead, then you need to give the engine this instruction. This is what SASDATEFMT= allows you to do.
In short: there is nothing wrong with either SAS or Impala. They just happen to store dates differently. Get used to it!
If you regularly pull tables from Impala to SAS and need columns converted to SAS date values, then consider implementing permanent SAS views over these Impala tables which do the conversion for you, and then use these views like you would any other SAS table.
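A view of that kind could be sketched as below. The libref names (IMP for Impala, SDAT for a permanent SAS library) and the columns are hypothetical:

```sas
/* Sketch only: DATEPART() strips the time portion of the TimeStamp
   column, and the DATE9. format makes it behave like the original
   SAS date. Users read SDAT.ORDERS_V like any other table. */
proc sql;
  create view sdat.orders_v as
    select datepart(order_dt) as order_dt format=date9.,
           customer_id,
           amount
    from imp.orders;
quit;
```

The conversion then lives in one place instead of in every piece of user code.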
And last but not least:
When moving to storing data in a database instead of SAS tables, you also need to change your way of working a bit. Try to design your programs so that data movement between the database and SAS is limited. Reduce the data as much as possible on the database side by writing code (i.e. joins) that executes in-database, and only pull the data you need for further processing to the SAS side.
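As an illustration of pushing work to the database side, explicit pass-through could look like this. Connection options, table and column names are all hypothetical, site-specific placeholders:

```sas
/* Sketch of explicit SQL pass-through: the join and aggregation run
   inside Impala; only the small summarised result set travels to SAS. */
proc sql;
  connect to impala (server="impserver" database="mydb");
  create table work.region_totals as
    select * from connection to impala
      ( select c.region, sum(o.amount) as total_amount
        from orders o
        join customers c on o.cust_id = c.cust_id
        group by c.region );
  disconnect from impala;
quit;
```

Implicit pass-through via a libref can achieve the same thing for simpler queries, but explicit pass-through makes it obvious which code executes in-database.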
09-12-2017 04:13 AM
Thanks for your reply Patrick.
The SAS/ACCESS engine for Hive reads the TBLPROPERTIES element of the table DDL to discover the SAS format in which the data should be read/surfaced into SAS. So it is possible.
I agree this is normal DB behaviour, but if SAS want people to continue using SAS rather than moving wholesale to SQL on the database, they need to provide the functionality to enable this.
Our system consists of thousands of tables, hundreds of DI batch jobs, and millions of lines of user code that interrogates the data. I'm just trying to see if there is a simpler way around this than changing all those millions of lines of code. What you say matches my own reading: basically, there is no way round it.
09-12-2017 07:59 AM
I still believe you're asking for the impossible, and that this is not a SAS issue but an issue for any application interfacing with Impala that has both DATE and DATETIME/TIMESTAMP values.
Impala is not Hive, so imo you're comparing apples with oranges here.
As for your DI jobs: if these jobs have been implemented using DI Studio, then your table metadata should contain sufficient information for SAS to decide whether a value is a date or a datetime, and code should get generated accordingly from metadata.
I can't test it, but if this is not working for DIS-generated code either, then yes, imo here you've got a case where you can ask for improvements.
If conversion doesn't happen based on formats: is there an option in table metadata to apply SASDATEFMT=? If so, and you've still got your original SAS tables, then that's maybe something you could add via a script (writing directly to SAS metadata), and something SAS Tech Support could help you with. You just need to ask the right questions, and the guys can be very helpful.
It sounds to me like you're in the process of migrating SAS tables into Hadoop and then want to access these tables via Impala. If so, you should still have your original SAS tables.
I'm not sure how you're migrating these tables to the new environment, but I'd assume you're going for some scripted approach. If so, you could also create SAS views over these Impala tables which do the conversion to date values where applicable - and you could generate these views from the formats applied to the original SAS tables you're migrating.
...and then store these views under libnames with the librefs used in your original code. That way all your code should still work as it did pre-migration. Not everything will have an ideal design for the new data store (= too much data movement), but that's something you can address as a post-migration task, taking on the most resource-intensive, long-running jobs first.
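Scripting the view generation could be sketched like this, driven by the formats on the original SAS tables. Librefs SASORIG (pre-migration tables), IMP (Impala) and SDAT (views) are hypothetical, and the example handles a single table, ORDERS:

```sas
/* Sketch only: build the view's select list from DICTIONARY.COLUMNS,
   wrapping former date columns in DATEPART() and passing the rest
   through unchanged. */
proc sql noprint;
  select case
           when upcase(format) like 'DATE%'
                and upcase(format) not like 'DATETIME%'
             then cats('datepart(', name, ') as ', name,
                       ' format=', format)
           else strip(name)
         end
    into :sel separated by ', '
    from dictionary.columns
    where libname='SASORIG' and memname='ORDERS';

  create view sdat.orders as
    select &sel from imp.orders;
quit;
```

Wrapping this in a macro that loops over DICTIONARY.TABLES would cover a whole library in one pass.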
09-12-2017 09:08 AM
Thanks again for your reply Patrick. All good points.
I will soon be trying out the DI Studio jobs to see how they generate with different table properties. I'll give the views a go too and see how that performs.
09-11-2017 06:09 PM
I suggest you look at using SAS macros to apply the appropriate options or functions automatically.
We do something similar ourselves with SQL Server. It's pretty easy to query a SAS table extracted from a database, identify any DATETIME variables, and translate them all automatically. You can use the CONTENTS procedure or the SAS DICTIONARY tables to do this. Let me know if you would like some sample code.
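A rough version of that approach, with hypothetical names (WORK.EXTRACT being a table just pulled from the database):

```sas
/* Sketch only: DICTIONARY.COLUMNS finds the DATETIME-formatted
   variables; the DATA step converts each one to a SAS date in place. */
proc sql noprint;
  select name into :dtvars separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='EXTRACT'
      and upcase(format) like 'DATETIME%';
quit;

data work.extract2;
  set work.extract;
  array _dt{*} &dtvars;
  do _i = 1 to dim(_dt);
    _dt{_i} = datepart(_dt{_i});
  end;
  format &dtvars date9.;
  drop _i;
run;
```

A production version would also need to handle the case where no DATETIME variables are found (an empty &dtvars macro variable).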
09-12-2017 04:16 AM
This is probably the solution we'll go for, at least in the short term. As I mentioned to Patrick above, we have millions of lines of code which need to be checked, changed where needed, tested, taken through change management, and implemented. It's a massive job.
Thanks for your reply.
09-12-2017 04:24 PM
Our approach is to stage the database tables in SAS as "Level 1" without any translations. We then automatically detect any datetime columns, convert them, and store the results as "Level 2" tables. All user jobs then query the Level 2 tables. This avoids every user job reading the database tables and repeating the column translations.