Hi moorsd,
I’m glad this was helpful. You are correct you must be at SAS 9.4 M3 (maintenance 3) to leverage the Hive Serde for SPD Engine data.
No I’m not doing a workshop but my colleague Jeff Bailey is, and I see you’ve already chimed in with some feedback on what you’d like to see in the workshop. I am presenting a paper and it would be great if it works for you to attend it or to stop by to say hello:
Session ID: SAS2180
Session Title: How to Leverage the Hadoop Distributed File System as a Storage Format for the SAS® Scalable Performance Data Server and SAS® Scalable Performance Data Engine
Session Type: Breakout
Day Scheduled: Tuesday, 4/19/16
Start Time/End Time: 15:00/ 15:20
Location: Bellini 2005
I know it is possible to register the tables once registered/loaded into LASR, but this is not what you are asking for.
Good question, I don't know and I feel interested on how to work with this option too.
I've checked in my environment, and there are no signs of any hdfs adoption in the new library wizard for SPDE. And the same goes for SPD Server. Reservation: this is in 9.4 M2. But SMC usually doesn't get updated much between major versions.
So my guess is that you need to go the not so preferred way of pre-asigning the library (by external configuration).
Timely question David. I am working on my next article which will be posted to the Data Management community. In this article I which will provided details on how to register SPD Engine data to the Hive metastore. In the meantime here is some high level information on this topic.
Accessing SPD Engine Data Using Hive; SAS provides a custom Hive SerDe for SPD Engine data that is stored in HDFS. The SerDe makes the data available for applications outside of SAS to query using HiveQL.
Requirements:
The data set must be registered in the Hive metastore. SAS supplies a registration utility to the data set. You cannot use any other method to register data sets.
For more information about the Hive SerDe for the SPD Engine, see SAS 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System at http://support.sas.com/documentation/cdl/en/engspdehdfsug/67948/HTML/default/viewer.htm#n02qaxsa4wu1...
Hi Steve,
Many thanks for the information. It was very informative. But from what I've read around this topic you need to be running on SAS 9.4 M3 to have the access the SAS Hive SerDe for SPDE.
The client I'm working at has SAS 9.4 M2 and Hive SerDe is not available for that release. So I guess we're stuck until they upgrade to SAS 9.4 M3 or some other version of SAS that may be released.
Are you running a SAS & Hadoop workshop at this years Global Forum. If so, I'll pop along and say hello.
thanks again,
David
Hi moorsd,
I’m glad this was helpful. You are correct you must be at SAS 9.4 M3 (maintenance 3) to leverage the Hive Serde for SPD Engine data.
No I’m not doing a workshop but my colleague Jeff Bailey is, and I see you’ve already chimed in with some feedback on what you’d like to see in the workshop. I am presenting a paper and it would be great if it works for you to attend it or to stop by to say hello:
Session ID: SAS2180
Session Title: How to Leverage the Hadoop Distributed File System as a Storage Format for the SAS® Scalable Performance Data Server and SAS® Scalable Performance Data Engine
Session Type: Breakout
Day Scheduled: Tuesday, 4/19/16
Start Time/End Time: 15:00/ 15:20
Location: Bellini 2005
Exiting times, it sure is.
@SteveSober: are you saying that as of today (M3) you can't register a "plain" SPDE hdfs library in SMC?
The work-around would be to configure it as a Hive data store, and the register that in SMC as a Hive library?
Perhaps you will bring this up in your presentation...
Perhaps a bit off-topic (but not much ): as an architect it's quite difficult to compare different scenarios with new engine possibilities. Especially since it seems be not so many real world implementations learn from. Where can I find information that compare SPDE/SPDS hdfs, SPDE/SPDS Hive and pure Hive implementations? What are the pros and cons? Lessons learned etc.
br
Thanks for your comments and questions LinusH. To summarize what I am saying:
In the ideal world the information you requested (a bit off-topic) would be readily available but the reality is these kinds of implementation comparisons are proprietary among our customer base. In addition it is very hard to give general guidelines because every environment is unique. One site may have 100 cores in their Hadoop cluster while others sites have over 1,000 cores. One site might utilize TEZ or SPARK as the method HiveQL is processed. Another site may not use TEZ or SPARK. As you can see the results of any benchmark would vary tremendously depending on the core count of your Hadoop cluster and the methodology used to processing HiveQL.
Hi Steve,
As the SPDE SerDe is just two Jar files, can these be deployed manually to all the nodes in the Hadoop cluster, so this works with SAS 9.4 M2 Then in a Hive query just add something like:
exec (ADD JAR /home/hadoop/SerDes/name_of_SAS_Serde.jar).
Is it that you just need the SAS 9.4 M3 depot to get the SerDe from SAS, as we already have a SAS 9.4 M3 depot. Or is there something in the SAS 9.4 M3 SAS/ACCESS to Hadoop engine that is doing something different under the hood? Could you tell me more about the registration utility that SAS supplies.
The ability to see SPDE files in Hive is something that would be ideal for my UK client, however it's taken nearly 12 months to migrate 10 SAS environments to SAS 9.4 M2 so the appetite to upgrade to SAS 9.4 M3 isn't there at the minute..
Thanks
David
Thanks @moorsd and @LinusH for your questions and comments. If you were to tweak the SAS 9.4 M2 environment to try and leverage the HiveSerDe for SPD Engine Data SAS Technical Support would not be able to help you with any issues. It is best to wait until you are at SAS 9.4 M3 before trying to leverage the SerDe.
Thanks for the information.
Sadly as the customer has just spent a lot of time and money migrating from 9.2 to 9.4 M2 (which was the current distribution when the migration project started). The appitite to upgrade to M3 isn't there at the moment. I was just wondering we we could get the SPDE Hove SerDe working with 9.4 M2 as a stop gap. Yes, it would be a hack but Hive can use other SerDe's so why not the SAS 9.4 M3 one?
A SAS upgrade is being considered by the client for 2017. However with a new version of SAS due soon (50 years of SAS, so time for a big announcement?) it might not be worth the bother going to M3, and see what a new version of SAS may bring to the table, Hadoop wise. Spark integration (i.e. SAS/ACCESS to Spark) would be nice. DataLoader 2.4 has started the ball rolling with Spark so maybe something like that may happen?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.