BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
moorsd
Obsidian | Level 7
Is it possible to register SPDE tables stored on HDFS in metadata, using SAS Management Console or code? I can't seem to locate any documentation that suggests this can be done. If it can be done, could someone advise how this can be achieved. Thanks David
1 ACCEPTED SOLUTION

Accepted Solutions
SteveSober
SAS Employee

Hi moorsd,

 

I’m glad this was helpful. You are correct you must be at SAS 9.4 M3 (maintenance 3) to leverage the Hive Serde for SPD Engine data. 

 

No I’m not doing a workshop  but my colleague Jeff Bailey is, and I see you’ve already chimed in with some feedback on what you’d like to see in the workshop. I am presenting a paper and it would be great if it works for you to attend it or to stop by to say hello:

 

Session ID: SAS2180
Session Title: How to Leverage the Hadoop Distributed File System as a Storage Format for the SAS® Scalable Performance Data Server and SAS® Scalable Performance Data Engine
Session Type: Breakout
Day Scheduled: Tuesday, 4/19/16
Start Time/End Time: 15:00/ 15:20
Location: Bellini 2005

View solution in original post

11 REPLIES 11
JuanS_OCS
Amethyst | Level 16

 

I know it is possible to register the tables once registered/loaded into LASR, but this is not what you are asking for.

Good question, I don't know and I feel interested on how to work with this option too.

LinusH
Tourmaline | Level 20

I've checked in my environment, and there are no signs of any hdfs adoption in the new library wizard for SPDE. And the same goes for SPD Server. Reservation: this is in 9.4 M2. But SMC usually doesn't get updated much between major versions.

So my guess is that you need to go the not so preferred way of pre-asigning the library (by external configuration).

Data never sleeps
SteveSober
SAS Employee

Timely question David. I am working on my next article which will be posted to the Data Management community. In this article I  which will provided details on how to register SPD Engine data to the Hive metastore. In the meantime here is some high level information on this topic.

 

Accessing SPD Engine Data Using Hive; SAS provides a custom Hive SerDe for SPD Engine data that is stored in HDFS. The SerDe makes the data available for applications outside of SAS to query using HiveQL.

 

Requirements:

  1. You must deploy SAS Foundation using the SAS Deployment Wizard. Select SAS Hive SerDe for SPDE Data.
  2. You must be running a supported Hadoop distribution that includes Hive 0.13 (for example, Cloudera CDH 5.2, Hortonworks HDP 2.1 or later, or MapR 4.0.2 or later).
  3. The data stored in HDFS must have been created using the SPD Engine.
  4. The SerDe is delivered as two JAR files that must be deployed to all nodes in the Hadoop cluster.

 

The data set must be registered in the Hive metastore. SAS supplies a registration utility to the data set. You cannot use any other method to register data sets.

 

For more information about the Hive SerDe for the SPD Engine, see SAS 9.4 SPD Engine: Storing Data in the Hadoop Distributed File System at  http://support.sas.com/documentation/cdl/en/engspdehdfsug/67948/HTML/default/viewer.htm#n02qaxsa4wu1...

moorsd
Obsidian | Level 7

Hi Steve,

 

Many thanks for the information. It was very informative. But from what I've read around this topic you need to be running on SAS 9.4 M3 to have the access the SAS Hive SerDe for SPDE.

 

The client I'm working at has SAS 9.4 M2 and Hive SerDe is not available for that release. So I guess we're stuck until they upgrade to SAS 9.4 M3 or some other version of SAS that may be released.

 

Are you running a SAS & Hadoop workshop at this years Global Forum. If so, I'll pop along and say hello.

 

thanks again,

 

David 

 

 

SteveSober
SAS Employee

Hi moorsd,

 

I’m glad this was helpful. You are correct you must be at SAS 9.4 M3 (maintenance 3) to leverage the Hive Serde for SPD Engine data. 

 

No I’m not doing a workshop  but my colleague Jeff Bailey is, and I see you’ve already chimed in with some feedback on what you’d like to see in the workshop. I am presenting a paper and it would be great if it works for you to attend it or to stop by to say hello:

 

Session ID: SAS2180
Session Title: How to Leverage the Hadoop Distributed File System as a Storage Format for the SAS® Scalable Performance Data Server and SAS® Scalable Performance Data Engine
Session Type: Breakout
Day Scheduled: Tuesday, 4/19/16
Start Time/End Time: 15:00/ 15:20
Location: Bellini 2005

LinusH
Tourmaline | Level 20

Exiting times, it sure is.

@SteveSober: are you saying that as of today (M3) you can't register a "plain" SPDE hdfs library in SMC?

The work-around would be to configure it as a Hive data store, and the register that in SMC as a Hive library?

Perhaps you will bring this up in your presentation...

 

Perhaps a bit off-topic (but not much Smiley Wink): as an architect it's quite difficult to compare different scenarios with new engine possibilities. Especially since it seems be not so many real world implementations learn from. Where can I find information that compare SPDE/SPDS hdfs, SPDE/SPDS Hive and pure Hive implementations? What are the pros and cons? Lessons learned etc.

 

br

Data never sleeps
SteveSober
SAS Employee

Thanks for your comments and questions LinusH. To summarize what I am saying:

  1. To utilize the Hive Serde for SPD Engine Data you must be at SAS 9.4 M3.
  2. You cannot register the SPD Engine data to the Hive metastore using the SAS Management Console. This is done using a command that is issued on the Hadoop head node; more details to follow in my next article.
  3. You can register an SPD Engine LIBREF to hdfs using SAS Management Console. Use this libref when you create, update, append or modify the SPD Engine data. Use this libref when you use PROC DS2 via SAS In-Database Code Accelerator for Hadoop and PROC TRANSPOSE. Note: SQL processing of SPD Engine data store on HDFS is done on the SAS Server.
  4. You can register a Hive LIBREF using SAS Management Console. Use this libref whenever you want to process SQL against the SPD Engine data registered to the Hive metastore. I also suggest using this libref when you use PROC FREQ, MEANS, REPORT, and SUMMARY as well as when using DS2 MATCH MERGE and or FedSQL on the SET statement of your DS2 thread that runs code via MapReduce. The reason for this is MATH MERGE and FedSQL push SQL to run via MapReduce.

 

In the ideal world the information you requested (a bit off-topic) would be readily available but the reality is these kinds of implementation comparisons are proprietary among our customer base. In addition it is very hard to give general guidelines because every environment is unique. One site may have 100 cores in their Hadoop cluster while others sites have over 1,000 cores. One site might utilize TEZ or SPARK as the method HiveQL is processed. Another site may not use TEZ or SPARK. As you can see the results of any benchmark would vary tremendously depending on the core count of your Hadoop cluster and the methodology used to processing HiveQL.

moorsd
Obsidian | Level 7

Hi Steve,

 

As the SPDE SerDe is just two Jar files, can these be deployed manually to all the nodes in the Hadoop cluster, so this works with SAS 9.4 M2 Then in a Hive query just add something like:

 

exec (ADD JAR /home/hadoop/SerDes/name_of_SAS_Serde.jar).

 

Is it that you just need the SAS 9.4 M3 depot to get the SerDe from SAS, as we already have a SAS 9.4 M3 depot. Or is there something in the SAS 9.4 M3 SAS/ACCESS to Hadoop engine that is doing something different under the hood? Could you tell me more about the registration utility that SAS supplies.

 

The ability to see SPDE files in Hive is something that would be ideal for my UK client, however it's taken nearly 12 months to migrate 10 SAS environments to SAS 9.4 M2 so the appetite to upgrade to SAS 9.4 M3 isn't there at the minute..

 

Thanks

 

David

LinusH
Tourmaline | Level 20
Going to M3 is not a migration, it's an update. So if all the Hadoop stuff is important for your customer upgrading is recommended. Tweaking the installation with not support component versions isn't really a best practice.
Data never sleeps
SteveSober
SAS Employee

Thanks @moorsd and @LinusH for your questions and comments. If you were to tweak the SAS 9.4 M2 environment to try and leverage the HiveSerDe for SPD Engine Data SAS Technical Support would not be able to help you with any issues. It is best to wait until you are at SAS 9.4 M3 before trying to leverage the SerDe.

moorsd
Obsidian | Level 7

@LinusH

 

Thanks for the information.

 

Sadly as the customer has just spent a lot of time and money migrating from 9.2 to 9.4 M2 (which was the current distribution when the migration project started). The appitite to upgrade to M3 isn't there at the moment. I was just wondering we we could get the SPDE Hove SerDe working with 9.4 M2 as a stop gap. Yes, it would be a hack but Hive can use other SerDe's so why not the SAS 9.4 M3 one?

 

A SAS upgrade is being considered by the client for 2017. However with a new version of SAS due soon (50 years of SAS, so time for a big announcement?) it might not be worth the bother going to M3, and see what a new version of SAS may bring to the table, Hadoop wise. Spark integration (i.e. SAS/ACCESS to Spark) would be nice. DataLoader 2.4 has started the ball rolling with Spark so maybe something like that may happen?

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 3667 views
  • 5 likes
  • 4 in conversation