01-07-2016 10:55 AM
I am creating a hands-on workshop for SAS Global Forum. The topic is SAS/ACCESS to Hadoop.
Here is the abstract...
SAS3880 - An Insider’s Guide to SAS/ACCESS to Hadoop
In the very near future Hadoop will likely find its way into your SAS environment. It is rapidly displacing database management systems in the corporate world and rearing its head in the SAS world. If you think now is the time to learn how to use SAS with Hadoop, you are in luck. This workshop is the jump start you need.
This workshop shows you how to access Hadoop using SAS. We devote most of our time to exploring SAS/ACCESS to Hadoop. During the workshop you will learn:
A basic understanding of Hadoop is a prerequisite for attending this workshop.
Here are the details:
Session ID: SAS3880
Session Title: An Insider's Guide to SAS/ACCESS® Interface to Hadoop
Session Type: Hands-On Workshop
Day Scheduled: Tuesday, 4/19/16
Start Time/End Time: 11:00 / 12:40
Location: Veronese 2401-2502
I am planning on making this exercise-heavy - ideally the majority of the time will be spent on exercises. I plan to focus on SAS/ACCESS, with the FILENAME statement and the HADOOP procedure in supporting roles.
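To give a flavor of the three interfaces before the workshop, here is a minimal sketch. The server name, paths, and credentials are placeholders, and the exact connection options will vary with your Hadoop distribution and site configuration.

```sas
/* SAS/ACCESS Interface to Hadoop: LIBNAME engine pointing at Hive */
libname hdp hadoop server="hive.example.com" port=10000
        user=myuser schema=default;

proc sql;
   /* Count is pushed down to Hive as HiveQL where possible */
   select count(*) as row_count from hdp.stocks;
quit;

/* FILENAME statement, Hadoop access method: read a file from HDFS */
filename mycsv hadoop '/user/myuser/data/class.csv'
         cfg='/opt/sas/hadoop/hadoop-config.xml';

data class;
   infile mycsv dsd firstobs=2;
   input name :$12. sex :$1. age height weight;
run;

/* PROC HADOOP: issue HDFS commands from a SAS session */
proc hadoop username='myuser' password='XXXX' verbose;
   hdfs mkdir='/user/myuser/workshop';
   hdfs copyfromlocal='/tmp/class.csv'
        out='/user/myuser/workshop/class.csv';
run;
```

The workshop exercises will go deeper on each of these, but this is roughly the shape of the code you will be writing.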
Here is where you can guide the development of the workshop. I would love to hear your thoughts on the following questions (feel free to answer in the comments).
What topics do you think should be included?
What do you think should be emphasized?
When you first started with SAS + Hadoop, what caused you the most trouble/aggravation?
What do you wish you had known when you first started using SAS + Hadoop?
What do you want to know now?
Thanks for helping me with this. If you are attending SGF please look me up. If you have experience with this stuff I may put you to work as a lab assistant.
01-07-2016 02:26 PM - edited 03-31-2017 01:09 PM
I have seen the training course you mentioned. I plan to focus exclusively on Hadoop-specific topics. It sounds like my plan is on track with what you would prefer. Thanks for the insight; I greatly appreciate it.
02-12-2016 04:36 AM
I'd like to see something related to best practice when using Hive tables, i.e. what file formats work best with the SAS/ACCESS engine for different types of data (e.g. Parquet, ORCFile).

Also, look at compression of data and how it impacts latency, if at all, when transferring data back from the Hadoop cluster onto the SAS servers and clients; how you create different file formats using the SAS/ACCESS engine; and any hints, tips, and optimisations that can be made.

Does SAS have any guidelines or best practice about what works best with different data types? There was a data modelling for Hadoop and Hive paper (attached) written by SAS in 2013 that recommended using sequence files for improved performance. That was in 2013, and things have moved on a lot in the Hadoop space in three years. Are sequence files still SAS's recommendation? An updated version of the results of that paper would be most welcome, but maybe outside the scope of what can be achieved in the workshop.

I'm looking forward to this workshop greatly, and I'm sure it will be a success.

Thanks, David
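For context, here is the kind of thing I mean - a minimal sketch of creating a Hive table in a non-default file format through the engine's DBCREATE_TABLE_OPTS= option. The server name and schema are placeholders, and the exact STORED AS clause you can use depends on your Hive version.

```sas
/* Ask Hive to store newly created tables as ORC
   instead of the default text format */
libname hdp hadoop server="hive.example.com" port=10000
        user=myuser schema=default
        dbcreate_table_opts='STORED AS ORC';

/* The output table hdp.stocks_orc is created in Hive as an ORC table */
data hdp.stocks_orc;
   set sashelp.stocks;
run;
```

It would be great if the workshop covered which of these options matter most in practice.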
02-12-2016 04:39 AM
Apologies for the big block of text in my previous post. I'm having problems using IE11 (enforced by the business I'm working at) with these forum posts in Rich Text format, so I had to use HTML.