
Cloud Object Storage and SAS

Started ‎09-29-2020 by
Modified ‎09-29-2020 by

Cloud object storage (S3, ADLS, GCS, etc.) keeps getting more and more relevant for our customers. They keep putting more and more data into it. Consequently, they want more and more SAS (native/direct please!) connectivity to it.

Let's dive into cloud object storage. First, let's understand what object storage is, then look at its advantages, and finally see how SAS can use and integrate with it.

Understanding Cloud Object Storage


As a "Modern FTP Server"


Avoiding formal definitions and lots of technical details, the easiest way to understand cloud object storage is to think of it as a "modern FTP server." Object storage isn't actually an FTP server (although you can set that up), but functionally it plays the same role that FTP servers did in the past, and often still do today. You want to allow outside data to come into your network? You set up an FTP server and allow people to push data to it or pull data from it. Object storage plays this same role for cloud platforms, although via different mechanisms. It allows outside data to come into your cloud network.

Say you wanted to bring data into your cloud database instance. You could configure a local database client to connect to your cloud database, open a bunch of ports (sounds risky!), and pass the data up that way. However, it's a whole lot easier just to push the data to a bucket in your cloud provider's object storage and access it from there.
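
The push-then-access pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `InMemoryBucket`, `stage_rows`, `load_rows`, the bucket name, and the object key are all hypothetical, and the dictionary-backed class merely stands in for a real provider bucket, which you would reach through the provider's SDK or REST API.

```python
import csv
import io


class InMemoryBucket:
    """Hypothetical stand-in for a cloud bucket: maps object keys to bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}

    def put_object(self, key, body):
        # Store (or overwrite) the object under its key.
        self._objects[key] = bytes(body)

    def get_object(self, key):
        # Return the stored bytes; raises KeyError if the key is absent.
        return self._objects[key]


def stage_rows(bucket, key, header, rows):
    """Producer side: serialize rows as CSV and push them to the bucket."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    bucket.put_object(key, buffer.getvalue().encode("utf-8"))


def load_rows(bucket, key):
    """Consumer side: pull the object and parse it into dictionaries."""
    text = bucket.get_object(key).decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


# The on-premises side pushes a file; the cloud side reads it later.
bucket = InMemoryBucket("example-landing-zone")
stage_rows(bucket, "incoming/orders.csv", ["id", "amount"], [[1, "9.50"], [2, "12.00"]])
print(load_rows(bucket, "incoming/orders.csv"))
```

The point of the sketch is the decoupling: the producer and the consumer never connect to each other directly; they only agree on a bucket and an object key.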

 

CloudFTPComparison-1024x475.png

Comparing Object Storage to a Traditional FTP Server


REST-based put, get, mdir, mkdir, etc. (API)

Also like FTP, we even use PUT and GET commands (at least for S3) to load and unload object storage data. Regardless of exact command syntax, object storage platforms generally expose simple, open REST APIs for data maintenance. This is what makes it "modern." Generally the entire object storage implementation is REST (HTTP) based, right down to each object (table, file, etc.) having its own URL.
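
To make the "each object has its own URL" idea concrete, here is a minimal Python sketch using only the standard library. It builds the virtual-hosted-style URL that S3 uses for an object and describes the PUT and GET requests against it. The bucket name, region, key, and helper function names are assumptions for illustration. The requests are constructed but never sent, because a real call to a private bucket would also need authentication (for example a signed request or a presigned URL), which is omitted here.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(bucket, region, key):
    """Build the virtual-hosted-style HTTPS URL of one S3 object."""
    # Percent-encode the key but keep "/" so prefixes stay readable.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


def put_request(url, body):
    """Describe an upload: HTTP PUT with the object bytes as the body."""
    return Request(url, data=body, method="PUT")


def get_request(url):
    """Describe a download: HTTP GET on the same URL."""
    return Request(url, method="GET")


# Hypothetical bucket, region, and key, used only for illustration.
url = object_url("example-bucket", "us-east-1", "raw/sales 2020.csv")
upload = put_request(url, b"id,amount\n1,9.50\n")
download = get_request(url)

# Nothing is sent over the network; we only inspect the request objects.
print(upload.get_method(), upload.full_url)
print(download.get_method(), download.full_url)
```

Loading and unloading differ only in the HTTP verb; the URL that identifies the object stays the same.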

But it's Permanent

Functionally, this is where the similarity to traditional FTP servers ends. Unlike those old FTP servers, which were generally meant for staging only, object storage is increasingly used to store data permanently. It sits farther from the processing engines than other storage options, such as local disk or network-attached storage, so performance against it is generally not as good. However, object storage has several other advantages, discussed later.

Object Storage for Data Lakes

So what are our customers storing out there, and why are they choosing object storage? Well, remember the old Hadoop Data Lake value statement? It went something like this:

"Put your old data into Hadoop. It's cheap and Hadoop's big data capabilities will help you process it (if and) when you need it."

This is the basic idea of a data lake and more organizations are using object storage for this purpose. Unlike a data mart or a data warehouse that is organized to answer questions, a data lake is simply a collection of raw tables, files, and whatever else you might want to keep for future analyses. You don't spend time modeling the data for ease of use but let data scientists (a.k.a. programmers) figure it out when the need arises.

 

Now, it looks like many organizations are replacing Hadoop with object storage for data lakes. Why? Well, as the linked article states, there are several reasons but four really stand out:

  1. Cost: Object storage is seriously cheap, with different providers claiming cost savings of five times or more over local HDFS, and with tiered pricing it can get even cheaper.
  2. Elasticity: Want to double your data? Just put it out there. There's no need to requisition and configure hardware. There's no need to add HDFS nodes.
  3. Integration: More and more applications are integrating directly with object storage. Want to use Spark to process your data? There's no need to put it in HDFS, just hook up your cloud Hadoop instance to the provider's object storage. Want to hook up SAS Viya to S3? No problem.
  4. Minimal Maintenance: All of the above comes at the push of a few buttons or some super simple scripting. There's no hardware or software to manage. There's no Hadoop expertise required.

SAS and Object Storage

If object storage is good for data lakes, then SAS is great for object storage. After all, SAS has been making sense of raw data for over 40 years.

Recently, SAS has been adding connectivity and integration with object storage to both its SAS and CAS data analytics platforms. Below is collateral that highlights SAS's current object storage capabilities, along with some clever ways people have found to connect SAS to object storage.

AWS Simple Storage Service (S3)

Azure Data Lake Storage (ADLS)

Google Cloud Storage (GCS)

