Cloud object storage (S3, ADLS, GCS, etc.) keeps getting more and more relevant for our customers. They keep putting more and more data into it. Consequently, they want more and more SAS (native/direct please!) connectivity to it.
Let's dive into cloud object storage. First, let's build a working understanding of object storage, then look at its advantages, and finally see how SAS can use and/or integrate with it.
Avoiding formal definitions and lots of technical details, the easiest way to understand cloud object storage is to think of it as a "modern FTP server." Object storage isn't actually an FTP server (although you can set that up) but functionally it plays the same role as FTP servers did in the past, and often still do today. You want to allow outside data to come into your network? You set up an FTP server and allow people to push or pull data into it. Object storage plays this same role for cloud platforms, although via different mechanisms. It allows outside data to come into your cloud network.
Say you wanted to bring data into your cloud database instance. You could configure a local database client to connect with your cloud db, open a bunch of ports (sounds risky!), and pass the data up that way. However, it's a whole lot easier just to push the data to a bucket in your cloud provider's object storage and access it from there.
Comparing Object Storage to a Traditional FTP Server
Also similar to FTP, at least for S3, we even use PUT and GET commands to load and unload object storage data. Regardless of the exact command syntax, object storage platforms generally use simple, open REST APIs for data maintenance. This is what makes it "modern." Generally, the entire object storage implementation is REST (HTTP) based, right down to each object (table, file, etc.) having its own URL.
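To make the "every object has its own URL" idea concrete, here is a minimal Python sketch using only the standard library. It assumes S3's virtual-hosted-style addressing (bucket in the host name, object key in the path); the bucket name, region, and key are hypothetical placeholders. The sketch only prepares the PUT and GET requests and never sends them. A real S3 call would also need AWS Signature Version 4 authentication (or a pre-signed URL), which is deliberately left out here.

```python
from urllib.parse import quote
from urllib.request import Request


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style S3 URL for a single object.

    Bucket goes in the host name, object key goes in the path, so every
    object ends up with its own address.
    """
    # Keep "/" unescaped so key "folders" stay readable; escape spaces etc.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


def put_request(url: str, data: bytes) -> Request:
    """Prepare (but do not send) an HTTP PUT that would upload `data`.

    NOTE: unsigned; real S3 requires SigV4 auth or a pre-signed URL.
    """
    return Request(url, data=data, method="PUT",
                   headers={"Content-Type": "application/octet-stream"})


def get_request(url: str) -> Request:
    """Prepare (but do not send) an HTTP GET that would download the object."""
    return Request(url, method="GET")


if __name__ == "__main__":
    # Hypothetical bucket/region/key, for illustration only.
    url = s3_object_url("my-example-bucket", "us-east-1", "raw/sales 2023.csv")
    print(url)
    upload = put_request(url, b"id,amount\n1,9.99\n")
    download = get_request(url)
    print(upload.get_method(), download.get_method())
```

The point of the design is that loading and unloading are just ordinary HTTP verbs against an ordinary URL, which is why almost any client (including SAS) can be taught to talk to object storage.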
Functionally, this is where the similarity to traditional FTP servers ends. Unlike those old FTP servers, which were generally meant for staging only, object storage is increasingly being used to store data permanently. It isn't as close to the processing engines as other storage options, like local disk or network-attached storage, so performance against it generally isn't as good. However, object storage has several other advantages, discussed later.
So what are our customers storing out there and why are they choosing object storage? Well, remember the old Hadoop Data Lake value statement? It went something like
"Put your old data into Hadoop. It's cheap and Hadoop's big data capabilities will help you process it (if and) when you need it."
This is the basic idea of a data lake, and more organizations are using object storage for this purpose. Unlike a data mart or a data warehouse, which is organized to answer questions, a data lake is simply a collection of raw tables, files, and whatever else you might want to keep for future analyses. You don't spend time modeling the data for ease of use; instead, you let data scientists (a.k.a. programmers) figure it out when the need arises.
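As a rough illustration of "collect now, model later," the following Python sketch uses a hypothetical in-memory `ToyBucket` class as a stand-in for an object storage bucket. It is not any provider's API. Objects are stored as opaque bytes under flat string keys, nothing is validated on the way in, and structure is imposed only when someone reads the data back.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBucket:
    """Hypothetical in-memory stand-in for an object storage bucket.

    Objects are opaque bytes under flat string keys; nothing is modeled
    or validated on the way in, mirroring how a data lake keeps raw data.
    """
    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # Store the raw bytes exactly as received; no transformation.
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def list_keys(self, prefix: str = "") -> list[str]:
        # Keys are flat strings; "folders" are just shared key prefixes.
        return sorted(k for k in self.objects if k.startswith(prefix))


if __name__ == "__main__":
    lake = ToyBucket()
    # Dump heterogeneous raw data with no up-front modeling.
    lake.put("raw/crm/customers.csv", b"id,name\n1,Ada\n")
    lake.put("raw/web/clicks.json", b'{"page": "/home"}\n')
    lake.put("raw/logs/app.log", b"2023-01-01 INFO started\n")
    print(lake.list_keys("raw/"))
    # Structure is applied only at read time, when an analysis needs it.
    header, first_row = lake.get("raw/crm/customers.csv").decode().splitlines()
    print(dict(zip(header.split(","), first_row.split(","))))
```

The trade-off shown here is the one described above: writing is trivial because no schema is enforced, but every reader must work out the format of each object for themselves.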
Now, it looks like many organizations are replacing Hadoop with object storage for data lakes. Why? Well, as the linked article states, there are several reasons, but four really stand out:
If object storage is good for data lakes, then SAS is great for object storage. After all, SAS has been making sense of raw data for over 40 years.
Recently, SAS has been adding connectivity and integration with object storage to both its SAS and CAS data analytics platforms. Below is collateral that highlights SAS' current object storage capabilities along with some clever ways people have found to connect SAS to object storage.