AhmedAl_Attar
Ammonite | Level 13

Hi All,

I came across this Working with Mountpoint for Amazon S3, which states 

" Mountpoint for Amazon S3 is a high-throughput open source file client for mounting an Amazon S3 bucket as a local file system. With Mountpoint, your applications can access objects stored in Amazon S3 through file system operations, such as open and read. Mountpoint automatically translates these operations into S3 object API calls, giving your applications access to the elastic storage and throughput of Amazon S3 through a file interface.

 

Mountpoint for Amazon S3 is generally available for production use on your large-scale read-heavy applications: data lakes, machine learning training, image rendering, autonomous vehicle simulation, extract, transform, and load (ETL), and more."

 

I was wondering if anyone has had the chance to evaluate or work with this as a mechanism for accessing SAS data sets residing in S3 bucket(s)?

 

Thanks,

Ahmed

 

2 REPLIES
JuanS_OCS
Amethyst | Level 16

Hi @AhmedAl_Attar ,

 

I do not have experience with this particular one, but I do have experience with similar ones.

 

In my case, I connected Azure Data Lake Storage Gen2 (ADLS Gen2) to SAS Viya using standard file system methods, to allow users to store and read/write data, code, outputs, logs, etc. This also works with SAS 9.4, on both Linux and Windows.

 

ADLS, like S3, is bucket/blob storage, and the main access path is through APIs. That is the general best practice.

However, there are methods for users who don't want to bother with API calls, which is basically everyone except developers. Some provide ODBC or driver bridges that translate standard file system operations into API calls and present the storage through views or mounted paths.

None of that is officially supported by SAS. These bridges are not vendor supported either; they are normally maintained by a single developer or, if you are lucky, a sizable team. Adopting one is therefore a call each company has to make for itself.

 

What you always need to do is read the fine print, find out the limitations, and check whether those limitations are acceptable to your stakeholders (user requirements & risk management). You might even need to involve Security or the Enterprise Architect in some cases.

 

In the case of the one you present, the capabilities are described in the semantics document linked below, and I see it has some tricky sentences. For example, it mentions that permissions, deletions, and renames are not supported, and it speaks of great sequential read throughput but mentions nothing about writes, which is something I would like to learn and query more about.

https://github.com/awslabs/mountpoint-s3/blob/main/doc/SEMANTICS.md

 

In short, I encourage you to continue this investigation: don't stop at the marketing statements, test it yourself, do a risk analysis, and find out whether this one fits your purposes! Note, of course, that while the SAS Tech Support folks are always willing to help, the responsibility for official support would not fall in their laps, and you would need to arrange proper support for the maintenance of this component.

 

I hope it helps somehow!


Best regards,

Juan

AhmedAl_Attar
Ammonite | Level 13

@JuanS_OCS 

 

Thank you for your insightful input.

My main goal/use case for Mountpoint is to set it up as a read-only "file system" so we can process the large data sets we have on S3 with SAS 9.4 without duplicating them onto EBS. Basically, it works around the limitations of PROC S3 and minimizes the "expensive" EBS charges compared to S3 storage costs!
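
For context, this is roughly the setup we intend to test. It is an untested sketch, not a working configuration: the bucket name, mount path, and library/data set names below are placeholders, and it assumes Mountpoint's documented --read-only option behaves as advertised.

/* One-time setup on the EC2 host (shell, not SAS): mount the bucket read-only */
/*    mount-s3 --read-only our-data-bucket /mnt/s3data                         */

/* Point a read-only SAS library at the mounted prefix holding the data sets */
libname s3data '/mnt/s3data/saslib' access=readonly;

/* Sanity check: list the members visible through the mount */
proc contents data=s3data._all_ nods;
run;

/* Read straight from S3 via the mount, with no staging copy on EBS */
data work.sample;
   set s3data.bigtable(obs=100);
run;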

 

Your statement

"I see it has some tricky sentences. For example: it mentions that permissions are not supported, deletions, renames... and it speaks of great sequential throughput performance reads, but it mentions nothing about writes, which is something I would like to learn and query more about."

Very valid and worth paying attention to for sure, but in our use case, single/concurrent writes are not applicable/desired functionality. We can always fall back to Proc S3 and use it's upload feature for storing outputs into S3 bucket(s).
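
For completeness, the fallback upload would look something like this. This is a sketch based on the PROC S3 syntax in the SAS 9.4 documentation; the key ID, secret, region, bucket, and file paths are all placeholders (a CONFIG= file is the safer way to supply credentials):

/* Upload a locally written output data set to an S3 bucket with PROC S3 */
proc s3 keyid="<access-key-id>" secret="<secret-access-key>" region=useast;
   /* PUT copies a local file to the given bucket/object key */
   put "/sasdata/output/results.sas7bdat" "/our-data-bucket/outputs/results.sas7bdat";
run;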

 

Our testing will continue, but I wanted to see if someone else has already been through this, so we can benefit from their tips and avoid the gotchas they encountered 😉

 

I'll try to circle back and share our experience/findings once we finish our Proof of Concept/test use case.

 

Thanks,

Ahmed
