In a previous article, I wrote about the current possibilities available to access files stored in Google’s object storage implementation: Google Cloud Storage (GCS).
Let’s take a deeper look at how we can access files in GCS using Cloud Storage FUSE.
“Cloud Storage FUSE is an open source FUSE adapter that allows you to mount Cloud Storage buckets as file systems on Linux or macOS systems.”
Essentially, Cloud Storage FUSE provides a command-line utility, named “gcsfuse”, which helps you mount a GCS bucket to a local directory so that the bucket’s contents are visible and accessible locally like any other file.
Access to the bucket is totally transparent. Any new file in the bucket will be immediately visible in the mount point directory. Any new file in the mount point directory will be immediately visible in the bucket.
From a SAS and CAS perspective, it is nothing more than accessing OS directories.
There’s a warning though (see the “Caution” section in the documentation):
“Cloud Storage FUSE is a Google-developed and community-supported open-source tool, written in Go and hosted on GitHub. It is distributed as-is, without warranties of any kind.”
So, it’s probably good for experimentation, performance testing, some migration use cases and for getting acquainted with Google Cloud Storage. It might not be a good fit for a real Viya 3.5 production environment. It certainly has some limitations that I will mention at the end.
SAS Viya 4 will bring support for a Google Cloud Storage CASLIB (similar to what we already have with AWS S3 and Azure Data Lake Storage Gen2). So, gcsfuse provides an opportunity to step into the Google Cloud Storage world and see what benefits it could bring.
In terms of pricing, the utility itself is free of charge. However, any data operation on the mount point contents (which ultimately live in GCS) is charged like any other Cloud Storage operation.
gcsfuse provides several options to mount your GCS bucket to a local directory. Here is an example:
In this example, all the files in my gcpdm-test GCS bucket appear locally under /opt/gcs/mount. For authentication, gcsfuse relies on a service account credentials file, which can be obtained with a gcloud command (“gcloud iam service-accounts keys create”). If the mount point must be shared with other users, the user who runs gcsfuse must add the allow_other option.
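To make the shape of such a mount command concrete, here is a minimal Python sketch that only assembles and prints the gcsfuse command line, without running it. The bucket and mount point come from the example above; the key file path is a hypothetical placeholder, and the helper name build_gcsfuse_command is mine, not part of gcsfuse.

```python
import shlex
from typing import List


def build_gcsfuse_command(bucket: str, mount_point: str, key_file: str,
                          allow_other: bool = False) -> List[str]:
    """Return the argument list for a gcsfuse mount (the command is not executed)."""
    # --key-file points gcsfuse at the service account JSON key.
    cmd = ["gcsfuse", "--key-file", key_file]
    if allow_other:
        # FUSE mount option that lets users other than the mounting
        # user access the mount point.
        cmd += ["-o", "allow_other"]
    # gcsfuse takes the bucket name first, then the local mount directory.
    cmd += [bucket, mount_point]
    return cmd


# The key file path below is a hypothetical placeholder.
command = build_gcsfuse_command(
    bucket="gcpdm-test",
    mount_point="/opt/gcs/mount",
    key_file="/path/to/service-account-key.json",
    allow_other=True,
)
print(shlex.join(command))
```

The printed line can be pasted into a shell on a machine where gcsfuse is installed. Depending on the system's FUSE configuration, allow_other may also require a setting in /etc/fuse.conf.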
Well, it depends on what you want to achieve. You have multiple options:
It looks like any traditional SAS code accessing local or network paths. Nothing specific related to GCS.
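The same holds outside SAS: any program that uses ordinary file APIs can work against the mount point. The Python sketch below illustrates this under one assumption: a temporary directory stands in for the mount point (such as /opt/gcs/mount) so that it runs without a mounted bucket. The file name cars.csv and its contents are made up for the illustration.

```python
import csv
import tempfile
from pathlib import Path


def list_and_read_csv(mount_point: Path, file_name: str):
    """List a directory and read one CSV file using only standard file APIs."""
    names = sorted(p.name for p in mount_point.iterdir())
    with open(mount_point / file_name, newline="") as handle:
        rows = list(csv.reader(handle))
    return names, rows


# A temporary directory stands in for the gcsfuse mount point here.
with tempfile.TemporaryDirectory() as tmp:
    mount_point = Path(tmp)
    # Hypothetical sample file; on a real mount this would be a GCS object.
    (mount_point / "cars.csv").write_text("make,msrp\nAcura,36945\n")
    names, rows = list_and_read_csv(mount_point, "cars.csv")
    print(names)
    print(rows)
```

Nothing in the function refers to GCS; pointing it at a real mount directory would be the only change.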
From a SAS standpoint, there are a few limitations in using gcsfuse, especially around writing (saving) data from CAS to GCS:
When CAS creates a Parquet file, it actually creates a directory, first under a temporary name, and then renames it to the target file name. Because of the nature of object storage, renaming a directory is not supported in GCS, which causes the save operation to fail (even though the Parquet partitions are created successfully).
Concurrent updates of the same file from different machines are not handled correctly by gcsfuse (there is no concurrency control for multiple writers to a file). Therefore, saving data in SASHDAT or CSV format in parallel using a DNFS CASLIB is not an option.
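Both limitations follow from how a flat object store behaves, and a toy in-memory model can illustrate them. The Python sketch below is not how gcsfuse or CAS is implemented; the ToyBucket class, the object names and the temporary directory name are all hypothetical. It assumes only two things: a "directory" is just a shared name prefix, so renaming it means copying and deleting every object under it, and an upload always replaces the whole object.

```python
from typing import Dict


class ToyBucket:
    """Minimal in-memory stand-in for a flat object store (whole objects only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def download(self, name: str) -> bytes:
        return self.objects.get(name, b"")

    def upload(self, name: str, data: bytes) -> None:
        # An upload replaces the entire object; there is no partial update.
        self.objects[name] = data

    def rename_prefix(self, old: str, new: str) -> int:
        """Emulate a directory rename; return the number of objects moved."""
        moved = 0
        # Snapshot the matching names first, since the loop mutates the dict.
        for name in [n for n in self.objects if n.startswith(old)]:
            self.objects[new + name[len(old):]] = self.objects[name]  # copy
            del self.objects[name]                                    # delete
            moved += 1
        return moved


# 1) "Directory" rename: one copy and one delete per object, not one operation.
bucket = ToyBucket()
bucket.upload("out/_tmp_sales.parquet/part-0.parquet", b"...")
bucket.upload("out/_tmp_sales.parquet/part-1.parquet", b"...")
moved = bucket.rename_prefix("out/_tmp_sales.parquet/", "out/sales.parquet/")
print(moved, sorted(bucket.objects))

# 2) Two writers start from the same remote copy of one file.
bucket.upload("sales.csv", b"id,amount\n")
copy_a = bucket.download("sales.csv")
copy_b = bucket.download("sales.csv")
# Each appends its own row locally, then uploads the full file.
bucket.upload("sales.csv", copy_a + b"1,100\n")
bucket.upload("sales.csv", copy_b + b"2,200\n")
final = bucket.download("sales.csv")
print(final.decode())  # writer A's row is gone: the last upload wins
```

In the first part, a failure between two of the per-object steps would leave the data split across both names, which is why such a rename cannot be treated as a single safe operation. In the second part, nothing signals to writer A that its row was overwritten.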
Thanks for reading.