Accessing files on Google Cloud Storage (GCS) using REST
In a previous article, I described the options currently available for accessing files stored in Google Cloud Storage (GCS), Google’s object storage service.
Let's dive deeper and see how we can access files in GCS using REST from SAS.
The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage: “Cloud Storage JSON API v1”. We will use this API to access GCS files from SAS.
Prerequisite: get an authorization access token
To use the “Cloud Storage JSON API v1” through REST, we need an OAuth 2.0 access token. Google provides a convenient online tool, the OAuth 2.0 Playground, to generate such tokens. An access token typically expires after 3600 seconds (1 hour), after which a new one must be generated.
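As an alternative to the playground, if the Google Cloud SDK is installed on the SAS Compute machine and X commands are allowed (the same conditions listed at the end of this article), the token can be requested directly from SAS. The sketch below assumes that `gcloud` is on the PATH and has already been authenticated (for example with `gcloud auth login`, or by activating a service account with `gcloud auth activate-service-account`). It relies on the `gcloud auth print-access-token` command; the fileref name `gtoken` is arbitrary.

```sas
/* Assumption: the Google Cloud SDK is installed and gcloud is already authenticated */
%let GCSTOKEN= ;
filename gtoken pipe "gcloud auth print-access-token" ;

/* Read the single line printed by gcloud and store it in the GCSTOKEN macro variable */
data _null_ ;
  infile gtoken truncover ;
  length token $ 2048 ;
  input token $2048. ;
  call symputx("GCSTOKEN", token, "G") ;
  stop ;
run ;

filename gtoken clear ;

/* Avoid writing the token itself to shared logs: only show its length */
%put NOTE: GCSTOKEN length is %length(&GCSTOKEN) ;
```

The resulting GCSTOKEN macro variable can then be used in the PROC HTTP calls that follow. Like a playground token, it expires, so the step has to be rerun once the token is no longer valid.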
Cloud Storage API Endpoint
In the playground, you might have noticed that the endpoint to access a file looks like the following:
https://storage.googleapis.com/storage/v1/b/{bucket}/o/{object}
where:
- {bucket} is the name of the GCS bucket that contains the file
- {object} is the name of the file (the object name), URL-encoded; in particular, “/” must be encoded as %2F
NB: Objects do not reside in subdirectories within a bucket; they live in a flat namespace. So the file “data/contact_list.csv” appears to be a file named “contact_list.csv” in a folder named “data”, but its actual object name is “data/contact_list.csv”.
By default, this endpoint returns the object’s metadata as JSON. To download the file itself, we modify the endpoint as shown below (notice the %2F that encodes the “/” character, and the “?alt=media” query parameter that triggers the download of the file content):
https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fcontact_list.csv?alt=media
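When the object name is held in a variable, the “/” characters can be encoded programmatically instead of by hand. Here is a minimal sketch that assumes “/” is the only reserved character in the object name (spaces, “#” or “?” would also need encoding); the macro variable names gcsbucket, gcsobject and gcsurl are hypothetical.

```sas
/* Hypothetical inputs: bucket and object name as they appear in GCS */
%let gcsbucket=demo-gcpdm ;
%let gcsobject=data/contact_list.csv ;

/* Replace every "/" with "%2F" and build the download URL */
data _null_ ;
  length encoded $ 1024 url $ 2048 ;
  encoded = tranwrd(symget("gcsobject"), "/", '%2F') ;
  url = cats("https://storage.googleapis.com/storage/v1/b/", symget("gcsbucket"),
             "/o/", encoded, "?alt=media") ;
  call symputx("gcsurl", url, "G") ;
run ;

/* %superq prints the value without attempting any macro resolution */
%put NOTE: Download URL is %superq(gcsurl) ;
```

The resulting value matches the URL above and can be passed to PROC HTTP as url="&gcsurl".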
Download a CSV file from GCS using SAS PROC HTTP
Now that we have an authorization token and a correct endpoint, we can proceed from SAS to download a file from GCS:
/* Store token */
%let GCSTOKEN=ya29.a0AfH6SMCxmjHFa.....SlqVw2_JEggJIhM ;
/* Temporary fileref */
filename outcsv temp ;
/* Download the file using the REST API */
proc http
url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fcontact_list.csv?alt=media"
oauth_bearer="&GCSTOKEN"
out=outcsv ;
debug level=1 ;
run ;
We can check the log for an “HTTP/1.1 200 OK” message.
We can also display the file in the log to check its contents:
/* Dump the output file in the log */
data _null_ ;
infile outcsv ;
input ;
put _infile_ ;
run ;
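Rather than searching the log for the status line, we can also test it in code. Starting with SAS 9.4M5, PROC HTTP sets the automatic macro variables SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE. The macro check_gcs_status and its output variable gcs_request_ok are hypothetical names; this sketch only reports the result and sets a flag.

```sas
/* Hypothetical helper: report the status of the most recent PROC HTTP call */
%macro check_gcs_status ;
  %global gcs_request_ok ;
  %let gcs_request_ok=0 ;
  %if not %symexist(SYS_PROCHTTP_STATUS_CODE) %then %do ;
    %put WARNING: No PROC HTTP status is available in this session. ;
  %end ;
  %else %if &SYS_PROCHTTP_STATUS_CODE = 200 %then %do ;
    %let gcs_request_ok=1 ;
    %put NOTE: GCS request succeeded (HTTP 200). ;
  %end ;
  %else %do ;
    %put ERROR: GCS request failed with HTTP &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE ;
  %end ;
%mend check_gcs_status ;

/* Call it right after the PROC HTTP step */
%check_gcs_status
```

Setting a flag (rather than only writing to the log) lets later steps, such as the CAS load, be skipped when the download failed.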
We can then use the file however we want; for instance, we can load it into CAS:
/* Load the file in CAS */
cas mysession ;
proc casutil outcaslib="public" ;
load file=outcsv importoptions=(filetype="csv") casout="contact_list" replace ;
quit ;
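Without CAS (in SAS 9.4, for instance), the same downloaded file can be read into a WORK table with PROC IMPORT. This is a sketch that assumes the CSV file has a header row; the output table name work.contact_list is arbitrary.

```sas
/* Read the downloaded CSV (fileref OUTCSV) into a WORK table */
proc import datafile=outcsv
            out=work.contact_list
            dbms=csv
            replace ;
  getnames=yes ;      /* the first row holds the column names */
  guessingrows=max ;  /* scan all rows before choosing column types and lengths */
run ;

/* Check the resulting columns */
proc contents data=work.contact_list varnum ;
run ;
```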
Download a SAS Data Set from GCS using SAS PROC HTTP
Likewise, we can download a SAS data set:
/* Get the WORK library path */
%let myworkpath=%sysfunc(pathname(work)) ;
/* The file will be accessible in the WORK library */
filename outsas "&myworkpath/hmeq.sas7bdat" ;
/* Download the SAS data set using the REST API */
proc http
url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fhmeq.sas7bdat?alt=media"
oauth_bearer="&GCSTOKEN"
out=outsas ;
debug level=1 ;
run ;
/* Load the SAS data set in CAS */
cas mysession ;
proc casutil outcaslib="public" ;
load data=work.hmeq casout="hmeq" replace ;
quit ;
Upload files to GCS from SAS using SAS PROC HTTP
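Interacting with GCS through the REST API is not one-way: we can upload files as well. The endpoint for uploading files (a simple upload) is as follows:
https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o?uploadType=media&name={object}
NB: Here the object name is passed as the “name” query parameter rather than as part of the URL path, so the “/” in the file name does not need to be encoded.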
Example:
/* Upload a CSV file */
/* Temporary fileref */
filename csvin temp ;
/* Create a CSV file from a SAS data set */
proc export data=sashelp.class
outfile=csvin
dbms=csv
replace ;
run ;
/* Upload it to GCS */
proc http
url="https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/class_rest.csv"
oauth_bearer="&GCSTOKEN"
in=csvin ;
headers "Content-type"="text/csv" ;
debug level=1 ;
run ;
/* Upload a SAS data set */
/* Create a SAS data set */
data prdsale_rest ;
set sashelp.prdsale(where=(country="U.S.A.")) ;
run ;
/* Get the WORK library path */
%let myworkpath=%sysfunc(pathname(work)) ;
/* Reference the physical location of that new data set */
filename sasin "&myworkpath/prdsale_rest.sas7bdat" ;
/* Upload it to GCS */
proc http
url="https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/prdsale_rest.sas7bdat"
oauth_bearer="&GCSTOKEN"
in=sasin ;
headers "Content-type"="application/octet-stream" ;
debug level=1 ;
run ;
NB: Notice the %nrstr macro function around the “&” that precedes the URL “name” parameter: it prevents the macro processor from trying to resolve “&name” as a macro variable reference.
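To confirm that the uploads landed where expected, the same JSON API can list the objects whose names start with a given prefix (the objects.list method, GET .../b/{bucket}/o?prefix=...). The sketch below reads the JSON response with the JSON LIBNAME engine (available since SAS 9.4M4). The fileref gcsresp and libref gcsjson are arbitrary, and the ITEMS table name assumes the engine’s automatic mapping of the response’s “items” array; if no object matches the prefix, the response has no “items” array and that table is not created.

```sas
/* Temporary fileref for the JSON response */
filename gcsresp temp ;

/* List the objects of the bucket whose names start with "data/" */
proc http
  url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o?prefix=data/"
  oauth_bearer="&GCSTOKEN"
  out=gcsresp ;
run ;

/* Expose the JSON response as SAS tables through the JSON engine */
libname gcsjson json fileref=gcsresp ;

/* One row per object: name, size in bytes (returned as a string) and last update time */
proc print data=gcsjson.items noobs ;
  var name size updated ;
run ;
```

The uploaded data/class_rest.csv and data/prdsale_rest.sas7bdat objects should appear in this list.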
Use GCS Signed URLs and SAS FILENAME URL
Another way to access files in GCS from SAS is to combine GCS signed URLs with the SAS FILENAME URL access method. As the Google Cloud Storage documentation explains:
“A signed URL is a URL that provides limited permission and time to make a request. Signed URLs contain authentication information in their query string, allowing users without credentials to perform specific actions on a resource. When you generate a signed URL, you specify a user or service account which must have sufficient permission to make the request that the signed URL will make. After you generate a signed URL, anyone who possesses it, regardless of whether they have a Google account, can use the signed URL to perform specified actions, such as reading an object, within a specified period of time.”
How to generate a GCS Signed URL?
Using gsutil (a command-line utility from the Google Cloud SDK) and a Google Cloud JSON key file (which holds the private key that allows a Google Cloud service account to authenticate), we can generate a signed URL as shown below:
gsutil signurl -d 30m ./demo_gcpdm_gcp_key.json gs://demo-gcpdm/data/contact_list.csv
The -d option sets how long the URL remains valid (30 minutes here, but this is customizable), and the URL works for anyone who possesses it during that period. Here is a typical output:
URL HTTP Method Expiration Signed URL
gs://demo-gcpdm/data/contact_list.csv GET 2020-07-22 14:23:28 https://storage.googleapis.com/demo-gcpdm/data/contact_list.csv?x-goog-signature=407.....5f2a&x-goog-algorithm=GOOG4-RSA-SHA256&x-goog-credential=gel-gcpdm%40sas-gelsandbox.iam.gserviceaccount.com%2F20200722%2Fus-east1%2Fstorage%2Fgoog4_request&x-goog-date=20200722T135328Z&x-goog-expires=1800&x-goog-signedheaders=host
Then, we can use that URL in SAS as shown below:
%let signed_url=%nrstr(https://storage.googleapis.com/demo-gcpdm/data/contact_list.csv?x-goog-signature=407.....) ;
filename gcs url "&signed_url" debug ;
/* Dump the file in the log */
data _null_ ;
infile gcs ;
input ;
put _infile_ ;
run ;
We can go one step further if:
- The Google Cloud SDK is installed on the SAS Compute machine
- We have access to a Google Cloud JSON Key File
- We have allowed the SAS Compute Server to run X commands (option allowXCMD=true)
In that case, we can run the whole process from SAS:
/* GCS input file */
%let GCSFILE=gs://demo-gcpdm/data/contact_list.csv ;
/* Generate the signed URL in real time from SAS, triggered with the next data step */
filename gsutil pipe "gsutil signurl -d 10m /opt/gcs/demo_gcpdm_gcp_key.json &GCSFILE" ;
/* Parse the output and catch the signed URL */
data _null_ ;
length url $ 256 http_method $ 20 expiration $ 32 signed_url $ 2000 ;
infile gsutil dlm='09'x firstobs=2 ;
input url http_method expiration signed_url ;
call symput("signed_url",strip(signed_url)) ;
run ;
%put %superq(signed_url) ;
filename gcs url "%superq(signed_url)" debug ;
/* Dump the file in the log */
data _null_ ;
infile gcs ;
input ;
put _infile_ ;
run ;
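If the program above fails at the FILENAME PIPE step, checking the prerequisites from SAS often helps. This sketch uses the GETOPTION function to read the XCMD system option and, when X commands are allowed, runs gsutil version through a pipe to confirm that gsutil can be found on the PATH of the SAS session. The macro name check_gsutil_prereqs and the fileref gsver are hypothetical.

```sas
%macro check_gsutil_prereqs ;
  %local xcmd_setting ;
  /* GETOPTION returns XCMD when X commands and FILENAME PIPE are allowed, NOXCMD otherwise */
  %let xcmd_setting=%sysfunc(getoption(xcmd)) ;
  %put NOTE: XCMD system option is &xcmd_setting.. ;

  %if &xcmd_setting = XCMD %then %do ;
    /* Run "gsutil version" to confirm the Google Cloud SDK is reachable */
    filename gsver pipe 'gsutil version 2>&1' ;
    data _null_ ;
      infile gsver ;
      input ;
      put _infile_ ;
    run ;
    filename gsver clear ;
  %end ;
  %else %do ;
    %put WARNING: FILENAME PIPE is not available, so gsutil cannot be called from SAS. ;
  %end ;
%mend check_gsutil_prereqs ;

%check_gsutil_prereqs
```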
Thanks for reading.