Accessing files on Google Cloud Storage (GCS) using REST

Started 08-04-2020
Modified 08-04-2020

In a previous article, I covered the options currently available for accessing files stored in Google’s object storage service, Google Cloud Storage (GCS).

 

Let's dive deeper and see how we can access files in GCS using REST from SAS.

 

The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage: “Cloud Storage JSON API v1”. We will use this API to access GCS files from SAS.

Prerequisite: get an authorization access token

To use the “Cloud Storage JSON API v1” through REST, we need an authorization access token. Google provides a useful online playground for generating such tokens. An access token usually lasts 3600 seconds (1 hour), after which a new one must be generated.

Cloud Storage API Endpoint

In the playground, you might have noticed that the endpoint to access a file looks like the following:

 

https://storage.googleapis.com/storage/v1/b/{bucket}/o/{object}

 

where:

  • {bucket} is the GCS bucket where the file is to be found
  • {object} is the name of the file, URL-encoded (in particular, the “/” must be encoded as %2F)

NB: Objects do not reside in subdirectories within a bucket; they live in a flat namespace. The object “data/contact_list.csv” looks like a file named “contact_list.csv” in a folder “data”, but its actual name is “data/contact_list.csv”.

 

To download the file, we will modify the endpoint as shown below (notice the %2F to encode the “/” character and the addition of “?alt=media” to trigger the download of the file):

 

https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fcontact_list.csv?alt=media
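
The encoding step can also be left to SAS. The following is a minimal sketch, assuming the same bucket and object as above; the macro variable name gcs_url is only illustrative and is not part of the original example. It uses the DATA step URLENCODE function to escape the object name before building the endpoint.

```sas
/* Sketch: build the download endpoint with an encoded object name */
/* (gcs_url is a hypothetical macro variable name)                 */
data _null_ ;
    length object $ 1024 url $ 2048 ;
    bucket = "demo-gcpdm" ;
    /* URLENCODE escapes reserved characters such as "/" */
    object = urlencode("data/contact_list.csv") ;
    url = cats("https://storage.googleapis.com/storage/v1/b/",
               bucket, "/o/", object, "?alt=media") ;
    /* Store the endpoint in a macro variable for later use */
    call symputx("gcs_url", url) ;
    /* Write the endpoint to the log for a visual check */
    put url= ;
run ;
```

The macro variable could then be passed to PROC HTTP as url="%superq(gcs_url)", which prevents the macro processor from rescanning the percent signs in the encoded name.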

Download a CSV file from GCS using SAS PROC HTTP

Now that we have an authorization token and a correct endpoint, we can proceed from SAS to download a file from GCS:

 

/* Store token */
%let GCSTOKEN=ya29.a0AfH6SMCxmjHFa.....SlqVw2_JEggJIhM ;

/* Temporary fileref */
filename outcsv temp ;

/* Download the file using the REST API */
proc http
    url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fcontact_list.csv?alt=media"
    oauth_bearer="&GCSTOKEN"
    out=outcsv ;
    debug level=1 ;
run ;

 

We can check the log for an “HTTP/1.1 200 OK” message.
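
The status can also be tested in code rather than read from the log. This is a minimal sketch, assuming a SAS release in which PROC HTTP sets the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables (SAS 9.4M5 or later, and SAS Viya); the macro name check_gcs_status is hypothetical.

```sas
/* Sketch: report the outcome of the most recent PROC HTTP call */
/* (check_gcs_status is a hypothetical macro name)              */
%macro check_gcs_status ;
    /* The status variables only exist after PROC HTTP has run */
    %if not %symexist(SYS_PROCHTTP_STATUS_CODE) %then
        %put WARNING: No PROC HTTP status is available in this session. ;
    %else %if &SYS_PROCHTTP_STATUS_CODE = 200 %then
        %put NOTE: GCS request succeeded (HTTP 200). ;
    %else
        %put ERROR: GCS request failed: &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE ;
%mend check_gcs_status ;

/* Call it right after the PROC HTTP step */
%check_gcs_status
```

A 401 status at this point typically means that the access token is missing, invalid or expired.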

 

Also, we can display the file in the log to check the contents:

 

/* Dump the output file in the log */
data _null_ ;
    infile outcsv ;
    input ;
    put _infile_ ;
run ;

 

We can then use the file however we want, for instance by loading it into CAS:

 

/* Load the file in CAS */

cas mysession ;

proc casutil outcaslib="public" ;
    load file=outcsv importoptions=(filetype="csv") casout="contact_list" replace ;
quit ;

Download a SAS Data Set from GCS using SAS PROC HTTP

Likewise, we can download a SAS data set:

 

/* Get the WORK library path */
%let myworkpath=%sysfunc(pathname(work)) ;

/* The file will be accessible in the WORK library */
filename outsas "&myworkpath/hmeq.sas7bdat" ;

/* Download the SAS data set using the REST API */
proc http
    url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fhmeq.sas7bdat?alt=media"
    oauth_bearer="&GCSTOKEN"
    out=outsas ;
    debug level=1 ;
run ;

/* Load the SAS data set in CAS */

cas mysession ;

proc casutil outcaslib="public" ;
    load data=work.hmeq casout="hmeq" replace ;
quit ;

Upload files to GCS from SAS using SAS PROC HTTP

Interacting with GCS using the REST API is not one-way only. We can upload files as well. The endpoint for uploading files is as follows:

 

https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media&name=data/class_res...

 

NB: Encoding the “/” in the file name is not needed in this particular case, because the name is passed as a query parameter rather than as part of the URL path.

 

Example:

 

/* Upload a CSV file */

/* Temporary fileref */
filename csvin temp ;

/* Create a CSV file from a SAS data set */
proc export data=sashelp.class
     outfile=csvin
     dbms=csv
     replace ;
run ;

/* Upload it to GCS */
proc http
    url="https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/class_rest.csv"
    oauth_bearer="&GCSTOKEN"
    in=csvin ;
    headers "Content-type"="text/csv" ;
    debug level=1 ;
run ;

/* Upload a SAS data set */

/* Create a SAS data set */
data prdsale_rest ;
    set sashelp.prdsale(where=(country="U.S.A.")) ;
run ;

/* Get the WORK library path */
%let myworkpath=%sysfunc(pathname(work)) ;

/* Reference the physical location of that new data set */
filename sasin "&myworkpath/prdsale_rest.sas7bdat" ;

/* Upload it to GCS */
proc http
    url="https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/prdsale_rest.sas7bdat"
    oauth_bearer="&GCSTOKEN"
    in=sasin ;
    headers "Content-type"="application/octet-stream" ;
    debug level=1 ;
run ;

 

NB: Notice the %nrstr macro function. It masks the “&” in front of the URL “name” parameter, so that the macro processor does not try to resolve &name as a macro variable reference.
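
An alternative that avoids %nrstr is sketched below, as an assumption rather than the method used in this article: since this URL contains no macro reference that needs resolving, it can be written in single quotes, which the macro processor does not scan. The fileref csvin and the macro variable GCSTOKEN are the ones defined earlier.

```sas
/* Sketch: same CSV upload, with the URL in single quotes     */
/* so that "&name" is never treated as a macro variable       */
proc http
    url='https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media&name=data/class_rest.csv'
    /* Double quotes here, because the token must still resolve */
    oauth_bearer="&GCSTOKEN"
    in=csvin ;
    headers "Content-type"="text/csv" ;
run ;
```

The trade-off is that a single-quoted URL cannot contain macro variables, so this form only suits fixed bucket and object names.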

Use GCS Signed URLs and SAS FILENAME URL

Another way to access files in GCS from SAS is to use GCS Signed URLs in combination with the SAS FILENAME URL feature.

 

“A signed URL is a URL that provides limited permission and time to make a request. Signed URLs contain authentication information in their query string, allowing users without credentials to perform specific actions on a resource. When you generate a signed URL, you specify a user or service account which must have sufficient permission to make the request that the signed URL will make. After you generate a signed URL, anyone who possesses it, regardless of whether they have a Google account, can use the signed URL to perform specified actions, such as reading an object, within a specified period of time.”

 

How to generate a GCS Signed URL?

 

Using gsutil (a utility from the Google Cloud SDK) and a Google Cloud JSON Key File (a key pair that allows a Google Cloud service account to authenticate), we can generate a signed URL, as shown below:

 

gsutil signurl -d 30m ./demo_gcpdm_gcp_key.json gs://demo-gcpdm/data/contact_list.csv

 

This URL will be valid for 30 minutes (the duration is set with the -d option) and will work for anyone who possesses it. Here is a typical output:

 

URL	HTTP Method	Expiration	Signed URL
gs://demo-gcpdm/data/contact_list.csv	GET	2020-07-22 14:23:28	https://storage.googleapis.com/demo-gcpdm/data/contact_list.csv?x-goog-signature=407.....5f2a&x-goog-algorithm=GOOG4-RSA-SHA256&x-goog-credential=gel-gcpdm%40sas-gelsandbox.iam.gserviceaccount.com%2F20200722%2Fus-east1%2Fstorage%2Fgoog4_request&x-goog-date=20200722T135328Z&x-goog-expires=1800&x-goog-signedheaders=host

 

Then, we can use that URL in SAS as shown below:

 

%let signed_url=%nrstr(https://storage.googleapis.com/demo-gcpdm/data/contact_list.csv?x-goog-signature=407.....) ;

filename gcs url "&signed_url" debug ;

/* Dump the file in the log */
data _null_ ;
    infile gcs ;
    input ;
    put _infile_ ;
run ;

 

We can automate this further if:

  • The Google Cloud SDK is installed on the SAS Compute machine
  • We have access to a Google Cloud JSON Key File
  • We have allowed the SAS Compute Server to run X commands (option allowXCMD=true)

In that case, the whole process can run from SAS in one go:

 

/* GCS input file */
%let GCSFILE=gs://demo-gcpdm/data/contact_list.csv ;

/* Generate the signed URL in real time from SAS, triggered with the next data step */
filename gsutil pipe "gsutil signurl -d 10m /opt/gcs/demo_gcpdm_gcp_key.json &GCSFILE" ;

/* Parse the output and catch the signed URL */
data _null_ ;
    length url $ 256 http_method $ 20 expiration $ 32 signed_url $ 2000 ;
    infile gsutil dlm='09'x firstobs=2 ;
    input url http_method expiration signed_url ;
    call symput("signed_url",strip(signed_url)) ;
run ;

%put %superq(signed_url) ;

filename gcs url "%superq(signed_url)" debug ;

/* Dump the file in the log */
data _null_ ;
    infile gcs ;
    input ;
    put _infile_ ;
run ;
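
Because the FILENAME PIPE step depends on X commands being allowed, a quick check before running it can save some debugging. This is a minimal sketch in which the macro name check_xcmd is hypothetical; it reads the XCMD system option with the GETOPTION function.

```sas
/* Sketch: verify that the session may run operating system commands */
/* (check_xcmd is a hypothetical macro name)                         */
%macro check_xcmd ;
    /* GETOPTION returns XCMD when allowed, NOXCMD otherwise */
    %if %sysfunc(getoption(xcmd)) = NOXCMD %then
        %put WARNING: XCMD is disabled and the FILENAME PIPE step cannot call gsutil. ;
    %else
        %put NOTE: XCMD is enabled. ;
%mend check_xcmd ;

%check_xcmd
```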

 

Thanks for reading.

