In a previous article, I wrote about the current possibilities available to access files stored in Google’s object storage implementation: Google Cloud Storage (GCS).
Let's dive deeper and see how we can access files in GCS using REST from SAS.
The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage: “Cloud Storage JSON API v1”. We will use this API to access GCS files from SAS.
To use the “Cloud Storage JSON API v1” through REST, we need an OAuth 2.0 access token. Google provides an online tool, the OAuth 2.0 Playground, to generate such tokens. An access token usually lasts 3600 seconds (1 hour), after which a new one must be generated.
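If the Google Cloud SDK happens to be installed on the SAS server, the token could also be fetched directly from SAS rather than copied from the playground. The following is only a sketch under assumptions that are not part of the original setup: `gcloud` is on the PATH of the SAS server and already authenticated, and the SAS session is allowed to run operating system commands (XCMD).

```sas
/* Assumption: gcloud is installed and authenticated on the SAS server, */
/* and the SAS session is allowed to run OS commands (XCMD)             */
filename gtoken pipe "gcloud auth print-access-token" ;

/* Read the first line of the command output into the GCSTOKEN macro variable */
data _null_ ;
   length token $ 2048 ;
   infile gtoken truncover obs=1 ;
   input token $2048. ;
   call symputx("GCSTOKEN", token) ;
run ;

filename gtoken clear ;
```

The token is deliberately not printed to the log, since anyone who can read it can use it until it expires.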
In the playground, you might have noticed that the endpoint to access a file looks like the following:
https://storage.googleapis.com/storage/v1/b/{bucket}/o/{object}
where:
{bucket} is the name of the bucket
{object} is the name of the object (the file), URL-encoded
NB: Objects do not reside in subdirectories within a bucket; they live in a flat namespace. The file “data/contact_list.csv” looks like a file named “contact_list.csv” in a folder “data”, but in reality its full object name is “data/contact_list.csv”.
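One way to see the flat namespace in practice is to list the objects whose names start with a given prefix. The sketch below assumes that &GCSTOKEN already holds a valid access token and that the bucket contains objects under “data/”; the fileref and libref names are illustrative only.

```sas
/* Assumption: &GCSTOKEN holds a valid access token for the bucket */
filename objlist temp ;

/* List the objects whose names start with "data/" */
proc http
   url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o?prefix=data/"
   oauth_bearer="&GCSTOKEN"
   out=objlist ;
run ;

/* Read the JSON response; the "items" array becomes a table */
libname gcslist json fileref=objlist ;

proc print data=gcslist.items(keep=name size) ;
run ;

libname gcslist clear ;
```

Each row should show a full object name such as “data/contact_list.csv”: the “folder” is simply part of the name.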
To download the file, we will modify the endpoint as shown below (notice the %2F to encode the “/” character and the addition of “?alt=media” to trigger the download of the file):
https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fcontact_list.csv?alt=media
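Rather than encoding the object name by hand, the endpoint can be built with the SAS URLENCODE function. This is a minimal sketch; the variable names are illustrative, and the exact set of characters that URLENCODE escapes may differ slightly from the hand-written URL above while remaining equivalent.

```sas
/* Build the download endpoint from a bucket name and an object name */
data _null_ ;
   length bucket $ 64 object $ 1024 endpoint $ 2048 ;
   bucket = "demo-gcpdm" ;
   /* URLENCODE replaces reserved characters such as "/" with escape sequences */
   object = urlencode("data/contact_list.csv") ;
   endpoint = cats("https://storage.googleapis.com/storage/v1/b/", bucket, "/o/", object, "?alt=media") ;
   /* Print the result so it can be compared with the hand-written endpoint */
   put endpoint= ;
run ;
```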
Now that we have an authorization token and a correct endpoint, we can proceed from SAS to download a file from GCS:
/* Store token */
%let GCSTOKEN=ya29.a0AfH6SMCxmjHFa.....SlqVw2_JEggJIhM ;
/* Temporary fileref */
filename outcsv temp ;
/* Download the file using the REST API */
proc http
url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fcontact_list.csv?alt=media"
oauth_bearer="&GCSTOKEN"
out=outcsv ;
debug level=1 ;
run ;
We can check the log for an “HTTP/1.1 200 OK” message, which confirms that the request succeeded.
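The status can also be tested programmatically. Recent SAS releases (SAS 9.4M5 and later, and SAS Viya) set the SYS_PROCHTTP_STATUS_CODE and SYS_PROCHTTP_STATUS_PHRASE macro variables after each PROC HTTP call. The sketch below assumes such a release; the macro name is hypothetical.

```sas
/* Hypothetical helper: call it right after a PROC HTTP step */
%macro check_http_status ;
   %if &SYS_PROCHTTP_STATUS_CODE ne 200 %then %do ;
      %put ERROR: GCS request failed: &SYS_PROCHTTP_STATUS_CODE &SYS_PROCHTTP_STATUS_PHRASE ;
   %end ;
   %else %do ;
      %put NOTE: GCS request succeeded (HTTP 200) ;
   %end ;
%mend check_http_status ;

%check_http_status
```

An expired token typically shows up here as a 401 status, which is a cue to generate a new access token.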
Also, we can display the file in the log to check the contents:
/* Dump the output file in the log */
data _null_ ;
infile outcsv ;
input ;
put _infile_ ;
run ;
We can then use the file however we want, load it in CAS for instance:
/* Load the file in CAS */
cas mysession ;
proc casutil outcaslib="public" ;
load file=outcsv importoptions=(filetype="csv") casout="contact_list" replace ;
quit ;
Likewise, we can download a SAS data set:
/* Get the WORK library path */
%let myworkpath=%sysfunc(pathname(work)) ;
/* The file will be accessible in the WORK library */
filename outsas "&myworkpath/hmeq.sas7bdat" ;
/* Download the SAS data set using the REST API */
proc http
url="https://storage.googleapis.com/storage/v1/b/demo-gcpdm/o/data%2Fhmeq.sas7bdat?alt=media"
oauth_bearer="&GCSTOKEN"
out=outsas ;
debug level=1 ;
run ;
/* Load the SAS data set in CAS */
cas mysession ;
proc casutil outcaslib="public" ;
load data=work.hmeq casout="hmeq" replace ;
quit ;
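Interacting with GCS using the REST API is not one-way only. We can upload files as well. The endpoint for uploading files is as follows:
https://storage.googleapis.com/upload/storage/v1/b/{bucket}/o?uploadType=media&name={object}
NB: Encoding the “/” in the file name is not needed in this particular case, because the object name is passed as a query parameter rather than as part of the URL path.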
Example:
/* Upload a CSV file */
/* Temporary fileref */
filename csvin temp ;
/* Create a CSV file from a SAS data set */
proc export data=sashelp.class
outfile=csvin
dbms=csv
replace ;
run ;
/* Upload it to GCS */
proc http
url="https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/class_rest.csv"
oauth_bearer="&GCSTOKEN"
in=csvin ;
headers "Content-type"="text/csv" ;
debug level=1 ;
run ;
/* Upload a SAS data set */
/* Create a SAS data set */
data prdsale_rest ;
set sashelp.prdsale(where=(country="U.S.A.")) ;
run ;
/* Get the WORK library path */
%let myworkpath=%sysfunc(pathname(work)) ;
/* Reference the physical location of that new data set */
filename sasin "&myworkpath/prdsale_rest.sas7bdat" ;
/* Upload it to GCS */
proc http
url="https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/prdsale_rest.sas7bdat"
oauth_bearer="&GCSTOKEN"
in=sasin ;
headers "Content-type"="application/octet-stream" ;
debug level=1 ;
run ;
NB: Notice the %nrstr macro function. It masks the “&” in front of the “name” URL parameter so that the SAS macro processor does not try to resolve &name as a macro variable reference.
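The effect of the masking can be observed with two %PUT statements. This is a small sketch that assumes no macro variable called NAME exists in the session; in that case the first statement typically logs a warning about an apparent symbolic reference NAME that is not resolved, while the second one prints the URL cleanly.

```sas
/* Unmasked: the macro processor tries to resolve &name as a macro variable */
%put https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media&name=data/class_rest.csv ;

/* Masked: %nrstr(&) yields a literal ampersand, so the URL stays intact */
%put https://storage.googleapis.com/upload/storage/v1/b/demo-gcpdm/o?uploadType=media%nrstr(&)name=data/class_rest.csv ;
```

Enclosing the URL in single quotes would also prevent macro resolution, at the cost of not being able to use macro variables anywhere in that URL.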
Another way to access files in GCS from SAS is to use GCS Signed URLs in combination with the SAS FILENAME URL feature.
“A signed URL is a URL that provides limited permission and time to make a request. Signed URLs contain authentication information in their query string, allowing users without credentials to perform specific actions on a resource. When you generate a signed URL, you specify a user or service account which must have sufficient permission to make the request that the signed URL will make. After you generate a signed URL, anyone who possesses it, regardless of whether they have a Google account, can use the signed URL to perform specified actions, such as reading an object, within a specified period of time.”
How to generate a GCS Signed URL?
Using gsutil (a utility from the Google Cloud SDK) and a Google Cloud JSON Key File (a key-pair that allows a Google Cloud service account to authenticate), we will be able to generate a signed URL, as shown below:
gsutil signurl -d 30m ./demo_gcpdm_gcp_key.json gs://demo-gcpdm/data/contact_list.csv
This URL will be valid for 30 minutes (customizable with the -d option) and will work for anyone who possesses it. Here is a typical output:
URL HTTP Method Expiration Signed URL
gs://demo-gcpdm/data/contact_list.csv GET 2020-07-22 14:23:28 https://storage.googleapis.com/demo-gcpdm/data/contact_list.csv?x-goog-signature=407.....5f2a&x-goog-algorithm=GOOG4-RSA-SHA256&x-goog-credential=gel-gcpdm%40sas-gelsandbox.iam.gserviceaccount.com%2F20200722%2Fus-east1%2Fstorage%2Fgoog4_request&x-goog-date=20200722T135328Z&x-goog-expires=1800&x-goog-signedheaders=host
Then, we can use that URL in SAS as shown below:
%let signed_url=%nrstr(https://storage.googleapis.com/demo-gcpdm/data/contact_list.csv?x-goog-signature=407.....) ;
filename gcs url "&signed_url" debug ;
/* Dump the file in the log */
data _null_ ;
infile gcs ;
input ;
put _infile_ ;
run ;
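We can be smarter if:
gsutil is installed on the SAS server
the JSON key file is accessible from the SAS server
the SAS session is allowed to run operating system commands (XCMD)
In that case, we can run the whole process from SAS: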
/* GCS input file */
%let GCSFILE=gs://demo-gcpdm/data/contact_list.csv ;
/* Generate the signed URL in real time from SAS, triggered with the next data step */
filename gsutil pipe "gsutil signurl -d 10m /opt/gcs/demo_gcpdm_gcp_key.json &GCSFILE" ;
/* Parse the output and catch the signed URL */
data _null_ ;
length url $ 256 http_method $ 20 expiration $ 32 signed_url $ 2000 ;
infile gsutil dlm='09'x firstobs=2 ;
input url http_method expiration signed_url ;
call symput("signed_url",strip(signed_url)) ;
run ;
%put %superq(signed_url) ;
filename gcs url "%superq(signed_url)" debug ;
/* Dump the file in the log */
data _null_ ;
infile gcs ;
input ;
put _infile_ ;
run ;
Thanks for reading.