Last stop on the road to Google Cloud Storage. In previous articles, I’ve been exploring various ways to access Google Cloud Storage from SAS and CAS:
Today, I’m going to focus on using Google BigQuery as a middleware between SAS and Google Cloud Storage.
Indeed, Google BigQuery naturally interacts with Google Cloud Storage. It provides very simple ways to inject various file formats from GCS into BigQuery in one command line. The most popular big data file formats are supported (Avro, Parquet, ORC) as well as commodity file formats like CSV and JSON.
And since SAS can access Google BigQuery, SAS can access those GCS resources under the covers.
You might think at first look that it’s a little bit cumbersome, not direct and requires data duplication. Not really. Google BigQuery provides features that are similar to what Athena offers in the AWS world: interactive queries on data files stored in an object storage infrastructure. This is really useful with the adoption of object storage as data lake support.
In Google BigQuery, data files residing in GCS can be imported and thus data is duplicated in BigQuery as TABLES. Or they can simply be referenced as EXTERNAL TABLES, without any duplication. The concept is similar to a view.
The advantage is that if some data files are updated in Google Cloud Storage, querying the EXTERNAL TABLE in BigQuery will take into account the last updates of the files.
The bq CLI utility (part of the GCP’s CLI) is the tool to create EXTERNAL TABLES based on GCS files:
EXTERNAL TABLES are usable like any other object in the BigQuery query editor:
As well as in SAS:
libname sprebq bigquery cred_path="/opt/gcs/gcpdm_gcp_key.json"
project="sas-gelsandbox" schema="gcpdmtest" max_char_len=80 ;
/* Process an External Table */
proc freq data=sprebq.ext_userdata_from_avro ;
tables gender ;
run ;
cas mysession ;
caslib casbq datasource=(srctype="bigquery",credfile="/opt/gcs/gcpdm_gcp_key.json",
project="sas-gelsandbox",schema="gcpdmtest",max_char_len=80,readbuff=32767) libref=casbq ;
/* Load an External Table in CAS */
proc casutil incaslib="casbq" outcaslib="casbq" ;
load casdata="ext_userdata_from_avro" casout="ext_userdata_from_avro" replace ;
quit ;
cas mysession terminate ;
In conclusion, SAS can (also) access data files in Google Cloud Storage using SAS/ACCESS to BigQuery. This way enables SAS to support major big data file formats (Avro, Parquet, ORC) including formats that SAS does not support directly (Avro).
Thanks for reading.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.