This blog is the second installment of a series where we discuss various use cases around the benefits of using Google Workload Identity Federation (WIF) with SAS Viya. See here for the first installment of the blog, where we shared details on how WIF works and how it can be leveraged to provide access to a Google Cloud SQL for PostgreSQL database server. In this blog, we’ll discuss more advanced use cases like enabling access to Google Big Query and Google Cloud Storage. Let’s get started.
The SAS Viya platform uses the SAS/Access engine for Big Query to access data stored there. The engine allows you to use a Google service account to access data that resides in Big Query. This is accomplished by referring to a service account key in JSON format via the CREDFILE libname option. See the Access engine documentation for more information on this option. The SAS/Access engine uses the credentials contained in the service account key JSON file to authenticate against Google Big Query.
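For illustration, such a libname statement could look like the following sketch; the libref, project, schema, and key file path are placeholder values:
libname gbqsa bigquery project="my-gcp-project" schema="my_schema"
    credfile="/path/to/service-account-key.json";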
The above method of authentication comes with some disadvantages. It relies on long-lived credentials: anybody who gets hold of the key file gets access to the data. Another disadvantage is that this method doesn’t support user- or group-based access controls, since every session authenticates as the same service account regardless of the current user. It’s nice to have a paper trail of who has accessed which data and to be able to apply fine-grained access controls to data based on actual users or group memberships. Ideally, we want users who have logged on to the SAS Viya platform to be able to seamlessly access data in Google Big Query without providing additional credentials.
This is where Google Workload Identity Federation (WIF) comes into play. WIF is a mechanism that allows applications running outside of Google Cloud to authenticate and access Google Cloud resources without relying on long-lived credentials like service account keys. WIF establishes trust between an external identity provider, like MS Entra ID, and Google Cloud. This allows federated identities to exchange their token for a Google Cloud token and access the specific Google services that the identity is allowed to access.
The solution consists of three main building blocks, which will be discussed in more detail during the remainder of this blog. These building blocks are:
- Single Sign-on between the SAS Viya platform and MS Entra ID
- The Workload Identity Federation components (a pool and a provider) on the Google Cloud side
- A credential configuration file that the SAS/Access engine uses to exchange the user’s token for a Google Cloud token
In a simplistic manner, the resulting “triangle” looks like this:
To enable Single Sign-on, you need to create an MS Entra ID application registration and make some configuration changes to SAS Viya. This is outside the scope of this blog post. You can find more information on how to set this up in our official documentation here.
The important thing is that this is an essential building block that should be in place before proceeding with the next steps in this blog.
Two things to check in the configuration before moving on to implement the example discussed in this blog:
To check if email is part of the claim, you can simply go to the Azure Portal. Then select Microsoft Entra ID.
On the left side of your screen, you will find the menu shown in the image. Expand the item Manage. Under Manage you will find several sub-items. Select App registrations.
This is the place where your MS Entra ID application registration resides. Select the registration that is used in the SAS Viya platform to provide SSO.
Once the registration has been selected, you can confirm whether email is part of the claim by selecting Manage --> Token configuration.
This will show the current setup of the claim on the screen.
Confirm that email is part of the claim. If not, please revisit the instructions on setting up SSO for SAS Viya to add it. This is also where you can modify the claim and add additional attributes that can be used to secure your data in Google Big Query through WIF.
Later in this blog this will be discussed in more detail.
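For reference, a decoded ID token that includes the email claim could look roughly like this (the values are made up for illustration):
{
  "iss": "https://login.microsoftonline.com/<TENANT-ID>/v2.0",
  "aud": "<ENTRA-APP-ID>",
  "sub": "AbCdEf123456",
  "email": "sasdemo@example.com"
}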
Two components need to be created to set up Workload Identity Federation on the Google side. Make sure you have the proper permissions to create these components. You can use either the Google Cloud console or the Google CLI, gcloud, for these steps. We’ll show you the CLI commands in this blog.
Workload Identity Federation pool
To create the Workload Identity Federation pool execute the following statements. Modify accordingly to match your situation.
export pool=<PREFIX>-entraid-pool
gcloud iam workload-identity-pools create $pool \
--location="global" \
--description="test with identity pool" \
--display-name="$pool"
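To verify the pool was created as expected, you can describe it; this check is optional and not part of the required setup:
gcloud iam workload-identity-pools describe $pool --location="global"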
Workload Identity Federation provider
Once the Workload Identity pool has been created, we can create the provider within the pool. Modify accordingly to make sure all placeholders match your situation.
export provider=<PREFIX>-entraid-provider
export entra_appID=<ENTRA-APP-ID>
gcloud iam workload-identity-pools providers create-oidc $provider \
--location="global" \
--workload-identity-pool="$pool" \
--issuer-uri="https://login.microsoftonline.com/<TENANT-ID>/v2.0" \
--allowed-audiences="$entra_appID" \
--attribute-mapping="google.subject=assertion.email"
There are a couple of things to point out here:
- The issuer URI points to your MS Entra ID tenant; replace <TENANT-ID> with your own tenant ID.
- The allowed audience is the application ID of the MS Entra ID application registration that provides SSO for SAS Viya.
- The attribute mapping maps the email claim from the MS Entra ID token to google.subject; this is the value Google Cloud uses to identify the federated user.
Set up permissions on Google Big Query
Now that all required components for Workload Identity Federation are in place, permissions can be assigned to either a database or a table within Big Query through the use of principals. A principal is basically an identifier for a federated identity; by binding it to a Cloud IAM role on an object within Google Cloud (a database or a table in Big Query, for instance), you control what that identity can access.
Let’s break down the below example of a principal and explain some of its components:
principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/subject/SUBJECT
- PROJECT_NUMBER is the number of the Google Cloud project that hosts the Workload Identity Federation pool.
- POOL_ID is the ID of the pool we created earlier.
- SUBJECT is the value that was mapped to google.subject in the provider; in our setup that is the email address from the MS Entra ID token.
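For instance, with the email-based attribute mapping configured earlier, a principal for a hypothetical user could look like this (the project number, pool name, and email address are made-up values):
principal://iam.googleapis.com/projects/123456789012/locations/global/workloadIdentityPools/myprefix-entraid-pool/subject/sasdemo@example.com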
In the next paragraph you will find an example of what that looks like when you assign a federated identity the Big Query Viewer Cloud IAM role through a principal on a Big Query table.
An example of assigning a role to a principal for a Big Query table
To assign permissions to a table in Big Query, open the interface through the browser, select a database and then select a table. Click on the Share button; this will open the window shown below.
Select add principal and then provide the principal.
Then assign a role to that principal.
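If you prefer the command line over the console, the same grant can be sketched with the bq tool; the member string, role, and table reference below are placeholders to adapt to your setup:
bq add-iam-policy-binding \
--member="principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/subject/SUBJECT" \
--role="roles/bigquery.dataViewer" \
PROJECT_ID:DATASET.TABLE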
Creating the credential configuration file
For the SAS/Access engine for Big Query to use WIF, a file needs to be created that contains information about the Workload Identity Federation pool. The information in the file also tells WIF where it can find the token and which Google endpoint to use to swap it for a Google Cloud token.
This file is called a credential configuration file and is in JSON format. It will be referenced in the libname statement for Big Query using the credfile option. That’s why this file needs to be stored somewhere that is accessible to compute sessions in the SAS Viya platform.
Use the code below to generate the file:
project_number=$(gcloud projects describe $(gcloud config get-value core/project) --format=value\(projectNumber\))
pool=<ID OF POOL>
provider=<ID OF PROVIDER>
cat <<EOF> /tmp/credentials-config.json
{
  "universe_domain": "googleapis.com",
  "type": "external_account",
  "audience": "//iam.googleapis.com/projects/$project_number/locations/global/workloadIdentityPools/$pool/providers/$provider",
  "subject_token_type": "urn:ietf:params:oauth:token-type:jwt",
  "token_url": "https://sts.googleapis.com/v1/token",
  "credential_source": {
    "file": "/tmp/token.json"
  }
}
EOF
Testing the connection
Here’s a sample SAS program to test the connection to Big Query using WIF.
/* Macro that writes out the ID token to the compute session temp directory. The ID token is later exchanged for a Google Cloud token */
%get_identity_token;
/* The credfile option points to the credential configuration JSON file, which tells the SAS/Access engine how it can retrieve the Google token and which token to use */
options sastrace='d,d,d,d' sastraceloc=saslog nostsuffix msglevel=i ;
libname gbqlib2 bigquery project="bq_data" schema="HMEQ" credfile="/homes/&_CLIENTUSERID/credentials-config.json";
data test;
set gbqlib2.ID_HMEQ(obs=1);
run;
Let’s look at the code in more detail:
- The %get_identity_token macro writes the user’s ID token to the location that the credential configuration file references in credential_source.
- The sastrace and msglevel options are optional; they add detailed tracing to the SAS log, which is helpful when debugging the connection.
- The credfile option points to the credential configuration file created earlier; the SAS/Access engine uses it to exchange the ID token for a Google Cloud token before connecting to Big Query.
To explain the concept of Workload Identity Federation and accessing data in Big Query from SAS Viya, we used an example in which the data is secured using an email address. In real-life situations, using groups to secure data is preferred over using email addresses. The security model requires less maintenance that way!
Can this be done using WIF? The short answer is yes!
To use groups from an external Identity provider with Workload Identity Federation three things need to be modified.
Adding the groups claim to the MS Entra ID application
Adding the groups claim to the MS Entra ID application can be done through the Azure portal. You need Owner permissions on the application registration to be able to make these modifications.
The groups claim can be added by selecting Manage --> Token configuration and then clicking on Add groups claim.
Now you are prompted to select which type of group is included in the claim. Please consult an MS Entra ID admin to determine which type is the proper type for your organization as this might differ from the above screenshot.
Regarding the token properties, sAMAccountName is used, as this provides the names of the groups that are used in the principal. But there are more options. Again, discuss this with an MS Entra ID admin to determine the correct attribute for your organization’s setup.
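Once the groups claim is added, a decoded ID token would also carry the group names, conceptually like this (the group names are made up):
{
  "email": "sasdemo@example.com",
  "groups": ["bq-analysts", "sas-viya-users"]
}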
Modifying the attribute mapping of the WIF provider
Modifying the attribute mapping is straightforward. Assuming that all the components are still in place after getting the example in this blog working, you can simply modify the provider through the Google Cloud console as shown below.
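If you would rather script this change, it can also be sketched with gcloud; this assumes the groups arrive in the token as assertion.groups:
gcloud iam workload-identity-pools providers update-oidc $provider \
--location="global" \
--workload-identity-pool="$pool" \
--attribute-mapping="google.subject=assertion.email,google.groups=assertion.groups"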
Assigning permissions for a group to a table in Big Query
Once the previous modification is in place, you can start assigning permissions to a table in Big Query. Follow the steps as described previously in this blog, with one exception in the definition of the principal, shown below.
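For groups, the identifier uses principalSet rather than principal and ends in a group instead of a subject. The general form looks like this, with GROUP_NAME being a group value from the mapped groups claim:
principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/POOL_ID/group/GROUP_NAME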
Other than that, the string is similar to what we discussed earlier in the blog. Assign a role and apply the principalSet to the table.
The code that was used earlier to test our setup doesn’t require any modifications. Use that code to validate that you can access the data that resides in Big Query.
Let’s move on to the final use case we want to talk about in this blog: accessing data stored on Google’s object storage solution, Google Cloud Storage (GCS). The requirements for this scenario are the same as for Big Query: we want to avoid using passwords in our code, and we want to be able to use user- or group-based access controls to protect the data.
While SAS can generally access data on GCS by using PROC S3 or FILENAME statements, the use of WIF complicates things: it is a proprietary IAM framework of Google Cloud and is not directly supported by these approaches. Instead, we need to look at the REST API offered by GCS, which offers the usual HTTP verbs like GET, PUT, DELETE and HEAD for manipulating the contents of your buckets.
If you have followed our blog so far, you already know that WIF uses Google’s STS service to swap out the OIDC ID token we can offer for an access token that is understood by other Google services. Using it as our Bearer token, we can then make a call to the GCS REST API to perform our desired action, e.g. download a file from a bucket. Both steps can be performed using PROC HTTP.
Here’s a sample SAS program to show how this works.
/* Macro that writes out the ID token to the compute session temp directory. The ID token is later exchanged for a Google Cloud token */
%get_identity_token;
/* Ask Google STS to swap the ID token for an access_token */
filename resp2 temp;
/* Replace <PROJECT-NUMBER>, <POOL-ID> and <PROVIDER-ID> with your own values */
proc http
  method="POST"
  out=resp2
  url="https://sts.googleapis.com/v1/token"
  query=(
    "audience"="//iam.googleapis.com/projects/<PROJECT-NUMBER>/locations/global/workloadIdentityPools/<POOL-ID>/providers/<PROVIDER-ID>"
    "grant_type"="urn:ietf:params:oauth:grant-type:token-exchange"
    "requested_token_type"="urn:ietf:params:oauth:token-type:access_token"
    "scope"="https://www.googleapis.com/auth/cloud-platform"
    "subject_token_type"="urn:ietf:params:oauth:token-type:jwt"
    "subject_token"="&my_azureToken")
  ;
run;
libname output2 json fileref=resp2;
data _null_;
set output2.root;
call symputx('my_gcpToken',access_token); /* symputx strips trailing blanks from the token value */
run;
/* REST call to download from the bucket using the access_token */
filename out TEMP;
%let BUCKET_NAME=my-bucket;
%let OBJECT_NAME=data.csv;
proc http
  url="https://storage.googleapis.com/storage/v1/b/&BUCKET_NAME./o/&OBJECT_NAME.?alt=media"
  method="GET"
  out=out
  oauth_bearer="&my_gcpToken.";
run;
proc import datafile=out out=mydata dbms=csv replace;
getnames=yes;
run;
proc print data=mydata; run;
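The same pattern covers the other HTTP verbs mentioned earlier. As a sketch, uploading an object could look like the code below; it reuses the same access token and placeholder names, the local file path is hypothetical, and the endpoint is the GCS JSON API media upload:
/* REST call to upload a file to the bucket using the access_token */
filename upload "/tmp/data.csv"; /* hypothetical local file to upload */
proc http
  url="https://storage.googleapis.com/upload/storage/v1/b/&BUCKET_NAME./o"
  method="POST"
  in=upload
  ct="text/csv"
  query=("uploadType"="media" "name"="&OBJECT_NAME.")
  oauth_bearer="&my_gcpToken.";
run;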
Keep in mind that you need to have completed the WIF and MS Entra ID configuration as described before.
Workload Identity Federation allows you to use a federated identity to access and secure data that is stored within your Big Query database or a Google Cloud Storage bucket. It removes the need to use long-lived credentials, like a Google service account key, to access your data, which eliminates a big security risk. Another welcome feature of WIF is that it enables you to apply access controls to data using either user or group information from an external identity provider. That eliminates the need to manage a separate security model in Google Cloud IAM.
One final question remains: are you going to start using WIF to access data in Big Query and Google Cloud Storage Buckets from SAS Viya?
Let us know in the comments!