
SAS Viya accessing S3 with EKS Service Account

Started 06-16-2022
Modified 03-13-2024

With the SAS Viya 2022.1.1 release, you can access S3 data files using an EKS service account. This feature provides a better strategy for managing AWS credentials for Viya applications. Instead of creating and distributing AWS credentials to the CAS containers, or using the AWS EC2 instance role, you can associate an IAM role with an EKS service account attached to the CAS container to access S3 buckets.

 

This blog post describes how CAS and the Compute Server access S3 data files with an EKS service account.

 

EKS Service Account

 

A Kubernetes service account provides an identity for the processes that run in a pod. If a pod needs access to AWS services, you can map the service account to an AWS Identity and Access Management (IAM) role to grant that access.
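In EKS, this mapping is expressed as an annotation on the service account object (a mechanism known as IAM Roles for Service Accounts, or IRSA). The following is a minimal sketch; the names and the role ARN are illustrative only:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa              # hypothetical service account name
  namespace: my-namespace      # hypothetical namespace
  annotations:
    # IAM role that pods using this service account assume
    # via the projected web-identity token
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/my-app-role
```

The eksctl command covered later in this post creates and annotates such a service account for you.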

 

CAS access to S3 data with EKS IAM Service Account

 

Let’s discuss the steps required to access S3 data files from CAS and the Compute Server without providing an AWS access key. The steps are described with code examples and log output.

 

  • Create an IAM policy that defines the access to AWS resources.
  • Create an EKS IAM service account with the customized IAM policy.
  • Test and verify the EKS IAM service account in a sample AWS CLI pod.
  • Attach the EKS service account to the CAS and Compute pods.
  • Access S3 data files from the CAS and Compute environments without AWS access keys.

 

Create an IAM Policy

 

Create an IAM policy with customized privileges for AWS resources and the S3 bucket. You can use a JSON file with the required information to grant access privileges. The following AWS CLI statement creates an IAM policy.

 

Code:

 

aws iam create-policy \
    --policy-name cas-s3-read-policy \
    --policy-document file://iam-policy.json

 

The following JSON file grants list and read access to an S3 bucket. (The DynamoDB statement is left over from a policy template and is not required for S3 access.)

 

Code: iam-policy.json

 

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ListAndDescribe",
            "Effect": "Allow",
            "Action": [
                "dynamodb:List*",
                "dynamodb:Describe*"
            ],
            "Resource": "arn:aws:dynamodb:*:*:table/YOUR_TABLE"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::utkumadmviya4"
        },
        {
            "Sid": "List",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion"
            ],
            "Resource": "arn:aws:s3:::utkumadmviya4/*"
        }
    ]
}
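As an optional sanity check before creating the policy, you can validate the policy document with IAM Access Analyzer; this assumes your credentials are allowed to call the accessanalyzer API:

```shell
# Validate the policy document and report findings
# (errors, warnings, and suggestions)
aws accessanalyzer validate-policy \
    --policy-type IDENTITY_POLICY \
    --policy-document file://iam-policy.json
```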

 

List the IAM policy ARN details and use them in the IAM Service account.

 

Code:

 

aws iam list-policies | grep  cas-s3-read

 

Log:

 

cldlgn05.unx.sas.com> aws iam list-policies | grep  cas-s3-read
            "PolicyName": "cas-s3-read-policy",
            "Arn": "arn:aws:iam::182696677754:policy/cas-s3-read-policy",
cldlgn05.unx.sas.com>


Create EKS IAM Service Account

 

The following eksctl commands create the EKS IAM service account. The value for the service account name (MY_SERVICEACC=) has to be “sas-cas-server” or "sas-programming-environment" so that AWS adds the right annotation with the IAM role for the CAS and Compute pods. If you use any other value for the service account name, the CAS and Compute pods do not get the right values assigned for the environment variables "AWS_ROLE_ARN=" and "AWS_WEB_IDENTITY_TOKEN_FILE=".

 

Code:

 

MY_EKSCLUSTER=utkumadmviya4-eks
MY_SERVICEACC=sas-cas-server
MY_NAMESPACE=sasviya4aws
MY_ROLE=cas-s3-read-role
MY_ARNPOLICY="arn:aws:iam::182696677754:policy/cas-s3-read-policy"

## To delete the IAM service account:
## eksctl delete iamserviceaccount  --cluster $MY_EKSCLUSTER --namespace $MY_NAMESPACE  --name $MY_SERVICEACC

eksctl create iamserviceaccount \
  --name $MY_SERVICEACC \
  --namespace $MY_NAMESPACE \
  --cluster $MY_EKSCLUSTER \
  --role-name $MY_ROLE \
  --attach-policy-arn $MY_ARNPOLICY \
  --approve \
  --override-existing-serviceaccounts


View the IAM Service account details in standard and YAML format. Notice the annotations in the metadata section.

 

Code:

 

eksctl get iamserviceaccount --cluster $MY_EKSCLUSTER \
--namespace $MY_NAMESPACE 


kubectl get sa $MY_SERVICEACC  -n $MY_NAMESPACE -o yaml

 

Log:

 

cldlgn05.unx.sas.com> eksctl get iamserviceaccount --cluster $MY_EKSCLUSTER \
> --namespace $MY_NAMESPACE
2022-06-14 15:08:14 [ℹ]  eksctl version 0.58.0
2022-06-14 15:08:14 [ℹ]  using region us-east-1
NAMESPACE       NAME            ROLE ARN
sasviya4aws     sas-cas-server  arn:aws:iam::182696677754:role/cas-s3-read-role
cldlgn05.unx.sas.com>



cldlgn05.unx.sas.com> kubectl get sa $MY_SERVICEACC  -n $MY_NAMESPACE -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::182696677754:role/cas-s3-read-role
  creationTimestamp: "2022-06-14T19:08:13Z"
  labels:
    app.kubernetes.io/managed-by: eksctl
  name: sas-cas-server
  namespace: sasviya4aws
  resourceVersion: "165057"
  uid: 4e8f1c9c-4a02-4741-a7b5-bc61e6d2be64
secrets:
- name: sas-cas-server-token-plqvq
cldlgn05.unx.sas.com>

 

Deploy and test an AWS CLI Pod with IAM Service account

 

Deploy an AWS CLI pod in the same namespace of the Kubernetes cluster to test and verify the access privileges associated with the EKS service account, then verify those privileges against the S3 bucket from the AWS CLI pod. You can use a YAML file with the required details to deploy an AWS CLI pod with the EKS IAM service account.

 

Code: aws-cli-pod.yaml

 

apiVersion: v1
kind: Pod
metadata:
  name: aws-cli
  namespace: sasviya4aws
spec:
  serviceAccountName: sas-cas-server
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

 

Start the AWS CLI pod in the Kubernetes cluster.

 

Code:

 

kubectl apply -f ./aws-cli-pod.yaml

# View the status of the CLI Pod. 
kubectl get pods -n sasviya4aws | grep cli

 

Log:

 

cldlgn05.unx.sas.com> kubectl get pods -n sasviya4aws | grep cli
aws-cli                                                           1/1     Running       0          9s

View the AWS caller identity for the aws-cli pod.

 

Code:

 

kubectl exec -it -n sasviya4aws aws-cli -- aws sts get-caller-identity

 

Log:

 

cldlgn05.unx.sas.com> kubectl exec -it -n sasviya4aws aws-cli -- aws sts get-caller-identity
{
    "UserId": "AROASVCMSCF5GUWBNRMPM:botocore-session-1655234019",
    "Account": "182696677754",
    "Arn": "arn:aws:sts::182696677754:assumed-role/cas-s3-read-role/botocore-session-1655234019"
}
cldlgn05.unx.sas.com>

 

List and verify the access to the S3 bucket from the aws-cli pod, using the access privileges embedded in the service account.

 

Code:

 

kubectl exec -it -n sasviya4aws aws-cli -- aws s3 ls s3://utkumadmviya4

 

Log:

 

cldlgn05.unx.sas.com> kubectl exec -it -n sasviya4aws aws-cli -- aws s3 ls s3://utkumadmviya4
                           PRE ENCKEYdata/
                           PRE data/
                           PRE redshift/
                           PRE userdata/

 

Try to delete an S3 folder from the aws-cli pod. Note that the IAM policy associated with the service account does not include delete permission on the S3 bucket, so the operation is denied.

 

Code:

 

kubectl exec -it -n sasviya4aws aws-cli -- aws s3 rm s3://utkumadmviya4/redshift

 

Log:

 

cldlgn05.unx.sas.com> kubectl exec -it -n sasviya4aws aws-cli -- aws s3 rm s3://utkumadmviya4/redshift
delete failed: s3://utkumadmviya4/redshift An error occurred (AccessDenied) when calling the DeleteObject operation: Access Denied
command terminated with exit code 1
cldlgn05.unx.sas.com> 

 

Attach the EKS Service Account to the CAS and Compute Pods

 

Accessing the S3 bucket with the IAM service account requires adding the service account specification to the CAS and Compute pods. The following YAML files add the service account name to the CAS and Compute pods; the AWS environment variables are then populated from the service account annotations.

 

Code:  ../.../site-config/cas-server/cas-serviceaccount-s3iam.yaml

 

apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-serviceaccount-s3iam
patch: |-
   - op: add
     path: /spec/serviceAccountName
     value: sas-cas-server
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1

 

Code: ../.../site-config/sas-compute-server/compute-serviceaccount-s3iam.yaml

 

apiVersion: builtin
kind: PatchTransformer
metadata:
  name: compute-add-serviceaccount-s3iam
patch: |-
   - op: add
     path: /template/spec/serviceAccountName
     value: sas-programming-environment
target:
  name: sas-compute-job-config
  version: v1
  kind: PodTemplate

 

Include the patches in the cluster deployment manifest by adding the YAML files created in the previous steps to the transformers section of the "kustomization.yaml" file. Build a new site.yaml from the updated "kustomization.yaml" and apply the manifest to the Kubernetes environment/namespace. Restart the CAS and Compute pods so that they pick up the ARN role from the EKS service account.

 

Code: kustomization.yaml

 

transformers:
........
.....
   - site-config/cas-server/cas-serviceaccount-s3iam.yaml ## To use IAM Service Account
   - site-config/sas-compute-server/compute-serviceaccount-s3iam.yaml ## To use IAM Service account
......
......

 

Code:

 

export NS=sasviya4aws

kustomize build -o site.yaml
kubectl -n ${NS} apply -f site.yaml

# Restart CAS and Compute Pods
kubectl -n ${NS} delete pods -l casoperator.sas.com/server=default
kubectl -n ${NS}  delete pod --selector='app=sas-compute'
kubectl -n ${NS}  delete pod --selector='app=sas-launcher'

 

Check the CAS pods for the AWS environment variables; they should include the ARN role from the EKS IAM service account. Look for the following two variables: AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE.

 

Code:

 

kubectl exec -n sasviya4aws sas-cas-server-default-controller -- env  | grep AWS

 

Log:

 

cldlgn05.unx.sas.com> kubectl exec -n sasviya4aws sas-cas-server-default-controller -- env  | grep AWS
Defaulted container "cas" out of: cas, sas-backup-agent, sas-consul-agent, sas-certframe (init), sas-config-init (init)
AWS_REGION=us-east-1
AWS_ROLE_ARN=arn:aws:iam::182696677754:role/cas-s3-read-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=us-east-1
cldlgn05.unx.sas.com>
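The same verification can be done on a Compute session pod. Compute server pods are created on demand per session, so list them first; the pod name below is a placeholder:

```shell
# List running compute server pods (created per session)
kubectl -n sasviya4aws get pods -l app=sas-compute

# Inspect one of them (substitute a pod name from the listing above)
kubectl -n sasviya4aws exec <sas-compute-pod-name> -- env | grep AWS
```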

 

CAS load from S3 data files

 

With the EKS IAM service account attached to the CAS pods, the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are assigned from the service account, and you can use the following code to load CAS from S3 data files. Notice that the CASLIB statement has no AWS access key or ARN role parameters; the IAM service account provides the AWS credentials to the data read process. The following code loads a single Parquet file and a folder containing multiple Parquet files into two separate CAS tables.

 

Code:

 

%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_region="US_East";
%let objpath="/data/";

CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);

caslib AWSCAS2 datasource=(srctype="s3",
region=&aws_region,
bucket=&s3bucket,
objectpath=&objpath
) subdirs ;

proc casutil incaslib="AWSCAS2" outcaslib="AWSCAS2";
list files ;
run;quit;

/* Load CAS from single Parquet data file */
proc casutil incaslib="AWSCAS2"  outcaslib="public";
    droptable casdata="baseball_prqt_1" incaslib="public"  quiet;
   	load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
run;
quit;


/* Load CAS from Parquet data-folder containing multiple files */
proc casutil incaslib="AWSCAS2"  outcaslib="public";
    droptable casdata="baseball_prqt" incaslib="public"  quiet;
   	load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
    list tables incaslib="public"; 
run;
quit;

CAS mySession TERMINATE;

 

Log extract:

 

…………
……..
80   %let userid=utkuma;
81   %let s3bucket=&userid.dmviya4 ;
82   %let aws_region="US_East";
83   %let objpath="/data/";
84   
85   CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The 
      UUID is e230e9fa-dd38-2147-b9c2-a3acdbf303f4. The user is viya_admin and the active caslib is CASUSER(viya_admin).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 3 workers.
NOTE: 'CASUSER(viya_admin)' is now the active caslib.
NOTE: Action 'sessionProp.setSessOpt' used (Total process time):
NOTE: The CAS statement request to update one or more session options for session MYSESSION completed.
86   
87   caslib AWSCAS2 datasource=(srctype="s3",
88   region=&aws_region,
89   bucket=&s3bucket,
90   objectpath=&objpath
91   ) subdirs ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AWSCAS2' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AWSCAS2'.
…………
……..
97   /* Load CAS from single Parquet data file */
98   proc casutil incaslib="AWSCAS2"  outcaslib="public";
NOTE: The UUID 'e230e9fa-dd38-2147-b9c2-a3acdbf303f4' is connected using session MYSESSION.
99       droptable casdata="baseball_prqt_1" incaslib="public"  quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 0.018944 seconds.
100     
100!     load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt/baseball_prqt_1 available as table BASEBALL_PRQT_1 in caslib 
      public.
…………
……..
104  
105  /* Load CAS from Parquet data-folder containing multiple files */
106  proc casutil incaslib="AWSCAS2"  outcaslib="public";
NOTE: The UUID 'e230e9fa-dd38-2147-b9c2-a3acdbf303f4' is connected using session MYSESSION.
107      droptable casdata="baseball_prqt" incaslib="public"  quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 0.018659 seconds.
108     
108!     load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt available as table BASEBALL_PRQT in caslib public.
………….
…………………….
..............

 

Result Output: CAS table loaded from S3 data files.

 

uk_1_SAS_Viya_Accessing_S3_with_EKS_ServiceAccount_1.png


 

Compute Server (SPRE) Access to S3 datafiles

 

With the EKS IAM service account attached to the Compute pod, the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are assigned from the service account, and you can use the following code to manage the S3 bucket from the Compute (SPRE) server. Notice that the PROC S3 statement has no AWS credential parameters; the IAM service account provides the credentials to access the S3 bucket.

 

Code:

 

%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_region="useast";


PROC S3 REGION=&aws_region;
list  "/&s3bucket/data" ;
run;

 

Log extract:

 

…………
……..
80   %let userid=utkuma;
81   %let s3bucket=&userid.dmviya4 ;
82   %let aws_region="useast";
83   
84   
85   PROC S3 REGION=&aws_region;
86   list  "/&s3bucket/data" ;
87   run;
CSV/      0   
IMAGE/    0   
PARQUET/  0   
SAS7BDAT/ 0   
VIDEO/    0   
NOTE: PROCEDURE S3 used (Total process time):
      real time           0.38 seconds
      cpu time            0.04 seconds
      
88   
…………
……..........

 

 

Thank you to Chuck Hunley and Merri Jensen for their assistance with this Viya 4 functionality.

 

Important Links:

 

S3 Data Source

 

Kubernetes service account

 

IAM roles for service account 

Find more articles from SAS Global Enablement and Learning here.

Comments

Hi @UttamKumar 

 

Very useful!

For this to work do we need to set the "hop limit" on the EKS managed nodes metadata to 2?

 

Thanks,

Eyal

@EyalGonen  It depends.  If viya4-iac-aws is used to deploy the cluster, then it sets

  • metadata_http_tokens : required
  • metadata_http_put_response_hop_limit : 1

These options block both IMDSv1 and IMDSv2 from being used with our S3 interface.

To use IMDSv2, the hop limit has to be set to 2.  If you want to use IMDSv1, not recommended, then metadata_http_tokens should be set to optional. 

https://docs.aws.amazon.com/cli/latest/reference/ec2/modify-instance-metadata-options.html

Hi @ChuckHunley 

 

So if I use viya4-iac-aws with the default settings, will the instructions in this post work or not?

 

Thanks!

@EyalGonen Yes, these instructions will work without needing to increase the hop limit.

Wow, this is a game-changer! Thanks for sharing this detailed guide! 

