With the SAS Viya 2022.1.1 release, you can access S3 data files using an EKS service account. This feature provides a better strategy for managing AWS credentials for Viya applications. Instead of creating and distributing AWS credentials to the CAS containers, or using the AWS EC2 instance role, you can associate an IAM role with an EKS service account attached to the CAS container to access S3 buckets.
This blog post demonstrates CAS and the Compute Server accessing S3 data files with an EKS service account.
A Kubernetes service account provides an identity for the processes that run in a pod. If a pod needs access to AWS services, you can map the service account to an AWS Identity and Access Management (IAM) identity to grant that access.
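In practice, the mapping is expressed as an annotation on the service account that names the IAM role the pod may assume. The following is a sketch of what such a service account looks like, using the role created later in this post (the actual annotated object appears in the `kubectl get sa` output further down):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sas-cas-server
  namespace: sasviya4aws
  annotations:
    # IAM role that processes in pods using this service account will
    # assume via the projected web identity token (IRSA).
    eks.amazonaws.com/role-arn: arn:aws:iam::182696677754:role/cas-s3-read-role
```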
Let’s discuss the steps required to access S3 data files from CAS and the Compute Server without providing an AWS access key. These steps are described with code examples and log output.
First, create an IAM policy with customized privileges for the S3 bucket. You can use a JSON file with the required information to grant access privileges. The following AWS CLI statement creates an IAM policy.
Code:
aws iam create-policy \
--policy-name cas-s3-read-policy \
--policy-document file://iam-policy.json
The following JSON file describes the read access to an S3 bucket.
Code:iam-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::utkumadmviya4"
    },
    {
      "Sid": "ReadObjects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion"
      ],
      "Resource": "arn:aws:s3:::utkumadmviya4/*"
    }
  ]
}
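Before running `aws iam create-policy`, it can be worth sanity-checking that the policy document parses as JSON, since the AWS CLI rejects malformed files with a generic error. A minimal local check (hypothetical; it validates only JSON syntax, not IAM semantics, and writes to a temporary path):

```shell
# Write the S3 read policy to a temp file and confirm it is well-formed JSON.
cat > /tmp/iam-policy-check.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::utkumadmviya4"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::utkumadmviya4/*"
    }
  ]
}
EOF
# json.tool exits nonzero (and prints the parse error) on malformed JSON.
python3 -m json.tool /tmp/iam-policy-check.json > /dev/null \
  && echo "iam-policy.json: valid JSON"
```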
List the IAM policy ARN and use it when creating the IAM service account.
Code:
aws iam list-policies | grep cas-s3-read
Log:
cldlgn05.unx.sas.com> aws iam list-policies | grep cas-s3-read
"PolicyName": "cas-s3-read-policy",
"Arn": "arn:aws:iam::182696677754:policy/cas-s3-read-policy",
cldlgn05.unx.sas.com>
The following eksctl command creates the EKS IAM service account. The value for the service account (MY_SERVICEACC=) has to be "sas-cas-server" or "sas-programming-environment" so that AWS adds the right IAM annotation for the CAS and Compute pods. If you use any other value for the service account name, the CAS and Compute pods do not get the right values assigned for the environment variables "AWS_ROLE_ARN=" and "AWS_WEB_IDENTITY_TOKEN_FILE=".
Code:
MY_EKSCLUSTER=utkumadmviya4-eks
MY_SERVICEACC=sas-cas-server
MY_NAMESPACE=sasviya4aws
MY_ROLE=cas-s3-read-role
MY_ARNPOLICY="arn:aws:iam::182696677754:policy/cas-s3-read-policy"
## To delete the IAM service account:
## eksctl delete iamserviceaccount --cluster $MY_EKSCLUSTER --namespace $MY_NAMESPACE --name $MY_SERVICEACC
eksctl create iamserviceaccount \
--name $MY_SERVICEACC \
--namespace $MY_NAMESPACE \
--cluster $MY_EKSCLUSTER \
--role-name $MY_ROLE \
--attach-policy-arn $MY_ARNPOLICY \
--approve \
--override-existing-serviceaccounts
View the IAM Service account details in standard and YAML format. Notice the annotations in the metadata section.
Code:
eksctl get iamserviceaccount --cluster $MY_EKSCLUSTER \
--namespace $MY_NAMESPACE
kubectl get sa $MY_SERVICEACC -n $MY_NAMESPACE -o yaml
Log:
cldlgn05.unx.sas.com> eksctl get iamserviceaccount --cluster $MY_EKSCLUSTER \
> --namespace $MY_NAMESPACE
2022-06-14 15:08:14 [ℹ] eksctl version 0.58.0
2022-06-14 15:08:14 [ℹ] using region us-east-1
NAMESPACE NAME ROLE ARN
sasviya4aws sas-cas-server arn:aws:iam::182696677754:role/cas-s3-read-role
cldlgn05.unx.sas.com>
cldlgn05.unx.sas.com> kubectl get sa $MY_SERVICEACC -n $MY_NAMESPACE -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::182696677754:role/cas-s3-read-role
creationTimestamp: "2022-06-14T19:08:13Z"
labels:
app.kubernetes.io/managed-by: eksctl
name: sas-cas-server
namespace: sasviya4aws
resourceVersion: "165057"
uid: 4e8f1c9c-4a02-4741-a7b5-bc61e6d2be64
secrets:
- name: sas-cas-server-token-plqvq
cldlgn05.unx.sas.com>
Deploy an AWS CLI pod in the Kubernetes cluster, in the same namespace, to test and verify the access privileges associated with the EKS service account. Verify the access privileges against the S3 bucket from the AWS CLI pod. You can use a YAML file with the required details to deploy the AWS CLI pod with the EKS IAM service account.
Code: aws-cli-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: aws-cli
  namespace: sasviya4aws
spec:
  serviceAccountName: sas-cas-server
  containers:
  - name: aws-cli
    image: amazon/aws-cli:latest
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
Start the AWS CLI pod in the Kubernetes cluster.
Code:
kubectl apply -f ./aws-cli-pod.yaml
# View the status of the CLI Pod.
kubectl get pods -n sasviya4aws | grep cli
Log:
cldlgn05.unx.sas.com> kubectl get pods -n sasviya4aws | grep cli
aws-cli 1/1 Running 0 9s
View the AWS caller identity for the aws-cli pod.
Code:
kubectl exec -it -n sasviya4aws aws-cli -- aws sts get-caller-identity
Log:
cldlgn05.unx.sas.com> kubectl exec -it -n sasviya4aws aws-cli -- aws sts get-caller-identity
{
"UserId": "AROASVCMSCF5GUWBNRMPM:botocore-session-1655234019",
"Account": "182696677754",
"Arn": "arn:aws:sts::182696677754:assumed-role/cas-s3-read-role/botocore-session-1655234019"
}
cldlgn05.unx.sas.com>
List the S3 bucket from the aws-cli pod to verify the access privileges granted by the service account.
Code:
kubectl exec -it -n sasviya4aws aws-cli -- aws s3 ls s3://utkumadmviya4
Log:
cldlgn05.unx.sas.com> kubectl exec -it -n sasviya4aws aws-cli -- aws s3 ls s3://utkumadmviya4
PRE ENCKEYdata/
PRE data/
PRE redshift/
PRE userdata/
Attempt to delete an S3 folder from the aws-cli pod. The IAM policy associated with the service account does not have delete permission on the S3 bucket, so the request is denied.
Code:
kubectl exec -it -n sasviya4aws aws-cli -- aws s3 rm s3://utkumadmviya4/redshift
Log:
cldlgn05.unx.sas.com> kubectl exec -it -n sasviya4aws aws-cli -- aws s3 rm s3://utkumadmviya4/redshift
delete failed: s3://utkumadmviya4/redshift An error occurred (AccessDenied) when calling the DeleteObject operation: Access Denied
command terminated with exit code 1
cldlgn05.unx.sas.com>
Before CAS and the Compute Server can access the S3 bucket with the IAM service account, you must add the service account specification to the CAS and Compute pods. The following YAML files add the service account name to the CAS and Compute pods; the pods then pick up the AWS environment variable values from the service account annotations.
Code: ../.../site-config/cas-server/cas-serviceaccount-s3iam.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: cas-add-serviceaccount-s3iam
patch: |-
  - op: add
    path: /spec/serviceAccountName
    value: sas-cas-server
target:
  group: viya.sas.com
  kind: CASDeployment
  name: .*
  version: v1alpha1
Code: ../.../site-config/sas-compute-server/compute-serviceaccount-s3iam.yaml
apiVersion: builtin
kind: PatchTransformer
metadata:
  name: compute-add-serviceaccount-s3iam
patch: |-
  - op: add
    path: /template/spec/serviceAccountName
    value: sas-programming-environment
target:
  name: sas-compute-job-config
  version: v1
  kind: PodTemplate
Include the patches in the cluster deployment manifest by updating the transformers section of the "kustomization.yaml" file with the YAML files created in the previous steps. Build a new site.yaml from the updated "kustomization.yaml" and apply the manifest to the Kubernetes namespace. Restart the CAS and Compute pods so they pick up the ARN role from the EKS service account.
Code: kustomization.yaml
transformers:
........
.....
- site-config/cas-server/cas-serviceaccount-s3iam.yaml ## To use IAM Service Account
- site-config/sas-compute-server/compute-serviceaccount-s3iam.yaml ## To use IAM Service account
......
......
Code:
export NS=sasviya4aws
kustomize build -o site.yaml
kubectl -n ${NS} apply -f site.yaml
# Restart CAS and Compute Pods
kubectl -n ${NS} delete pods -l casoperator.sas.com/server=default
kubectl -n ${NS} delete pod --selector='app=sas-compute'
kubectl -n ${NS} delete pod --selector='app=sas-launcher'
Check the CAS pods for the AWS environment variables. They should have the ARN role from the EKS IAM service account. Look for the following two variables: AWS_ROLE_ARN= and AWS_WEB_IDENTITY_TOKEN_FILE=.
Code:
kubectl exec -n sasviya4aws sas-cas-server-default-controller -- env | grep AWS
Log:
cldlgn05.unx.sas.com> kubectl exec -n sasviya4aws sas-cas-server-default-controller -- env | grep AWS
Defaulted container "cas" out of: cas, sas-backup-agent, sas-consul-agent, sas-certframe (init), sas-config-init (init)
AWS_REGION=us-east-1
AWS_ROLE_ARN=arn:aws:iam::182696677754:role/cas-s3-read-role
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_DEFAULT_REGION=us-east-1
cldlgn05.unx.sas.com>
With the EKS IAM service account attached to the CAS pods, the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are assigned from the service account. You can use the following code to load CAS from S3 data files. Notice that the CASLIB statement has no AWS access key or ARN role parameters; the IAM service account provides the AWS access credentials to the data read process. The following code loads a single Parquet file and a folder containing multiple Parquet files into two separate CAS tables.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_region="US_East";
%let objpath="/data/";
CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
caslib AWSCAS2 datasource=(srctype="s3",
region=&aws_region,
bucket=&s3bucket,
objectpath=&objpath
) subdirs ;
proc casutil incaslib="AWSCAS2" outcaslib="AWSCAS2";
list files ;
run;quit;
/* Load CAS from single Parquet data file */
proc casutil incaslib="AWSCAS2" outcaslib="public";
droptable casdata="baseball_prqt_1" incaslib="public" quiet;
load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
run;
quit;
/* Load CAS from Parquet data-folder containing multiple files */
proc casutil incaslib="AWSCAS2" outcaslib="public";
droptable casdata="baseball_prqt" incaslib="public" quiet;
load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
list tables incaslib="public";
run;
quit;
CAS mySession TERMINATE;
Log extract:
…………
……..
80 %let userid=utkuma;
81 %let s3bucket=&userid.dmviya4 ;
82 %let aws_region="US_East";
83 %let objpath="/data/";
84
85 CAS mySession SESSOPTS=( CASLIB=casuser TIMEOUT=99 LOCALE="en_US" metrics=true);
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services sas-cas-server-default-client using port 5570. The
UUID is e230e9fa-dd38-2147-b9c2-a3acdbf303f4. The user is viya_admin and the active caslib is CASUSER(viya_admin).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 3 workers.
NOTE: 'CASUSER(viya_admin)' is now the active caslib.
NOTE: Action 'sessionProp.setSessOpt' used (Total process time):
NOTE: The CAS statement request to update one or more session options for session MYSESSION completed.
86
87 caslib AWSCAS2 datasource=(srctype="s3",
88 region=&aws_region,
89 bucket=&s3bucket,
90 objectpath=&objpath
91 ) subdirs ;
NOTE: Executing action 'table.addCaslib'.
NOTE: 'AWSCAS2' is now the active caslib.
NOTE: Cloud Analytic Services added the caslib 'AWSCAS2'.
…………
……..
97 /* Load CAS from single Parquet data file */
98 proc casutil incaslib="AWSCAS2" outcaslib="public";
NOTE: The UUID 'e230e9fa-dd38-2147-b9c2-a3acdbf303f4' is connected using session MYSESSION.
99 droptable casdata="baseball_prqt_1" incaslib="public" quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 0.018944 seconds.
100
100! load casdata="PARQUET/baseball_prqt/baseball_prqt_1" casout="baseball_prqt_1" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt/baseball_prqt_1 available as table BASEBALL_PRQT_1 in caslib
public.
…………
……..
104
105 /* Load CAS from Parquet data-folder containing multiple files */
106 proc casutil incaslib="AWSCAS2" outcaslib="public";
NOTE: The UUID 'e230e9fa-dd38-2147-b9c2-a3acdbf303f4' is connected using session MYSESSION.
107 droptable casdata="baseball_prqt" incaslib="public" quiet;
NOTE: Executing action 'table.dropTable'.
NOTE: Action 'table.dropTable' used (Total process time):
NOTE: The Cloud Analytic Services server processed the request in 0.018659 seconds.
108
108! load casdata="PARQUET/baseball_prqt" casout="baseball_prqt" IMPORTOPTIONS=(FILETYPE="PARQUET") promote ;
NOTE: Executing action 'table.loadTable'.
NOTE: Cloud Analytic Services made the file PARQUET/baseball_prqt available as table BASEBALL_PRQT in caslib public.
………….
…………………….
..............
Result Output: CAS table loaded from S3 data files.
With the EKS IAM service account attached to the Compute pod, the environment variables AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE are assigned from the service account. You can use the following code to manage the S3 bucket from the Compute (SPRE) server. Notice that the PROC S3 statement has no AWS credential parameters; the IAM service account provides the AWS access credentials for the S3 bucket.
Code:
%let userid=utkuma;
%let s3bucket=&userid.dmviya4 ;
%let aws_region="useast";
PROC S3 REGION=&aws_region;
list "/&s3bucket/data" ;
run;
Log extract:
…………
……..
80 %let userid=utkuma;
81 %let s3bucket=&userid.dmviya4 ;
82 %let aws_region="useast";
83
84
85 PROC S3 REGION=&aws_region;
86 list "/&s3bucket/data" ;
87 run;
CSV/ 0
IMAGE/ 0
PARQUET/ 0
SAS7BDAT/ 0
VIDEO/ 0
NOTE: PROCEDURE S3 used (Total process time):
real time 0.38 seconds
cpu time 0.04 seconds
88
…………
……..........
Thank you to Chuck Hunley and Merri Jensen for their assistance with this Viya 4 functionality.
Important Link:
Find more articles from SAS Global Enablement and Learning here.
Hi @UttamKumar
Very useful!
For this to work do we need to set the "hop limit" on the EKS managed nodes metadata to 2?
Thanks,
Eyal
@EyalGonen It depends. If viya4-iac-aws is used to deploy the cluster, then it sets the instance metadata options on the nodes.
These options block both IMDSv1 and IMDSv2 from being used with our S3 interface.
To use IMDSv2, the hop limit has to be set to 2. If you want to use IMDSv1, not recommended, then metadata_http_tokens should be set to optional.
https://docs.aws.amazon.com/cli/latest/reference/ec2/modify-instance-metadata-options.html
Hi @ChuckHunley
So if use viya4-iac-aws with the default settings, then the instructions in this post will work or not?
Thanks!
@EyalGonen Yes, these instructions will work without needing to increase the hop limit.
Wow, this is a game-changer! Thanks for sharing this detailed guide!