
Integrating SAS Access to DuckDB Running on EKS with Amazon S3 Using IAM Roles


With the growing adoption of cloud-native analytics, organizations are looking for solutions that combine performance, simplicity, and security. SAS recently released SAS Access to DuckDB, an access engine that connects to DuckDB, a high-performance, in-process analytics engine. DuckDB supports open file formats like Parquet, CSV, and JSON and can natively connect to object stores such as Amazon S3, Google Cloud Storage (GCS), and Azure Data Lake Storage (ADLS).

This makes it possible to run ad-hoc analytics on cloud-native datasets without standing up heavy database infrastructure. However, when deploying in production, especially on Amazon EKS (Elastic Kubernetes Service), security and governance become crucial.

In this post, we’ll walk through how to integrate SAS Viya on EKS with Amazon S3 using IAM Roles for Service Accounts (IRSA), ensuring secure, fine-grained access control without hardcoding AWS credentials.

Why EKS + DuckDB + S3?

  • Viya on EKS for scalability: containerized workloads with autoscaling.
  • DuckDB for analytics: fast in-process queries without needing a database server.
  • Amazon S3 for storage: scalable, durable data lake for structured and semi-structured data.
  • IAM-based access: no need to store long-lived credentials; permissions are scoped to pods.

This combination enables secure, cloud-native analytics workflows, where data remains in S3 while compute scales dynamically in EKS. The integration can be configured in two stages: setting up IAM integration between EKS and S3, and incorporating the necessary secrets into SAS code. Let’s walk through the detailed steps below.

  1. Setting Up IAM Integration Between EKS and S3

In this stage, we configure the EKS cluster to securely access S3 using IAM roles for service accounts (IRSA). This eliminates the need for hard-coded credentials and ensures that access is scoped according to least-privilege principles. We will cover:

  • Enabling the OIDC provider for the EKS cluster
  • Creating an IAM policy with S3 access
  • Mapping the policy to a Kubernetes service account
  • Annotating pods to assume the IAM role

These steps allow your DuckDB workloads running in EKS to seamlessly and securely read/write data in S3.

  • Enable OIDC Provider for EKS

Amazon EKS integrates with IAM Roles for Service Accounts (IRSA) by relying on an OpenID Connect (OIDC) identity provider. The OIDC provider enables Kubernetes service accounts in your cluster to be linked with IAM roles. This is a critical step because it allows workloads (pods) to securely obtain temporary AWS credentials, instead of relying on long-lived static access keys.

By enabling the OIDC provider, EKS creates a trust relationship between the Kubernetes service accounts in your cluster and AWS IAM, so that pods can request tokens and assume IAM roles as needed.

If your EKS cluster doesn’t already have an OIDC provider configured, you can create and associate one with the following command:
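
Below is a minimal sketch using eksctl; the cluster name my-eks-cluster and the region us-east-1 are placeholders for your environment:

# Associate an IAM OIDC identity provider with the cluster
eksctl utils associate-iam-oidc-provider \
  --region us-east-1 \
  --cluster my-eks-cluster \
  --approve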


This command:

  • Queries your EKS cluster for its OIDC issuer URL.
  • Creates an IAM OIDC identity provider in your AWS account (if one does not already exist).
  • Associates that OIDC provider with your cluster.
  • Uses the --approve flag to skip interactive approval and automatically confirm the association.

Once this step is completed, you can safely use IRSA to map Kubernetes service accounts to IAM roles, allowing fine-grained access control for different workloads in the cluster.

To confirm that the OIDC provider has been successfully associated with your cluster, run the following command:
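
One way to check, assuming the same placeholder cluster name, is to print the cluster’s issuer URL and confirm that a matching IAM OIDC provider exists in the account:

# Print the cluster's OIDC issuer URL
aws eks describe-cluster --name my-eks-cluster --region us-east-1 \
  --query "cluster.identity.oidc.issuer" --output text

# The issuer's ID should appear in the account's list of OIDC providers
aws iam list-open-id-connect-providers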


  • Create an IAM Policy for S3 Access

The first step in enabling secure access between your EKS workloads and Amazon S3 is to define a dedicated IAM policy. This policy ensures that your Kubernetes pods, via their service accounts, receive only the minimum permissions required to interact with S3.

Why this step is important

  • Granular control: IAM policies let you restrict access to only specific S3 buckets and actions.
  • Security best practice: Following the principle of least privilege prevents accidental or malicious modification of other S3 resources in your AWS account.
  • Auditability: Policies are versioned and managed centrally, so you can track changes and attach them only where needed.

In many analytics use cases, your workloads need to both read and write data in Amazon S3. For example, you may read existing datasets, write query results, or update files as part of your pipeline. Below is a sample policy for a bucket named my-duckdb-data-bucket that grants read, write, list, and update permissions.

Create a JSON file (s3-duckdb-policy.json) with the following content:
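
A representative version of the policy is sketched below; the exact action list is an assumption and should be adjusted to your requirements (JSON does not allow comments, so treat the whole document as a template):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DuckDBS3ReadWrite",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::my-duckdb-data-bucket",
        "arn:aws:s3:::my-duckdb-data-bucket/*"
      ]
    }
  ]
}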


Note: This IAM policy is only a sample. It should be customized based on the customer’s security model and compliance requirements. For example, you may:

  • Restrict access to a specific prefix within the bucket (e.g., arn:aws:s3:::my-duckdb-data-bucket/finance/*) instead of the entire bucket.
  • Limit actions to read-only (s3:GetObject, s3:ListBucket) if workloads don’t require writing or deleting data.
  • Apply resource-based conditions using bucket policies (e.g., restrict access to a specific VPC or IP range, or require TLS).
  • Add logging and monitoring policies (via CloudTrail and S3 access logs) for compliance visibility.

 

  • Registering the Policy

Once you have defined your policy JSON file (s3-duckdb-policy.json), you can create the policy in your AWS account either using the AWS CLI or through the AWS Management Console.
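
With the AWS CLI, registration is a single call; the policy name S3DuckDBAccess matches the name referenced later in this post:

# Register the managed policy from the JSON file defined above
aws iam create-policy \
  --policy-name S3DuckDBAccess \
  --policy-document file://s3-duckdb-policy.json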


  • Attach the Policy to an IAM Role (IRSA)

SAS Viya uses the existing service account “sas-programming-environment” for starting compute pods. To allow these pods to securely access S3 without embedding AWS credentials, we leverage IAM Roles for Service Accounts (IRSA), which lets Kubernetes pods assume an AWS IAM role automatically via the EKS cluster’s OIDC provider. This approach provides secure, temporary, and auditable access to S3 for SAS workloads. The process consists of three main steps:

 

  1. Create an IAM Role for the Service Account

Create an IAM role that can be assumed by the sas-programming-environment service account. This role’s trust policy specifies the OIDC provider of your EKS cluster and the namespace/service account combination. Replace placeholders with your values:

This can be done either using the AWS CLI or through the AWS Management Console:
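
A sketch of the trust policy (saved here as trust-policy.json) and the corresponding CLI call; the role name SASDuckDBAccessRole matches the name used later in this post, and the placeholders are explained in the notes below:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<eks-cluster-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<eks-cluster-id>:sub": "system:serviceaccount:<namespace>:sas-programming-environment"
        }
      }
    }
  ]
}

# Create the role with the trust policy above
aws iam create-role \
  --role-name SASDuckDBAccessRole \
  --assume-role-policy-document file://trust-policy.json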


Notes:

  • <account-id>: Your AWS account ID
  • <region>: The AWS region where the EKS cluster is deployed
  • <eks-cluster-id>: The OIDC ID of your EKS cluster
  • <namespace>: The Kubernetes namespace where the SAS compute pods run

  2. Attach the IAM Policy to the Role

Attach the S3 access policy you created earlier (S3DuckDBAccess) to the IAM role.
This step ensures that the SAS pods running in your EKS cluster can use the role to interact with Amazon S3. The policy controls which specific operations (such as reading objects, writing new ones, listing buckets, or updating content) are permitted.
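
With the AWS CLI, the attachment looks like this (substitute your AWS account ID):

# Attach the S3 access policy to the IRSA role
aws iam attach-role-policy \
  --role-name SASDuckDBAccessRole \
  --policy-arn arn:aws:iam::<account-id>:policy/S3DuckDBAccess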


At this point, the SASDuckDBAccessRole is associated with the S3DuckDBAccess policy. This means the role now has the necessary S3 permissions defined in the policy. However, the permissions are not yet available to pods in EKS until the role is linked to a Kubernetes service account using IRSA (through service account annotation).

 

  3. Annotate the Existing Service Account

The final step in enabling IAM Roles for Service Accounts (IRSA) is to annotate your Kubernetes service account with the IAM role ARN. This annotation creates the link between the service account that your SAS pods use and the IAM role you created earlier. Once this mapping is in place, any pod running under that service account will automatically inherit the S3 permissions defined in the S3DuckDBAccess policy, without the need for static AWS credentials.

Use the following command:
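
A sketch of the annotation, followed by a quick check; substitute your namespace and account ID:

# Link the service account to the IAM role via the IRSA annotation
kubectl annotate serviceaccount sas-programming-environment \
  -n <namespace> \
  eks.amazonaws.com/role-arn=arn:aws:iam::<account-id>:role/SASDuckDBAccessRole

# Verify the annotation was applied
kubectl describe serviceaccount sas-programming-environment -n <namespace>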


You should see the IAM role ARN under the annotations section.

SAS compute pods using the sas-programming-environment service account can now securely access S3 via IAM, with no AWS access keys or secrets stored in code. This enables safe and auditable integration with DuckDB or other S3-based workloads.

 

 

  2. Accessing S3 from SAS

Once the IAM Role for Service Accounts (IRSA) is configured, SAS compute pods can seamlessly authenticate to AWS S3. The following example shows how to connect to Amazon S3 from SAS Studio and query Parquet data directly. Let’s break down the code step by step.

  • Define a Library & Start a SQL Session
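
The sketch below shows the general shape of this step; the DUCKDB engine name, the DATABASE= option, and the file path are assumptions, so check the SAS/ACCESS to DuckDB documentation for the exact syntax:

/* Assign a DuckDB library (engine name, DATABASE= option, and path are assumptions) */
libname duck duckdb database="/tmp/analytics.duckdb";

/* Start an explicit pass-through session that reuses the library's connection */
proc sql;
   connect using duck as con;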

  • Install and Load Required Extensions
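
Continuing the same pass-through session, a sketch of the extension setup; the extension_directory path is an assumption:

   /* Keep extension binaries in a consistent, writable location inside the pod */
   execute (SET extension_directory = '/tmp/duckdb_extensions') by con;

   /* httpfs: remote file access over HTTP(S) and S3 */
   execute (INSTALL httpfs) by con;
   execute (LOAD httpfs) by con;

   /* aws: native AWS integration for authentication */
   execute (INSTALL aws) by con;
   execute (LOAD aws) by con;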

 


In this step, we enable DuckDB to interact with cloud storage by installing the necessary extensions.

  • httpfs: Adds support for accessing remote files over HTTP and Amazon S3.
  • aws: Provides native integration with AWS services for authentication and secure access.

We also configure the extension_directory, which defines where DuckDB stores the downloaded extension binaries inside the pod. This ensures the extensions are kept in a consistent location and don’t need to be re-installed each time.

 

 

  • Configure Authentication with S3

To securely connect DuckDB to Amazon S3, we define a secret. This secret tells DuckDB how to authenticate when accessing S3 resources.
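
Still within the same pass-through session, a sketch of the secret definition (the secret name s3_secret is arbitrary):

   /* Authenticate to S3 with the pod's IRSA credentials via the AWS credential chain */
   execute (
      CREATE SECRET s3_secret (
         TYPE s3,
         PROVIDER credential_chain,
         CHAIN 'sts',
         REGION 'us-east-1'
      )
   ) by con;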

 


  • TYPE s3: Indicates that the secret is for accessing Amazon S3.
  • PROVIDER credential_chain: Instructs DuckDB to use the AWS default credential chain (environment variables, IAM roles, IRSA, or SSO).
  • CHAIN 'sts': Specifies that DuckDB should obtain temporary credentials using the AWS Security Token Service (STS).
  • REGION 'us-east-1': Sets the AWS region for the connection.

 

  • Query Data from S3

Finally, we query a Parquet file directly from S3:
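
A sketch of the query and the session cleanup; the object key sales/2024.parquet is a hypothetical example:

   /* Read the Parquet file straight from S3 and return the rows to SAS */
   select * from connection to con (
      SELECT *
      FROM read_parquet('s3://my-duckdb-data-bucket/sales/2024.parquet')
      LIMIT 10
   );

   disconnect from con;
quit;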


Because DuckDB uses the IRSA-based IAM credentials of the SAS compute pod, it can fetch the file securely from S3 without any manual key management.

 

Conclusion

By integrating SAS Viya, DuckDB, and AWS S3 through IAM Roles for Service Accounts (IRSA), we’ve created a secure, scalable, and efficient way to access cloud data directly from SAS Studio. No hardcoded credentials are required; authentication is handled by AWS IAM.

DuckDB extensions provide fast access to structured data formats such as Parquet and CSV stored in S3. SAS Studio users can now seamlessly query and analyze cloud-hosted datasets with familiar SQL and SAS procedures. This approach not only strengthens security but also simplifies operations, enabling organizations to unlock the full potential of cloud-native analytics in SAS Viya.
