
Overview: Secure and Scalable Storage Access to Object Storage using DuckDB


As modern analytics platforms increasingly move toward cloud‑native data lakes, secure and seamless access to object storage has become critical. DuckDB, used here as a SAS access engine, enables analysts and engineers to query data directly in cloud object storage with simplicity, performance, and security.

Built on top of the DuckDB query engine, this integration provides a standards‑based approach for accessing cloud data using native authentication, fine-grained access control, and high‑performance I/O.

This blog starts with DuckDB’s authentication and extension architecture, then explains how core DuckDB extensions enable secure access to AWS, Azure, and Google Cloud object storage, as well as S3‑compatible object storage on premises.

 

DuckDB Extension Architecture (Core Concepts)

 

DuckDB’s functionality is expanded through loadable extensions, which can be installed and activated at runtime. The complete list of supported extensions is available in the DuckDB Extensions Overview.

 

From a storage‑access perspective, three extensions are foundational:

  • httpfs – Performs remote file I/O over HTTP(S)
  • aws – Handles AWS S3 authentication and credential discovery
  • azure – Handles Azure Blob Storage and ADLS Gen2 authentication

These extensions work together to provide a clean separation between authentication and data access.
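As a minimal sketch, all three extensions can be installed once and then loaded into any DuckDB session:

```sql
-- Install once (fetched from the DuckDB extension repository),
-- then load into the current session.
INSTALL httpfs;
LOAD httpfs;

INSTALL aws;
LOAD aws;

INSTALL azure;
LOAD azure;
```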

 

DuckDB Authentication Model

Secrets‑Based Authentication

DuckDB supports authentication using secrets, which are the recommended and secure way to manage credentials. Older variable‑based authentication mechanisms are deprecated.

Secrets allow credentials to be:

  • Stored securely
  • Rotated independently
  • Referenced consistently across queries
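A sketch of the secret lifecycle, with placeholder key values:

```sql
-- Create an in-memory (temporary) secret for S3 access.
CREATE SECRET my_secret (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',             -- placeholder
    SECRET 'wJalrXUtnFEMI/K7MDENG/EXAMPLEKEY', -- placeholder
    REGION 'us-east-1'
);

-- List registered secrets (sensitive fields are redacted).
SELECT name, type, provider FROM duckdb_secrets();

-- Rotate credentials by dropping and re-creating the secret.
DROP SECRET my_secret;
```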

Responsibility Split Between Extensions

DuckDB follows a clear separation of concerns:

  • Cloud‑specific extensions (aws, azure) handle:
    • Identity resolution
    • Credential discovery
    • Integration with native cloud IAM systems
  • The httpfs extension handles:
    • Actual read/write I/O operations
    • File access semantics (globbing, streaming, parallel reads)

This separation ensures consistent behavior and portability across different environments and providers. The table below summarizes this split.

(Table: the aws and azure extensions handle authentication and identity; httpfs performs the data I/O.)

DuckDB as the Foundation

Why DuckDB?

DuckDB is an open‑source analytical database optimized for OLAP workloads. The access engine described here builds on DuckDB to leverage:

  • Native support for open formats such as Parquet
  • High‑performance, vectorized execution
  • A rich and extensible extension framework

This makes DuckDB well suited for querying large datasets directly from object storage without data duplication.

Core Extensions for Cloud Object Storage

httpfs – Remote File I/O

The httpfs extension is the backbone of DuckDB’s object storage integration.

Key capabilities:

  • HTTP(S)‑based file access
  • Streaming reads and writes
  • Underlying I/O engine used by AWS, Azure, and GCP integrations

All cloud object storage reads and writes ultimately pass through httpfs.
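For example, httpfs lets DuckDB stream a remote Parquet file directly over HTTPS (the URL below is a placeholder):

```sql
INSTALL httpfs;
LOAD httpfs;

-- Read a remote Parquet file without downloading it first.
SELECT *
FROM read_parquet('https://example.com/data/events.parquet')
LIMIT 10;
```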

 

AWS S3 Authentication (aws Extension)

DuckDB integrates with Amazon S3 using the aws extension, built on top of httpfs and the AWS SDK.

Supported Authentication Methods:

  • Access Key / Secret Key: Stored and referenced securely using DuckDB secrets.
  • AWS Credential Chain: Automatically resolved via:
    • Environment variables
    • AWS CLI configuration
    • IAM roles (for EC2 and EKS with IRSA)

The aws extension manages identity resolution and request signing, while httpfs carries out data I/O. This design integrates cleanly with AWS IAM and existing enterprise RBAC models.
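A sketch of both authentication methods as DuckDB secrets (bucket name and key values are placeholders):

```sql
-- Option 1: explicit access key / secret key pair.
CREATE SECRET s3_keys (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',             -- placeholder
    SECRET 'wJalrXUtnFEMI/K7MDENG/EXAMPLEKEY', -- placeholder
    REGION 'us-east-1'
);

-- Option 2: defer to the AWS credential chain
-- (environment variables, AWS CLI config, instance/IRSA roles).
CREATE SECRET s3_chain (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);

-- Query data directly in the bucket.
SELECT count(*) FROM 's3://my-bucket/sales/*.parquet';
```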

 

Google Cloud Storage Authentication (httpfs)

DuckDB does not currently provide a dedicated GCP extension. Access to Google Cloud Storage (GCS) is enabled through the httpfs extension using standard Google authentication mechanisms.

Supported Authentication Methods

  • Service Account Key (JSON)
    Provided via the GOOGLE_APPLICATION_CREDENTIALS environment variable.
  • Application Default Credentials (ADC)
    Automatically resolved when running on:
    • Google Compute Engine (GCE)
    • Google Kubernetes Engine (GKE)
    • Environments authenticated via gcloud auth application-default login

Authentication is handled externally by Google’s SDKs and libraries, while httpfs performs the actual data access.

This aligns with Google Cloud IAM, enabling fine‑grained bucket‑ and object‑level permissions.
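As a sketch: DuckDB also accepts a dedicated GCS secret based on HMAC interoperability keys (generated in the Google Cloud console), which httpfs uses against GCS’s S3‑compatible endpoint; bucket name and key values below are placeholders:

```sql
CREATE SECRET gcs_secret (
    TYPE GCS,
    KEY_ID 'GOOG1EXAMPLEHMACACCESSID',  -- placeholder HMAC access ID
    SECRET 'example-hmac-secret'        -- placeholder HMAC secret
);

SELECT * FROM 'gs://my-gcs-bucket/data/*.parquet' LIMIT 5;
```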

Azure Authentication (azure Extension)

The azure extension provides seamless connectivity to Azure Blob Storage and ADLS Gen2.

Installing and Loading the Extension

(Screenshot: installing and loading the azure extension.)
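A minimal sketch of installing the extension and authenticating via the credential chain (the storage account name is a placeholder):

```sql
INSTALL azure;
LOAD azure;

-- Authenticate via the Azure credential chain
-- (CLI login, managed identity, environment variables).
CREATE SECRET az_chain (
    TYPE AZURE,
    PROVIDER CREDENTIAL_CHAIN,
    ACCOUNT_NAME 'mystorageaccount'  -- placeholder
);

SELECT * FROM 'az://my-container/data/*.parquet' LIMIT 5;
```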

 

 

Supported Authentication Methods

  • Credential Chain
    Automatically resolves credentials via the Azure SDK, including:
    • Azure CLI login
    • Managed Identity
    • Environment variables
    • Default Azure credential chain
  • Access Token (OAuth 2.0) (Recommended)
    Explicit Azure AD bearer tokens supplied using DuckDB secrets.

OAuth Access Token Example

(Screenshot: CREATE SECRET statement using an Azure access token.)
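A sketch of the access-token approach, assuming a DuckDB version whose azure extension supports the ACCESS_TOKEN secret provider; the token and account name are placeholders:

```sql
-- The bearer token would typically come from, e.g.:
--   az account get-access-token --resource https://storage.azure.com
CREATE SECRET az_token (
    TYPE AZURE,
    PROVIDER ACCESS_TOKEN,
    ACCESS_TOKEN '<azure-ad-bearer-token>',  -- placeholder
    ACCOUNT_NAME 'mystorageaccount'          -- placeholder
);
```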

 

Azure’s hierarchical access model, spanning storage-account, container, and file-level ACLs, provides precise control and compliance alignment for enterprise use cases.

OAuth‑based authentication enables:

  • Integration with Azure Active Directory (AAD)
  • Enforcement of Azure RBAC
  • POSIX‑compliant ACLs at container, directory, and file levels

Access Control in ADLS Gen2

Access can be enforced at multiple layers as shown below:

  • Storage Account Level – High‑level RBAC roles
  • Container Level – Scoped permissions per dataset
  • Directory and File Level – Fine‑grained POSIX ACLs

This layered model supports enterprise governance and compliance requirements. The diagram below illustrates the separation of permissions at the storage account level.

(Diagram: access control separation at the storage account level.)

 

The figure below shows access control separation at the container level.

(Diagram: access control separation at the container level.)

On-Premises S3-Compatible Object Storage

In addition to public cloud providers, DuckDB can securely access on-premises and private-cloud S3-compatible object storage platforms such as MinIO, Red Hat Ceph (RGW), and other S3-API–compatible systems.

These platforms expose an S3-compatible API, allowing DuckDB to integrate using the same aws + httpfs extension model used for Amazon S3.

Commonly Supported Platforms

  • MinIO – High-performance, Kubernetes-native object storage
  • Red Hat Ceph Object Gateway (RGW) – Enterprise-grade object storage
  • Red Hat Multicloud Object Gateway (NooBaa)
  • Other S3-compatible systems – Including Dell ECS, NetApp StorageGRID, and similar platforms

Authentication Model

Authentication is handled using access key and secret key credentials, stored securely using DuckDB secrets.

Typical authentication characteristics:

  • Static access key / secret key pairs
  • Optional TLS (HTTPS) for secure transport
  • IAM-like policies implemented by the storage platform

Example secret definition:

(Screenshot: CREATE SECRET statement for an S3-compatible endpoint.)
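A sketch of such a secret for a MinIO-style endpoint; the endpoint and key values are placeholders:

```sql
CREATE SECRET onprem_s3 (
    TYPE S3,
    KEY_ID 'minio-access-key',                -- placeholder
    SECRET 'minio-secret-key',                -- placeholder
    ENDPOINT 'minio.example.internal:9000',   -- placeholder endpoint
    URL_STYLE 'path',  -- most S3-compatible stores use path-style URLs
    USE_SSL true       -- enable TLS where the platform supports it
);
```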

 

Once configured, DuckDB can process data directly:

SELECT * FROM 's3://my-bucket/path/*.parquet';

Extension Responsibilities

  • The aws extension handles authentication and signing (SigV4)
  • The httpfs extension performs all read/write I/O operations

Enterprise and Hybrid Cloud Benefits

Supporting on-premises S3-compatible storage enables:

  • Hybrid cloud architectures with consistent access patterns
  • Data sovereignty and compliance by keeping data on-premises
  • Cloud-portable analytics without code changes
  • Seamless integration with Kubernetes-based object storage platforms

DuckDB’s reliance on open standards and S3 compatibility makes it a strong fit for enterprise environments that combine public cloud, private cloud, and on-premises infrastructure.

 

Summary

DuckDB combines the strengths of open data formats, cloud‑native IAM, and a modular extension architecture to deliver a modern, secure, and scalable approach to object storage access. By leveraging:

  • Secrets‑based authentication
  • Cloud‑native identity systems (IAM, Azure AD, GCP IAM)
  • Core extensions such as httpfs, aws, and azure

Organizations can:

  • Query data directly in cloud data lakes
  • Enforce fine‑grained security controls
  • Maintain auditability and compliance
  • Avoid vendor lock‑in through open formats

As cloud‑native analytics continues to evolve, DuckDB provides a strong foundation for secure and high‑performance access to enterprise data lakes.

 
