BookmarkSubscribeRSS Feed

Integrating SAS Access to DuckDB with AWS KMS for Secure Data Encryption in Viya4

Started 4 weeks ago by
Modified 4 weeks ago by
Views 298

SAS recently released a new access engine called SAS Access to DuckDB, an in-process analytics engine. DuckDB supports a wide range of open file formats, making it highly convenient for querying structured data directly. It can seamlessly access data stored in object storage systems such as Amazon S3, Google Cloud Storage (GCS), and Azure ADLS.

In today’s world, where data breaches frequently make headlines and privacy regulations are becoming increasingly stringent worldwide, safeguarding sensitive information is more important than ever. Whether it involves personal records, financial transactions, or corporate intellectual property, data remains one of your most valuable and vulnerable assets.

The Two Pillars of Cloud Data Protection

As organizations move their infrastructure and data to the cloud, keeping that data secure becomes a top priority. Two key pillars of cloud data protection are:

  • Encryption at Rest: Protects data stored on disks or other persistent storage.
  • Encryption in Transit: Secures data as it travels between systems, services, or users.

Together, these mechanisms ensure that even if unauthorized access occurs, the data remains unreadable and secure.

The following figure provides a snapshot of the various states data can exist in.

AbhilashPA_16-1762252309079.png

 

Why Encryption Matters

Data encryption is a fundamental pillar of today’s security strategies. It guarantees that even if unauthorized individuals gain access to your systems or networks, they cannot read or misuse your data. From mobile applications to cloud platforms, encryption plays a crucial role in preserving confidentiality, integrity, and trust. In this blog, we’ll explore how data encryption is implemented with SASIODUCK, highlighting essential protections for businesses of all sizes, especially within cloud-native architectures.

In this blog, we’ll explore how DUCKDB enables encryption in cloud-native architectures, and how it can integrate seamlessly with AWS Key Management Service (KMS) for secure key handling in Viya4 environments.

Understanding Encryption with DUCKDB

Now let’s see how encryption works with the new SASIODUCK engine. In DuckDB, PRAGMA statements are special SQL commands (borrowed from SQLite) that let you tweak the database’s internal settings. One of these, PRAGMA add_parquet_key, is super useful when dealing with encrypted Parquet files. It lets you register an encryption key by name, so you can easily use it when reading or writing encrypted data. Think of it as telling DuckDB, ‘Hey, here’s the key you’ll need when handling secure files’,and it keeps that key handy for the rest of your session.

Let’s walk through a sample code snippet to see how this PRAGMA actually works in practice. It’ll show us how to encrypt a dataset and also how to work with already encrypted files—simple, straightforward, and secure.

We’ll kick things off by setting up a libname along with the source and target datasets—just laying the groundwork before we dive into the fun stuff."

AbhilashPA_17-1762252309082.png

 

 

Let’s walk through a quick example to see how we can set up an encryption key for working with encrypted Parquet files using SASIODUCK. For simplicity, we’re hardcoding the key here — but in a real-world setup, you’d pull it securely from an external key management system and inject it dynamically.

AbhilashPA_18-1762252309085.png

 

Now let’s convert a CSV file into an encrypted Parquet file. In this step, we’re using the COPY command along with the encryption config to make sure the output file is securely written with our key (key256).

AbhilashPA_19-1762252309087.png

 

Finally, let’s read the encrypted Parquet file back in. We’re using the read_parquet function along with the same encryption key (key256) to decrypt and query the data. For this example, we’re hardcoding the key and just previewing the first 10 rows — but in a real-world production setup, you'd typically fetch the key securely from an external Key Management Service like AWS KMS, Google Cloud KMS, or Azure Key Vault before plugging it into your workflow.

AbhilashPA_20-1762252309089.png

 

 

Curious how to plug in your Key Management System(KMS) securely? Let’s dive in! In this section, we’ll walk through the step-by-step process of integrating SASIODUK with AWS Key Management Service (KMS).

In this post, we’ll walk through how to integrate AWS Key Management Service (KMS) with Amazon EKS to enable secure, efficient, and streamlined key management for your cloud-native workloads. This setup ensures that DuckDB can access encryption keys without introducing operational complexity.

The integration unfolds in two main parts:

  1. Configuring Integration between EKS and KMS
  2. Using a SAS program to retrieve the AES key securely from KMS and apply it for encrypting and decrypting datasets within SAS Viya 4.

Let’s dive in!

Configuring Integration between EKS and KMS

 In this section, we’ll walk through the step-by-step process of integrating AWS KMS with EKS, allowing Viya 4 to securely and seamlessly retrieve encryption keys as needed. The high level flow is shown below,

AbhilashPA_21-1762252309100.png

 

  • Create a KMS Key

 Start by creating a Customer Managed Key (CMK) in AWS KMS as shown below,

AbhilashPA_22-1762252309105.png

 

This key (abp-sasioduck) will be used to encrypt and decrypt. Make sure to assign proper permissions (Key Policy) to allow access from your EKS nodes or IAM roles.

 

  • Enable IAM Roles for Service Accounts (IRSA): IRSA allows Kubernetes pods to assume IAM roles securely. Ensure that ,
  • Enable OIDC provider for your EKS cluster : If not already configured, associate an OpenID Connect (OIDC) identity provider with your EKS cluster. This allows AWS to authenticate Kubernetes service accounts.

 

AbhilashPA_23-1762252309107.png

 

 

  • Create an IAM Policy for KMS Access: Create an IAM policy that grants access to your KMS key (e.g., kms:Encrypt, kms:Decrypt).

 

Save the below JSON as kms-access-policy.json,

AbhilashPA_24-1762252309109.png

 

 

Then create the policy,

AbhilashPA_25-1762252309110.png

 

 

  • Create an IAM role with this policy and trust relationship for your Kubernetes service account.

 

  • Annotate your service account with the IAM role ARN. As we are accessing the keys from SAS Studio, we need to annotate the service account (sas-programming-environment) for “sas-compute” sessions.

 

Using a SAS program to access AES key securely from KMS

Now that your EKS cluster is securely integrated with AWS KMS, the next step is to enable your application to use KMS for generating and managing encryption keys. This is where the AWS SDKs come into play.

Whether you're writing your app in Python, Node.js, or another language, AWS provides SDKs (like boto3 or aws-sdk) that allow your application to interact directly with KMS. By calling KMS APIs such as GenerateDataKey and Decrypt, your application can securely request encryption keys, use them for encrypting sensitive data, and decrypt it when needed—all without ever storing plaintext keys.

This setup ensures robust encryption while offloading the complexity of key management to AWS. Let’s explore how this works in code.

Step 1: Generate AES-256 Key using AWS KMS

In this step, we use Python within a SAS proc python block to securely generate a 256-bit AES key using AWS Key Management Service (KMS). The key is retrieved in plaintext form and will later be base64-encoded for compatibility with tools like DuckDB. This approach ensures strong encryption while keeping key management centralized and auditable through AWS.

AbhilashPA_26-1762252309113.png

 

Step2: Extract and encode the key in Base64 and return to SAS

After generating the AES key, we encode it in Base64 to make it compatible with DuckDB that expect keys in this format. The encoded key is then passed from Python back into the SAS environment using SAS.symput, making it available for use in subsequent SAS steps like encryption or secure data access

AbhilashPA_27-1762252309116.png

 

Step3 : Define a SAS libname and quote the key

In this step, we define a SAS library reference and specify the input CSV file and output encrypted Parquet file paths. We also prepare the Base64-encoded AES key by quoting it, ensuring it’s ready for use with DuckDB's encryption settings.

AbhilashPA_28-1762252309119.png

 

Step4: Register the encryption key with duckdb

In this step, we tell DuckDB about the AES-256 encryption key that will be used to encrypt or decrypt Parquet files. This is done using the DuckDB-specific PRAGMA add_parquet_key command, where:

'key256' is a label (or alias) for the encryption key.

&quoted_key is the Base64-encoded key (quoted properly for SQL usage).

Once registered, DuckDB can use this key whenever it reads or writes encrypted Parquet files, making encryption seamless in your workflows.

AbhilashPA_29-1762252309121.png

 

 

 

Step5: Convert the csv file to encrypted parquet

In this step, we use DuckDB's SQL interface to:

Read data from a CSV file using read_csv_auto("&inputcsv"), which automatically infers the schema.

Write that data to a Parquet file ("&enc_data"), but with encryption enabled.

The ENCRYPTION_CONFIG option is used to encrypt the footer of the Parquet file using the registered key 'key256'. This ensures the data is stored securely, while still allowing tools that support encrypted Parquet to read it with the correct key.

AbhilashPA_30-1762252309123.png

 

Step6: Read the encrypted parquet files

In this step,

  • Registers the decryption key (key256) with DuckDB using PRAGMA add_parquet_key.
  • Reads and decrypts the encrypted Parquet file using read_parquet() with the registered key.

AbhilashPA_31-1762252309125.png

 

And that’s a wrap! By integrating AWS KMS with EKS and DuckDB, you enable secure, scalable, and auditable encryption workflows. This setup ensures that your sensitive data stays protected, while still being easy to manage and access within Viya4 environments.

Contributors
Version history
Last update:
4 weeks ago
Updated by:

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started

Article Labels
Article Tags