SAS recently released a new access engine called SAS Access to DuckDB, an in-process analytics engine. DuckDB supports a wide range of open file formats, making it highly convenient for querying structured data directly. It can seamlessly access data stored in object storage systems such as Amazon S3, Google Cloud Storage (GCS), and Azure ADLS.
In today’s world, where data breaches frequently make headlines and privacy regulations are becoming increasingly stringent worldwide, safeguarding sensitive information is more important than ever. Whether it involves personal records, financial transactions, or corporate intellectual property, data remains one of your most valuable and vulnerable assets.
The Two Pillars of Cloud Data Protection
As organizations move their infrastructure and data to the cloud, keeping that data secure becomes a top priority. Two key pillars of cloud data protection are:
Together, these mechanisms ensure that even if unauthorized access occurs, the data remains unreadable and secure.
The following figure provides a snapshot of the various states data can exist in.
Why Encryption Matters
Data encryption is a fundamental pillar of today’s security strategies. It guarantees that even if unauthorized individuals gain access to your systems or networks, they cannot read or misuse your data. From mobile applications to cloud platforms, encryption plays a crucial role in preserving confidentiality, integrity, and trust. In this blog, we’ll explore how data encryption is implemented with SASIODUCK, highlighting essential protections for businesses of all sizes, especially within cloud-native architectures.
In this blog, we’ll explore how DUCKDB enables encryption in cloud-native architectures, and how it can integrate seamlessly with AWS Key Management Service (KMS) for secure key handling in Viya4 environments.
Understanding Encryption with DUCKDB
Now let’s see how encryption works with the new SASIODUCK engine. In DuckDB, PRAGMA statements are special SQL commands (borrowed from SQLite) that let you tweak the database’s internal settings. One of these, PRAGMA add_parquet_key, is super useful when dealing with encrypted Parquet files. It lets you register an encryption key by name, so you can easily use it when reading or writing encrypted data. Think of it as telling DuckDB, ‘Hey, here’s the key you’ll need when handling secure files’,and it keeps that key handy for the rest of your session.
Let’s walk through a sample code snippet to see how this PRAGMA actually works in practice. It’ll show us how to encrypt a dataset and also how to work with already encrypted files—simple, straightforward, and secure.
We’ll kick things off by setting up a libname along with the source and target datasets—just laying the groundwork before we dive into the fun stuff."
Let’s walk through a quick example to see how we can set up an encryption key for working with encrypted Parquet files using SASIODUCK. For simplicity, we’re hardcoding the key here — but in a real-world setup, you’d pull it securely from an external key management system and inject it dynamically.
Now let’s convert a CSV file into an encrypted Parquet file. In this step, we’re using the COPY command along with the encryption config to make sure the output file is securely written with our key (key256).
Finally, let’s read the encrypted Parquet file back in. We’re using the read_parquet function along with the same encryption key (key256) to decrypt and query the data. For this example, we’re hardcoding the key and just previewing the first 10 rows — but in a real-world production setup, you'd typically fetch the key securely from an external Key Management Service like AWS KMS, Google Cloud KMS, or Azure Key Vault before plugging it into your workflow.
Curious how to plug in your Key Management System(KMS) securely? Let’s dive in! In this section, we’ll walk through the step-by-step process of integrating SASIODUK with AWS Key Management Service (KMS).
In this post, we’ll walk through how to integrate AWS Key Management Service (KMS) with Amazon EKS to enable secure, efficient, and streamlined key management for your cloud-native workloads. This setup ensures that DuckDB can access encryption keys without introducing operational complexity.
The integration unfolds in two main parts:
Let’s dive in!
Configuring Integration between EKS and KMS
In this section, we’ll walk through the step-by-step process of integrating AWS KMS with EKS, allowing Viya 4 to securely and seamlessly retrieve encryption keys as needed. The high level flow is shown below,
Start by creating a Customer Managed Key (CMK) in AWS KMS as shown below,
This key (abp-sasioduck) will be used to encrypt and decrypt. Make sure to assign proper permissions (Key Policy) to allow access from your EKS nodes or IAM roles.
Save the below JSON as kms-access-policy.json,
Then create the policy,
Using a SAS program to access AES key securely from KMS
Now that your EKS cluster is securely integrated with AWS KMS, the next step is to enable your application to use KMS for generating and managing encryption keys. This is where the AWS SDKs come into play.
Whether you're writing your app in Python, Node.js, or another language, AWS provides SDKs (like boto3 or aws-sdk) that allow your application to interact directly with KMS. By calling KMS APIs such as GenerateDataKey and Decrypt, your application can securely request encryption keys, use them for encrypting sensitive data, and decrypt it when needed—all without ever storing plaintext keys.
This setup ensures robust encryption while offloading the complexity of key management to AWS. Let’s explore how this works in code.
Step 1: Generate AES-256 Key using AWS KMS
In this step, we use Python within a SAS proc python block to securely generate a 256-bit AES key using AWS Key Management Service (KMS). The key is retrieved in plaintext form and will later be base64-encoded for compatibility with tools like DuckDB. This approach ensures strong encryption while keeping key management centralized and auditable through AWS.
Step2: Extract and encode the key in Base64 and return to SAS
After generating the AES key, we encode it in Base64 to make it compatible with DuckDB that expect keys in this format. The encoded key is then passed from Python back into the SAS environment using SAS.symput, making it available for use in subsequent SAS steps like encryption or secure data access
Step3 : Define a SAS libname and quote the key
In this step, we define a SAS library reference and specify the input CSV file and output encrypted Parquet file paths. We also prepare the Base64-encoded AES key by quoting it, ensuring it’s ready for use with DuckDB's encryption settings.
Step4: Register the encryption key with duckdb
In this step, we tell DuckDB about the AES-256 encryption key that will be used to encrypt or decrypt Parquet files. This is done using the DuckDB-specific PRAGMA add_parquet_key command, where:
'key256' is a label (or alias) for the encryption key.
"ed_key is the Base64-encoded key (quoted properly for SQL usage).
Once registered, DuckDB can use this key whenever it reads or writes encrypted Parquet files, making encryption seamless in your workflows.
Step5: Convert the csv file to encrypted parquet
In this step, we use DuckDB's SQL interface to:
Read data from a CSV file using read_csv_auto("&inputcsv"), which automatically infers the schema.
Write that data to a Parquet file ("&enc_data"), but with encryption enabled.
The ENCRYPTION_CONFIG option is used to encrypt the footer of the Parquet file using the registered key 'key256'. This ensures the data is stored securely, while still allowing tools that support encrypted Parquet to read it with the correct key.
Step6: Read the encrypted parquet files
In this step,
And that’s a wrap! By integrating AWS KMS with EKS and DuckDB, you enable secure, scalable, and auditable encryption workflows. This setup ensures that your sensitive data stays protected, while still being easy to manage and access within Viya4 environments.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.