SAS + DuckDB Series: Using the DuckDB Engine

3 Likes

Now that we have set the landscape with the first two posts describing what DuckDB is and how DuckDB runs inside SAS Viya, it’s time to go deeper and look at how to actually use it from SAS code.

As mentioned previously, the DuckDB engine can be used in multiple ways:

Fast in-memory analytics within a session
Persistent file-based DuckDB databases stored as .duckdb files
Direct querying of external data files such as CSV, Parquet, JSON, Delta Lake, and Iceberg

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

Although one of DuckDB’s main strengths is its ability to transparently query open file formats and table formats (Parquet, Delta Lake, Iceberg, etc.) without formally importing them—and we’ll cover that in a future post—it’s important to first understand that DuckDB is a SQL processing engine with its own native storage format and modern data types.

Understanding DuckDB’s native format is important because, from a SAS perspective, DuckDB native tables and external files accessed through DuckDB both appear as regular SAS tables. Knowing the difference helps avoid confusion and ensures you understand whether data is stored inside DuckDB itself or being accessed directly from external open file formats.

Like Parquet, DuckDB uses a columnar format designed for analytics. In most cases, only the columns referenced in a query are read, which significantly improves performance and reduces I/O.

DuckDB can operate entirely in memory, or it can persist data to disk using its native .duckdb database file format.

Using DuckDB in Memory from SAS

From a SAS perspective, using DuckDB in memory is extremely simple. You start by assigning a library using the DuckDB engine:

libname myduck duckdb ;

This creates an in-memory DuckDB database associated with your SAS Compute session.

You can then load data into DuckDB:

data myduck.yellow_tripdata ;
   set yt.yellow_tripdata_2024_11 ;
run ;

At this point, the table exists inside DuckDB, not as a SAS data set. However, from a SAS user perspective, it behaves just like any other SAS library table.

With implicit SQL pass-through, SAS automatically pushes eligible operations down to DuckDB. This minimizes data movement and allows DuckDB to perform the heavy processing using its optimized columnar engine.

For example, to identify the most popular routes:

proc sql ;
   create table myduck.trip_routes as
   select PULocationID,
          DOLocationID,
          count(*) as trip_count
   from myduck.yellow_tripdata
   group by PULocationID, DOLocationID ;
quit ;

For maximum efficiency, you want the entire query to execute inside DuckDB.

You can confirm this using SAS SQL tracing:

options sastrace=",,,d" sastraceloc=saslog nostsuffix ;

Review the SAS log after running your query.

If you see the CREATE TABLE statement along with the SELECT portion of the query executed in DuckDB syntax, it means the entire query was executed inside DuckDB, with no data transferred between DuckDB and SAS.

These tracing options are extremely useful for understanding what SQL SAS generates and pushes to DuckDB—especially when working with external file formats, which we will cover later in this series.

Working with DuckDB Tables in SAS Viya

DuckDB tables appear just like any other tables in SAS Studio:

They are visible in the Libraries pane

They can be explored interactively
They can be used in SAS Studio Flows and Steps

They can be queried using DATA step, PROC SQL, or other SAS procedures

This seamless integration makes DuckDB feel like a native part of the SAS environment.

However, when using the in-memory mode, the database exists only for the duration of the SAS Compute session (just like a table in the WORK library). If you clear all DuckDB in-memory libraries, or if your SAS session ends, all in-memory DuckDB tables are lost.

Persisting DuckDB Data on Disk

To persist data across sessions, you can store DuckDB data in a database file:

libname myduck duckdb database="/data/duckdb/myduck.duckdb" ;

Instead of storing tables in memory, DuckDB stores them in the specified .duckdb file.

This is useful when:

Working with datasets larger than available memory
Reusing prepared data across sessions
Sharing prepared data with colleagues
Building reusable analytics pipelines

If multiple users need access to the same database file, you may want to open it in read-only (DuckDB option) mode to prevent conflicts or locking issues:

libname myduck duckdb database="/data/duckdb/myduck.duckdb" read_only=yes ;

Summary

DuckDB is a powerful, lightweight analytical SQL engine with a high-performance columnar storage format that integrates seamlessly into SAS Viya.

It can operate fully in memory for maximum speed or persist data on disk for reuse and sharing. Through implicit pass-through, SAS allows DuckDB to execute queries directly, minimizing data movement and maximizing performance.

Beyond its native format, DuckDB also enables direct querying of modern open file and table formats such as Parquet, Delta Lake, and Iceberg—capabilities we will explore in the next post.

Learn More

These capabilities are covered in depth in the course Working with DuckDB in SAS Viya which explores DuckDB usage, performance, open file formats, and advanced integration scenarios.

Thanks for reading.

Find more articles from SAS Global Enablement and Learning here.

Tom · ‎02-25-2026

Is there actually an underscore in the READ_ONLY option you mentioned?

Why?

The LIBNAME statement has had a READONLY option (no underscore) for a gazillion years.

NicolasRobert · ‎02-26-2026

You’re referring to the ACCESS=READONLY option, not a READONLY option on the LIBNAME statement itself.

The READ_ONLY= option in the DuckDB LIBNAME engine is different. It maps directly to DuckDB’s native read-only database mode and ensures the .duckdb file is opened internally by DuckDB in read-only mode, avoiding write locks and enforcing read-only behavior at the database engine level.

This is the appropriate option when working with DuckDB databases from SAS.

Tom · ‎02-26-2026

But does it also support the normal ACCESS=READONLY option (obviously under the hood translating it to the syntax required by DuckDB)? Or are we forced to remember to use a different option for librefs defined using the DUCKDB engine than we are used to using with the normal SAS engine?

vfarmak · ‎02-26-2026

One question though, does duck db operate in a clustered environment?

NicolasRobert · ‎02-27-2026

@Tom

No, ACCESS=READONLY won't work as expected in that situation. We recommend using READ_ONLY=YES.

NicolasRobert · ‎02-27-2026

@vfarmak

Before I can answer accurately to this question, would you mind elaborating a little bit?

DuckDB is designed as an embedded, single-node OLAP database. It runs "in-process" in a SAS Compute Server session.

You can also check this article: https://communities.sas.com/t5/SAS-Communities-Library/SAS-DuckDB-Series-How-DuckDB-Runs-Inside-SAS-...

vfarmak · ‎03-03-2026

Thank you @NicolasRobert I had the same impression about duck db when I read the documentation from their site.

I was wondering whether we could add data from SAS to duck db to a report (power bi for instance that would be available for mutliple users).

Power BI has an experimental driver that supports only single instance connection. Don't know how the report engine serves the data to the report and multiple users can access the report.

To be honest I don't how the knowledge how this is supported but thank you for the article!

ahmedalattar · ‎03-04-2026

@vfarmak

Why this convoluted way (SAS Viya -> DuckDB -> BI Report)?
Why Not directly (SAS Viya -> BI Report)?

If your data already in SAS Viya (CAS) distributed across multiple nodes, one would think, it would be as efficient to query SAS directly via ODBC/JDBC/OLE-DB and feed your BI Report without going through DuckDB!

As @NicolasRobert had mentioned, DuckDB is intentionally built to be a in-process/embedded single instance database, just like SqLite. If you are looking for distributed/clustered DuckDB infrastructure, then you need to look into MotherDuck.

MotherDuck is a Cloud Warehouse platform built on top of DuckDB.

Hope this helps

vfarmak · ‎03-23-2026

@ahmedalattar

You are right. In my case I would like to ingest data from SAS 9.4M8 (we don't have Viya) to Power BI Report, since it is fast for OLAP (SAS had web reporting studio but in 9.4 marked that as obsolete).

The question is whether when the data are on the duck db side, still the users will be able to run the report.

ahmedalattar · ‎03-23-2026

Hi @vfarmak

Which deployment of SAS 9.4 M8 you currently have?

SAS Foundation (BASE SAS+)
- You can use SAS ODBC/JDBC Driver(s) to provide the contents of *.sas7bdat (SAS Data Sets) to third party software
SAS BI/EBI --- While the BI Portal & BI Dashboard have been discontinued,
- Web Report Studio, and Stored Process are still viable reporting options
- SAS Providers for OLE DB (About the SAS Providers for OLE DB - 9.3) is another option depending on your licensed products
- SAS R&D have created an alternative/replacement Web Application to the deprecated SAS Portal. Get in touch with your Customer Success rep or SAS Tech Support for more information and support.

Hope this helps