Now that we have set the landscape with the first two posts describing what DuckDB is and how DuckDB runs inside SAS Viya, it’s time to go deeper and look at how to actually use it from SAS code.
As mentioned previously, the DuckDB engine can be used in multiple ways:
Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.
Although one of DuckDB’s main strengths is its ability to transparently query open file formats and table formats (Parquet, Delta Lake, Iceberg, etc.) without formally importing them—and we’ll cover that in a future post—it’s important to first understand that DuckDB is a SQL processing engine with its own native storage format and modern data types.
Understanding DuckDB’s native format is important because, from a SAS perspective, DuckDB native tables and external files accessed through DuckDB both appear as regular SAS tables. Knowing the difference helps avoid confusion and ensures you understand whether data is stored inside DuckDB itself or being accessed directly from external open file formats.
Like Parquet, DuckDB uses a columnar format designed for analytics. In most cases, only the columns referenced in a query are read, which significantly improves performance and reduces I/O.
DuckDB can operate entirely in memory, or it can persist data to disk using its native .duckdb database file format.
From a SAS perspective, using DuckDB in memory is extremely simple. You start by assigning a library using the DuckDB engine:
libname myduck duckdb ;
This creates an in-memory DuckDB database associated with your SAS Compute session.
You can then load data into DuckDB:
data myduck.yellow_tripdata ;
set yt.yellow_tripdata_2024_11 ;
run ;
At this point, the table exists inside DuckDB, not as a SAS data set. However, from a SAS user perspective, it behaves just like any other SAS library table.
With implicit SQL pass-through, SAS automatically pushes eligible operations down to DuckDB. This minimizes data movement and allows DuckDB to perform the heavy processing using its optimized columnar engine.
For example, to identify the most popular routes:
proc sql ;
create table myduck.trip_routes as
select PULocationID,
DOLocationID,
count(*) as trip_count
from myduck.yellow_tripdata
group by PULocationID, DOLocationID ;
quit ;
For maximum efficiency, you want the entire query to execute inside DuckDB.
You can confirm this using SAS SQL tracing:
options sastrace=",,,d" sastraceloc=saslog nostsuffix ;
Review the SAS log after running your query.
If you see the CREATE TABLE statement along with the SELECT portion of the query executed in DuckDB syntax, it means the entire query was executed inside DuckDB, with no data transferred between DuckDB and SAS.
These tracing options are extremely useful for understanding what SQL SAS generates and pushes to DuckDB—especially when working with external file formats, which we will cover later in this series.
DuckDB tables appear just like any other tables in SAS Studio:
This seamless integration makes DuckDB feel like a native part of the SAS environment.
However, when using the in-memory mode, the database exists only for the duration of the SAS Compute session (just like a table in the WORK library). If you clear all DuckDB in-memory libraries, or if your SAS session ends, all in-memory DuckDB tables are lost.
To persist data across sessions, you can store DuckDB data in a database file:
libname myduck duckdb database="/data/duckdb/myduck.duckdb" ;
Instead of storing tables in memory, DuckDB stores them in the specified .duckdb file.
This is useful when:
If multiple users need access to the same database file, you may want to open it in read-only (DuckDB option) mode to prevent conflicts or locking issues:
libname myduck duckdb database="/data/duckdb/myduck.duckdb" read_only=yes ;
DuckDB is a powerful, lightweight analytical SQL engine with a high-performance columnar storage format that integrates seamlessly into SAS Viya.
It can operate fully in memory for maximum speed or persist data on disk for reuse and sharing. Through implicit pass-through, SAS allows DuckDB to execute queries directly, minimizing data movement and maximizing performance.
Beyond its native format, DuckDB also enables direct querying of modern open file and table formats such as Parquet, Delta Lake, and Iceberg—capabilities we will explore in the next post.
These capabilities are covered in depth in the course Working with DuckDB in SAS Viya which explores DuckDB usage, performance, open file formats, and advanced integration scenarios.
Thanks for reading.
Find more articles from SAS Global Enablement and Learning here.
Is there actually an underscore in the READ_ONLY option you mentioned?
Why?
The LIBNAME statement has had a READONLY option (no underscore) for a gazillion years.
You’re referring to the ACCESS=READONLY option, not a READONLY option on the LIBNAME statement itself.
The READ_ONLY= option in the DuckDB LIBNAME engine is different. It maps directly to DuckDB’s native read-only database mode and ensures the .duckdb file is opened internally by DuckDB in read-only mode, avoiding write locks and enforcing read-only behavior at the database engine level.
This is the appropriate option when working with DuckDB databases from SAS.
But does it also support the normal ACCESS=READONLY option (obviously under the hood translating it to the syntax required by DuckDB)? Or are we forced to remember to use a different option for librefs defined using the DUCKDB engine than we are used to using with the normal SAS engine?
One question though, does duck db operate in a clustered environment?
No, ACCESS=READONLY won't work as expected in that situation. We recommend using READ_ONLY=YES.
Before I can answer accurately to this question, would you mind elaborating a little bit?
DuckDB is designed as an embedded, single-node OLAP database. It runs "in-process" in a SAS Compute Server session.
You can also check this article: https://communities.sas.com/t5/SAS-Communities-Library/SAS-DuckDB-Series-How-DuckDB-Runs-Inside-SAS-...
Thank you @NicolasRobert I had the same impression about duck db when I read the documentation from their site.
I was wondering whether we could add data from SAS to duck db to a report (power bi for instance that would be available for mutliple users).
Power BI has an experimental driver that supports only single instance connection. Don't know how the report engine serves the data to the report and multiple users can access the report.
To be honest I don't how the knowledge how this is supported but thank you for the article!
Why this convoluted way (SAS Viya -> DuckDB -> BI Report)?
Why Not directly (SAS Viya -> BI Report)?
If your data already in SAS Viya (CAS) distributed across multiple nodes, one would think, it would be as efficient to query SAS directly via ODBC/JDBC/OLE-DB and feed your BI Report without going through DuckDB!
As @NicolasRobert had mentioned, DuckDB is intentionally built to be a in-process/embedded single instance database, just like SqLite. If you are looking for distributed/clustered DuckDB infrastructure, then you need to look into MotherDuck.
MotherDuck is a Cloud Warehouse platform built on top of DuckDB.
Hope this helps
You are right. In my case I would like to ingest data from SAS 9.4M8 (we don't have Viya) to Power BI Report, since it is fast for OLAP (SAS had web reporting studio but in 9.4 marked that as obsolete).
The question is whether when the data are on the duck db side, still the users will be able to run the report.
Hi @vfarmak
Which deployment of SAS 9.4 M8 you currently have?
Hope this helps
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.