
SAS + DuckDB Series: What is DuckDB?


If you work with data, you’ve probably heard DuckDB mentioned here and there. It’s been getting a lot of attention lately, especially in analytics and data science circles. So before we talk about how SAS works with DuckDB, let’s just answer the basic question:

 

What is DuckDB, and why should we care?

 

 

The "SQLite of Analytics"

 

DuckDB is often described as the SQLite of analytics, and that comparison makes a lot of sense.

 

Just as SQLite brought a lightweight, embedded SQL engine to application developers, DuckDB brings that same simplicity to analytical workloads.

 

It’s deliberately small, easy to install, and incredibly straightforward to use. There’s no server to configure, no cluster to deploy, and no background services to manage. DuckDB runs directly inside whatever environment you’re already working in—your Python session, your R script, your application, or even SAS.

 

 

Built for Analytical Work

 

DuckDB is designed specifically for analytical workloads, and a big part of that comes from its columnar storage format.

 

Instead of storing data row by row, DuckDB stores values column by column. For analytics, that’s a huge advantage.

 

Why? Because most analytical queries focus on only a handful of columns. When you're filtering, scanning, aggregating, or joining, you rarely need the entire row. With a columnar layout, DuckDB can read only the columns required for the query and completely skip the rest.

 

This reduces disk I/O, improves CPU efficiency, and often leads to dramatic speed-ups — even on a typical laptop with no special hardware.

 

Columnar storage also makes compression more effective and improves vectorized execution, both of which contribute to fast, predictable performance on analytical tasks.

 

In short: DuckDB is optimized for the kind of work analysts and data scientists actually do.

 

 

Fast Execution

 

DuckDB’s speed doesn’t come from specialized hardware or distributed clusters—it comes from how it processes data.

 

Instead of handling one row at a time, DuckDB uses vectorized execution, which means it processes data in batches, often a few thousand rows at once.

 

This batch-based approach is a perfect match for modern CPUs, which are optimized for performing the same operation on multiple values in parallel. By feeding the CPU a steady stream of well-structured chunks, DuckDB keeps the processor busy and avoids the overhead of row-by-row interpretation.

 

The result is fast, efficient query execution—often surprisingly fast, even on a basic laptop with no tuning, no configuration, and no special hardware.

 

 

Flexible With Data

 

DuckDB is designed to be adaptable, and its default mode is completely in-memory. You can create tables, run transformations, and perform analytics without writing anything to disk. This is perfect for quick exploration, temporary workflows, or prototyping where speed matters more than persistence.

 

When you do want to persist data, DuckDB lets you store it in a compact, portable .duckdb database file. It reopens quickly, supports multiple tables, and keeps everything in one place for repeatable analysis.

 

And if you’d rather not store anything at all, DuckDB can query external files directly—Parquet, CSV, Delta, Iceberg, and more—without any loading or import steps. Just point DuckDB at the files and start running SQL, whether they’re on a local drive, in a data lake, or in cloud object storage.

 

 

Runs Where You Work

 

One of DuckDB’s biggest strengths is that it doesn’t require its own server or external service. It’s an embedded database, which means it runs directly inside whatever environment you’re already using.

 

You can load and use DuckDB from:

 

  • Python — perfect for notebooks and data science workflows
  • R — ideal for tidyverse analysis or Shiny applications
  • C++ — useful for integrating analytics directly into applications
  • The command line — great for quick SQL queries on local files
  • And yes… SAS — more on this in the next part of the series

 

Because it’s just a library, it doesn’t need to be deployed, configured, or monitored. There’s no background daemon, no ports to manage, and no cluster to maintain.

 

 

Open and Cross-Platform

 

DuckDB is completely open source, which means anyone can read the code, contribute improvements, or build extensions. In addition to the official features, there’s a growing collection of community-built extensions you can install and use freely.

 

It runs on Windows, macOS, and Linux, so it works the same no matter where your analytics happen. Nothing locked down, nothing proprietary—just a flexible, open database you can use anywhere.

 

 

What’s Next

 

This post was just about getting familiar with DuckDB itself. In the next part of the series, we’ll look at how SAS connects to DuckDB and what that enables.

 

 

Thanks for reading.

 

 

Find more articles from SAS Global Enablement and Learning here.
