DuckDB Paimon Extension 🦆

This extension enables DuckDB to read and query Apache Paimon format data directly — no ETL pipelines, no Flink/Spark clusters required. Just open a DuckDB shell and run SQL against your Paimon tables.

Like other DuckDB extensions, duckdb-paimon brings DuckDB's powerful local analytics to the Paimon data lake ecosystem.

About Apache Paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations. It innovatively combines lake format and LSM structure, bringing realtime streaming updates into the lake architecture.

Implementation

This extension is built on top of paimon-cpp, an open-source C++ library that provides native access to Paimon format data. It is the first library that brings native Paimon read/write capabilities to the C++ ecosystem.

Technical Highlights

Zero JVM dependency — No Java runtime required. Pure C++ implementation means minimal memory footprint and instant startup.
Apache Arrow data exchange — Data flows between paimon-cpp and DuckDB via Apache Arrow, the industry standard for columnar in-memory data, enabling zero-copy transfers with no serialization overhead.
Parallel scan architecture — Paimon tables are split into independent Splits, and DuckDB's multi-threaded execution engine reads them in parallel to fully utilize multi-core CPUs.
Secure credential management — OSS credentials are managed through DuckDB's native Secret Manager with scope isolation and automatic key redaction.

Features

Read Paimon table data (local and remote OSS)
Projection pushdown optimization
Multiple file format support (Parquet data files, ORC manifest files)
Catalog ATTACH support
DuckDB Secret-based OSS credential management

Use Cases

Lightweight Ad-hoc Queries on Realtime Lakehouses

Data is written into Paimon by Flink in real time. Analysts can query it directly on OSS using DuckDB + duckdb-paimon — no compute cluster needed, reducing query latency from minutes to seconds.

Data Validation & Quality Checks

Use DuckDB in CI/CD pipelines to run data quality assertions on Paimon tables, verifying that Flink job outputs meet expectations. Lightweight, fast, and dependency-free.
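
As a rough illustration of such a check, the sketch below runs a set of assertion queries, each expected to return a count of zero violating rows. The helper `run_quality_checks`, the check names, and the `orders` columns are hypothetical and not part of the extension. `con` is assumed to be any connection exposing `execute(sql).fetchone()`, such as a DuckDB Python connection with this extension already loaded. To keep the example self-contained, the demo uses an in-memory SQLite table as a stand-in for a Paimon table.

```python
import sqlite3


def run_quality_checks(con, checks):
    """Run each check query and return the names of the checks that failed.

    `checks` maps a check name to a SQL query that returns one row with one
    column: the number of rows violating the expectation.
    """
    failed = []
    for name, sql in checks.items():
        (violations,) = con.execute(sql).fetchone()
        if violations != 0:
            failed.append(name)
    return failed


# Stand-in data so the sketch runs anywhere. In a real pipeline the FROM
# clause would reference a Paimon table through paimon_scan(...) instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
demo.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.5), (2, 7.0), (3, None)],
)

# Hypothetical expectations about the job output.
checks = {
    "no_negative_amount": "SELECT count(*) FROM orders WHERE amount < 0",
    "no_null_amount": "SELECT count(*) FROM orders WHERE amount IS NULL",
    "unique_order_id": (
        "SELECT count(*) FROM ("
        "SELECT order_id FROM orders GROUP BY order_id HAVING count(*) > 1"
        ") AS duplicates"
    ),
}

failed_checks = run_quality_checks(demo, checks)
print(failed_checks)
```

In a CI job, a non-empty result would typically fail the build, for example by exiting with a non-zero status.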

Data Exploration & Debugging

Data engineers developing Flink jobs can inspect the current state of Paimon tables from the DuckDB shell and quickly locate data issues, which is far more efficient than launching a Flink SQL Client.

Cross-format Federated Queries

DuckDB reads Parquet, CSV, and JSON natively and supports Iceberg and other formats through extensions. Combined with duckdb-paimon, you can JOIN Paimon tables with other data sources without any data movement:

-- Join a Paimon orders table with a local CSV dimension table
SELECT o.order_id, o.amount, c.customer_name
FROM paimon_scan('oss://...', 'db', 'orders') o
JOIN read_csv('customers.csv') c ON o.customer_id = c.id;

Getting Started

Clone the repository:

git clone --recurse-submodules https://github.com/polardb/duckdb-paimon.git
cd duckdb-paimon

The --recurse-submodules flag pulls in DuckDB and paimon-cpp, both of which are required to build the extension.

Building

GEN=ninja make

Running the Extension

To run the extension code, simply start the shell with ./build/release/duckdb. This shell will have the extension pre-loaded.

Now we can use the features from the extension directly in DuckDB:

Query Local Paimon Tables

SELECT * FROM paimon_scan('./data/testdb.db/testtbl');
┌─────────┬───────┬───────┬────────┐
│   f0    │  f1   │  f2   │   f3   │
│ varchar │ int32 │ int32 │ double │
├─────────┼───────┼───────┼────────┤
│ Alice   │     1 │     0 │   11.0 │
│ Bob     │     1 │     1 │   12.1 │
│ Cathy   │     1 │     2 │   13.2 │
│ David   │     2 │     0 │   21.0 │
│ Eve     │     2 │     1 │   22.1 │
│ Frank   │     2 │     2 │   23.2 │
│ Grace   │     3 │     0 │   31.0 │
│ Henry   │     3 │     1 │   32.1 │
│ Iris    │     3 │     2 │   33.2 │
└─────────┴───────┴───────┴────────┘

Query Remote OSS Paimon Tables

-- Configure OSS credentials
CREATE SECRET my_oss (
    TYPE paimon,
    key_id 'your-access-key-id',
    secret 'your-access-key-secret',
    endpoint 'oss-cn-hangzhou.aliyuncs.com'
);

-- Query Paimon tables on OSS
SELECT * FROM paimon_scan('oss://your-bucket/warehouse', 'your_db', 'your_table');
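
To keep keys out of scripts, the CREATE SECRET statement above can be assembled from environment variables at run time. The following is a minimal sketch, not something the extension provides: the helper names `sql_quote` and `build_oss_secret_sql` and the variable names `OSS_ACCESS_KEY_ID`, `OSS_ACCESS_KEY_SECRET`, and `OSS_ENDPOINT` are assumptions chosen for illustration.

```python
import os


def sql_quote(value):
    """Return value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_oss_secret_sql(name, env=None):
    """Build a CREATE SECRET statement from environment-style settings.

    `name` is inserted as is, so it must be a trusted identifier.
    Raises KeyError if a required variable is missing.
    """
    if env is None:
        env = os.environ
    return (
        f"CREATE SECRET {name} (\n"
        f"    TYPE paimon,\n"
        f"    key_id {sql_quote(env['OSS_ACCESS_KEY_ID'])},\n"
        f"    secret {sql_quote(env['OSS_ACCESS_KEY_SECRET'])},\n"
        f"    endpoint {sql_quote(env['OSS_ENDPOINT'])}\n"
        f");"
    )


# Demo with a plain dict standing in for os.environ (placeholder values only).
example_env = {
    "OSS_ACCESS_KEY_ID": "your-access-key-id",
    "OSS_ACCESS_KEY_SECRET": "it's-a-placeholder",
    "OSS_ENDPOINT": "oss-cn-hangzhou.aliyuncs.com",
}
statement = build_oss_secret_sql("my_oss", example_env)
print(statement)
```

The resulting string can be passed to whichever DuckDB client runs the session. With real credentials, avoid printing or logging the statement.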

Attach as Catalog

ATTACH 'oss://my-bucket/warehouse' AS paimon_lake (TYPE paimon);

SHOW ALL TABLES;
DESCRIBE paimon_lake.sales_db.orders;

Running the Tests

make test

Related Projects

Apache Paimon — Realtime lakehouse format
paimon-cpp — Native C++ library for Paimon (underlying dependency)
DuckDB — Embeddable OLAP database
duckdb-iceberg — DuckDB's official Iceberg extension

Join the Community

We welcome contributions and discussions! If you have questions, ideas, or want to connect with other users and developers, join our community.
