This extension enables DuckDB to read and query Apache Paimon format data directly, with no ETL pipelines and no Flink/Spark clusters required. Just open a DuckDB shell and run SQL against your Paimon tables.
Like other DuckDB extensions, duckdb-paimon brings DuckDB's powerful local analytics to the Paimon data lake ecosystem.
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations. It innovatively combines lake format and LSM structure, bringing realtime streaming updates into the lake architecture.
This extension is built on top of paimon-cpp, an open-source C++ library that provides native access to Paimon format data. It is the first library that brings native Paimon read/write capabilities to the C++ ecosystem.
- Zero JVM dependency: No Java runtime required. Pure C++ implementation means minimal memory footprint and instant startup.
- Apache Arrow data exchange: Data flows between paimon-cpp and DuckDB via Apache Arrow, the industry standard for columnar in-memory data, enabling zero-copy transfers with no serialization overhead.
- Parallel scan architecture: Paimon tables are split into independent Splits, and DuckDB's multi-threaded execution engine reads them in parallel to fully utilize multi-core CPUs.
- Secure credential management: OSS credentials are managed through DuckDB's native Secret Manager with scope isolation and automatic key redaction.
- Read Paimon table data (local and remote OSS)
- Projection pushdown optimization
- Multiple file format support (Parquet data files, ORC manifest files)
- Catalog ATTACH support
- DuckDB Secret-based OSS credential management
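A few of the capabilities listed above can be observed from the DuckDB shell. The following is a minimal sketch, assuming the sample table at ./data/testdb.db/testtbl with columns f0 through f3 as shown later in this README. `SET threads`, `EXPLAIN`, and `duckdb_secrets()` are standard DuckDB features; how the Paimon scan appears in the plan, and which secrets are listed, depends on the extension version and your session.

```sql
-- Let DuckDB use up to 4 threads; independent Paimon splits can then be read in parallel
SET threads = 4;

-- Only f0 and f3 are requested, so the scan does not need to read f1 and f2
SELECT f0, f3
FROM paimon_scan('./data/testdb.db/testtbl');

-- Print the query plan to check which columns the scan operator is asked to produce
EXPLAIN
SELECT f0, f3
FROM paimon_scan('./data/testdb.db/testtbl');

-- List the secrets registered in this session (empty until a CREATE SECRET has run)
SELECT name, type, scope
FROM duckdb_secrets();
```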
Data is written into Paimon by Flink in real time. Analysts can query it directly on OSS using DuckDB + duckdb-paimon, with no compute cluster needed, reducing query latency from minutes to seconds.
Use DuckDB in CI/CD pipelines to run data quality assertions on Paimon tables, verifying that Flink job outputs meet expectations. Lightweight, fast, and dependency-free.
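As a rough illustration, a data quality check can be written as a single query that counts violations. The bucket, database, table, and column names below (`orders`, `order_id`, `amount`) are placeholders rather than part of any real schema, and the specific rules are assumptions chosen for the example.

```sql
-- Hypothetical quality checks on an orders table written by a Flink job.
-- Every column except total_rows should be 0 when the output is healthy.
SELECT
    count(*)                                      AS total_rows,
    count(*) FILTER (WHERE order_id IS NULL)      AS null_order_ids,
    count(order_id) - count(DISTINCT order_id)    AS duplicate_order_ids,
    count(*) FILTER (WHERE amount < 0)            AS negative_amounts
FROM paimon_scan('oss://your-bucket/warehouse', 'your_db', 'orders');
```

A CI job could run this query through the DuckDB shell and fail the build when any violation count is non-zero; how the result is captured and compared is left to the pipeline.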
Data engineers developing Flink jobs can instantly inspect the current state of Paimon tables using DuckDB Shell, quickly locating data issues. This is far more efficient than launching a Flink SQL Client.
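A quick inspection session might look like the sketch below. It assumes the sample table path used elsewhere in this README; `DESCRIBE` and `SUMMARIZE` are standard DuckDB statements and are not specific to this extension.

```sql
-- Show column names and types as DuckDB sees them
DESCRIBE SELECT * FROM paimon_scan('./data/testdb.db/testtbl');

-- Per-column statistics (min, max, approximate distinct count, null percentage)
SUMMARIZE SELECT * FROM paimon_scan('./data/testdb.db/testtbl');

-- Peek at a few rows
SELECT * FROM paimon_scan('./data/testdb.db/testtbl') LIMIT 10;
```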
DuckDB natively supports Parquet, CSV, JSON, Iceberg, and more. Combined with duckdb-paimon, you can JOIN Paimon tables with other data sources without any data movement:
-- Join a Paimon orders table with a local CSV dimension table
SELECT o.order_id, o.amount, c.customer_name
FROM paimon_scan('oss://...', 'db', 'orders') o
JOIN read_csv('customers.csv') c ON o.customer_id = c.id;
Clone the repository:
git clone --recurse-submodules https://github.com/polardb/duckdb-paimon.git
cd duckdb-paimon
Note that --recurse-submodules ensures that DuckDB and paimon-cpp, both of which are required to build the extension, are pulled.
GEN=ninja make
To run the extension code, start the shell with ./build/release/duckdb. This shell will have the extension pre-loaded.
Now we can use the features from the extension directly in DuckDB:
SELECT * FROM paimon_scan('./data/testdb.db/testtbl');
┌─────────┬───────┬───────┬────────┐
│   f0    │  f1   │  f2   │   f3   │
│ varchar │ int32 │ int32 │ double │
├─────────┼───────┼───────┼────────┤
│ Alice   │     1 │     0 │   11.0 │
│ Bob     │     1 │     1 │   12.1 │
│ Cathy   │     1 │     2 │   13.2 │
│ David   │     2 │     0 │   21.0 │
│ Eve     │     2 │     1 │   22.1 │
│ Frank   │     2 │     2 │   23.2 │
│ Grace   │     3 │     0 │   31.0 │
│ Henry   │     3 │     1 │   32.1 │
│ Iris    │     3 │     2 │   33.2 │
└─────────┴───────┴───────┴────────┘
-- Configure OSS credentials
CREATE SECRET my_oss (
    TYPE paimon,
    key_id 'your-access-key-id',
    secret 'your-access-key-secret',
    endpoint 'oss-cn-hangzhou.aliyuncs.com'
);
-- Query Paimon tables on OSS
SELECT * FROM paimon_scan('oss://your-bucket/warehouse', 'your_db', 'your_table');
ATTACH 'oss://my-bucket/warehouse' AS paimon_lake (TYPE paimon);
SHOW ALL TABLES;
DESCRIBE paimon_lake.sales_db.orders;
make test
- Apache Paimon: Realtime lakehouse format
- paimon-cpp: Native C++ library for Paimon (underlying dependency)
- DuckDB: Embeddable OLAP database
- duckdb-iceberg: DuckDB's official Iceberg extension
We welcome contributions and discussions! If you have questions, ideas, or want to connect with other users and developers, join our community by clicking here or scanning the QR code below:
