-
Notifications
You must be signed in to change notification settings - Fork 3
Description
Issue Type
- Feature Request
- Bug Report
- Documentation Issue
Description
When using pg_clickhouse to query ClickHouse tables with Map type columns, the FDW converts the Map type to PostgreSQL text type. This makes it impossible to efficiently query Map fields and filter by specific keys/values.
Current Behavior
Given a ClickHouse table with a Map column:
CREATE TABLE default.traces (
id UUID,
name String,
metadata Map(LowCardinality(String), String)
) ENGINE = MergeTree() ORDER BY id;When importing this table into PostgreSQL via pg_clickhouse:
CREATE FOREIGN TABLE langfuse.traces (
id text,
name text,
metadata text -- Map is converted to text
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');The metadata column becomes text type in PostgreSQL, containing the string representation like {'status':'idle'}.
Limitations
-
Cannot use Map functions: ClickHouse's native Map functions (
mapContains,mapGet,metadata['key']) cannot be used because PostgreSQL treats the column astext -
Cannot use string functions: String functions (
LIKE,position,substring) fail when pushed down to ClickHouse because ClickHouse rejects them for Map types -
Cannot use PostgreSQL JSONB functions: JSONB functions (
::jsonb->>'key') are pushed down to ClickHouse, which doesn't support PostgreSQL's JSONB functions
Example Failures
-- ClickHouse Map functions don't work
SELECT metadata['status'] FROM langfuse.traces;
-- ERROR: cannot subscript type text because it does not support subscripting
-- String functions fail when pushed down
SELECT * FROM langfuse.traces WHERE metadata LIKE '%status%';
-- ERROR: pg_clickhouse: Code: 43. DB::Exception: Illegal type Map(String, String) of argument of function like
-- PostgreSQL JSONB functions fail
SELECT metadata::jsonb->>'status' FROM langfuse.traces;
-- Works only with LIMIT (prevents pushdown), not efficient for large datasetsDesired Behavior
We need a way to query Map columns efficiently from PostgreSQL, with filter pushdown to ClickHouse.
Potential Solutions
Option 1: Map to JSONB Type (Recommended)
Automatically map ClickHouse Map type to PostgreSQL JSONB type:
-- Automatic type mapping
CREATE FOREIGN TABLE langfuse.traces (
id text,
metadata jsonb -- Map mapped to JSONB
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');
-- Now we can query:
SELECT metadata->>'status' FROM langfuse.traces;
-- This should push down to ClickHouse as: mapContains(metadata, 'status')This would involve:
- Detecting Map types during import
- Converting Map to JSON format when querying
- Translating PostgreSQL JSONB operations to ClickHouse Map operations
Option 2: Pass-through Map Type
Support a special PostgreSQL type (e.g., clickhouse_map) that preserves the Map type:
-- Explicit type specification
CREATE FOREIGN TABLE langfuse.traces (
id text,
metadata clickhouse_map -- Preserve Map type
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');
-- Use Map functions directly
SELECT mapContains(metadata, 'status') FROM langfuse.traces;Option 3: Column-Level Type Options
Allow specifying column-level type options:
CREATE FOREIGN TABLE langfuse.traces (
id text,
metadata text OPTIONS (clickhouse_type 'map')
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');Use Case
This feature is essential for applications that:
- Store metadata as flexible key-value pairs in ClickHouse Map columns
- Need to query and filter by specific metadata fields from PostgreSQL
- Require high performance with filter pushdown (not post-processing in PostgreSQL)
- Cannot modify the ClickHouse table structure (e.g., cannot add additional columns)
Example Scenario
We have a LLM tracing system that stores traces in ClickHouse:
-- ClickHouse table
CREATE TABLE traces (
id UUID,
name String,
model String,
metadata Map(String, String) -- Stores: {'status': 'pending', 'version': '1.0', 'user_id': '123'}
) ENGINE = MergeTree() ORDER BY id;Common queries needed from PostgreSQL:
-- Find all traces with status = 'completed'
SELECT * FROM traces WHERE metadata->>'status' = 'completed';
-- Count traces by status
SELECT metadata->>'status', COUNT(*) FROM traces GROUP BY metadata->>'status';
-- Filter by multiple metadata fields
SELECT * FROM traces
WHERE metadata->>'status' = 'completed'
AND metadata->>'version' = '1.0';Currently, these queries either fail or require fetching all data and filtering in PostgreSQL, which is not scalable for large datasets.
Environment
- ClickHouse Version: 24.12.3.47 (official build)
- PostgreSQL Version: PostgreSQL 15.x
- pg_clickhouse Version: Latest (unknown exact version)
- OS: Linux
Additional Context
Why Not Use Alternatives?
- Cannot modify ClickHouse table: Adding a JSON column or creating views is not an option in our deployment
- Need PostgreSQL interface: Our application code uses PostgreSQL, not ClickHouse directly
- Performance requirements: Need filter pushdown for large-scale queries (millions of rows)
Similar Issues
I noticed that PR #74 added support for JSON type mapping to JSONB. This feature request is similar but for Map type, which is the primary JSON-like type used in ClickHouse.
Workaround Limitations
Current workarounds have significant limitations:
- Create View in ClickHouse: Not allowed in our scenario
- Use LIMIT + application filtering: Not efficient for large datasets, no filter pushdown
- Direct ClickHouse connection: Requires significant code changes, loses PostgreSQL benefits
References
- PR Map ClickHouse JSON type to PostgreSQL JSONB #74: Add JSONB support (similar pattern, but for JSON type instead of Map)
- ClickHouse Map documentation: https://clickhouse.com/docs/en/sql-reference/data-types/map/
- pg_clickhouse type mapping documentation
Suggested Implementation
For Option 1 (Map to JSONB), the implementation could:
- Type Detection: During table import, detect Map columns
- Automatic Mapping: Assign JSONB type to Map columns
- Query Translation:
jsonb->>'key'→mapContains(metadata, 'key') AND metadata['key']jsonb ? 'key'→mapContains(metadata, 'key')jsonb @> '{"key":"value"}'→mapContains(metadata, 'key') AND metadata['key'] = 'value'
- Format Conversion: Convert Map format (
{'key':'value'}) to JSON format ({"key":"value"}) on read
This would be similar to how PR #74 handled JSON types but tailored for Map types.
Priority
High - This is a critical feature for applications using ClickHouse Map columns with pg_clickhouse FDW.