Skip to content

Support for ClickHouse Map type in pg_clickhouse #128

@WanderWang

Description

@WanderWang

Issue Type

  • Feature Request
  • Bug Report
  • Documentation Issue

Description

When using pg_clickhouse to query ClickHouse tables with Map type columns, the FDW converts the Map type to PostgreSQL text type. This makes it impossible to efficiently query Map fields and filter by specific keys/values.

Current Behavior

Given a ClickHouse table with a Map column:

CREATE TABLE default.traces (
    id UUID,
    name String,
    metadata Map(LowCardinality(String), String)
) ENGINE = MergeTree() ORDER BY id;

When importing this table into PostgreSQL via pg_clickhouse:

CREATE FOREIGN TABLE langfuse.traces (
    id text,
    name text,
    metadata text  -- Map is converted to text
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');

The metadata column becomes text type in PostgreSQL, containing the string representation like {'status':'idle'}.

Limitations

  1. Cannot use Map functions: ClickHouse's native Map functions (mapContains, mapGet, metadata['key']) cannot be used because PostgreSQL treats the column as text

  2. Cannot use string functions: String functions (LIKE, position, substring) fail when pushed down to ClickHouse because ClickHouse rejects them for Map types

  3. Cannot use PostgreSQL JSONB functions: JSONB functions (::jsonb->>'key') are pushed down to ClickHouse, which doesn't support PostgreSQL's JSONB functions

Example Failures

-- ClickHouse Map functions don't work
SELECT metadata['status'] FROM langfuse.traces;
-- ERROR: cannot subscript type text because it does not support subscripting

-- String functions fail when pushed down
SELECT * FROM langfuse.traces WHERE metadata LIKE '%status%';
-- ERROR: pg_clickhouse: Code: 43. DB::Exception: Illegal type Map(String, String) of argument of function like

-- PostgreSQL JSONB functions fail
SELECT metadata::jsonb->>'status' FROM langfuse.traces;
-- Works only with LIMIT (prevents pushdown), not efficient for large datasets

Desired Behavior

We need a way to query Map columns efficiently from PostgreSQL, with filter pushdown to ClickHouse.

Potential Solutions

Option 1: Map to JSONB Type (Recommended)

Automatically map ClickHouse Map type to PostgreSQL JSONB type:

-- Automatic type mapping
CREATE FOREIGN TABLE langfuse.traces (
    id text,
    metadata jsonb  -- Map mapped to JSONB
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');

-- Now we can query:
SELECT metadata->>'status' FROM langfuse.traces;
-- This should push down to ClickHouse as: mapContains(metadata, 'status')

This would involve:

  1. Detecting Map types during import
  2. Converting Map to JSON format when querying
  3. Translating PostgreSQL JSONB operations to ClickHouse Map operations

Option 2: Pass-through Map Type

Support a special PostgreSQL type (e.g., clickhouse_map) that preserves the Map type:

-- Explicit type specification
CREATE FOREIGN TABLE langfuse.traces (
    id text,
    metadata clickhouse_map  -- Preserve Map type
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');

-- Use Map functions directly
SELECT mapContains(metadata, 'status') FROM langfuse.traces;

Option 3: Column-Level Type Options

Allow specifying column-level type options:

CREATE FOREIGN TABLE langfuse.traces (
    id text,
    metadata text OPTIONS (clickhouse_type 'map')
)
SERVER clickhouse_server
OPTIONS (database 'default', table_name 'traces');

Use Case

This feature is essential for applications that:

  1. Store metadata as flexible key-value pairs in ClickHouse Map columns
  2. Need to query and filter by specific metadata fields from PostgreSQL
  3. Require high performance with filter pushdown (not post-processing in PostgreSQL)
  4. Cannot modify the ClickHouse table structure (e.g., cannot add additional columns)

Example Scenario

We have a LLM tracing system that stores traces in ClickHouse:

-- ClickHouse table
CREATE TABLE traces (
    id UUID,
    name String,
    model String,
    metadata Map(String, String)  -- Stores: {'status': 'pending', 'version': '1.0', 'user_id': '123'}
) ENGINE = MergeTree() ORDER BY id;

Common queries needed from PostgreSQL:

-- Find all traces with status = 'completed'
SELECT * FROM traces WHERE metadata->>'status' = 'completed';

-- Count traces by status
SELECT metadata->>'status', COUNT(*) FROM traces GROUP BY metadata->>'status';

-- Filter by multiple metadata fields
SELECT * FROM traces 
WHERE metadata->>'status' = 'completed'
  AND metadata->>'version' = '1.0';

Currently, these queries either fail or require fetching all data and filtering in PostgreSQL, which is not scalable for large datasets.

Environment

  • ClickHouse Version: 24.12.3.47 (official build)
  • PostgreSQL Version: PostgreSQL 15.x
  • pg_clickhouse Version: Latest (unknown exact version)
  • OS: Linux

Additional Context

Why Not Use Alternatives?

  1. Cannot modify ClickHouse table: Adding a JSON column or creating views is not an option in our deployment
  2. Need PostgreSQL interface: Our application code uses PostgreSQL, not ClickHouse directly
  3. Performance requirements: Need filter pushdown for large-scale queries (millions of rows)

Similar Issues

I noticed that PR #74 added support for JSON type mapping to JSONB. This feature request is similar but for Map type, which is the primary JSON-like type used in ClickHouse.

Workaround Limitations

Current workarounds have significant limitations:

  1. Create View in ClickHouse: Not allowed in our scenario
  2. Use LIMIT + application filtering: Not efficient for large datasets, no filter pushdown
  3. Direct ClickHouse connection: Requires significant code changes, loses PostgreSQL benefits

References

Suggested Implementation

For Option 1 (Map to JSONB), the implementation could:

  1. Type Detection: During table import, detect Map columns
  2. Automatic Mapping: Assign JSONB type to Map columns
  3. Query Translation:
    • jsonb->>'key'mapContains(metadata, 'key') AND metadata['key']
    • jsonb ? 'key'mapContains(metadata, 'key')
    • jsonb @> '{"key":"value"}'mapContains(metadata, 'key') AND metadata['key'] = 'value'
  4. Format Conversion: Convert Map format ({'key':'value'}) to JSON format ({"key":"value"}) on read

This would be similar to how PR #74 handled JSON types but tailored for Map types.

Priority

High - This is a critical feature for applications using ClickHouse Map columns with pg_clickhouse FDW.

Metadata

Metadata

Assignees

No one assigned

    Labels

    data typesImprove data type supportenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions