Based on the ClickHouse documentation for Variants, there are 4 different ways to query JSON data stored in Variant(JSON) columns.
JSONExtractString(toString(variantElement(data, 'JSON')), 'kind')Performance: ~2.48s for 1M records
JSONExtractString(CAST(variantElement(data, 'JSON') AS String), 'kind')Performance: ~2.45s for 1M records
JSON_VALUE(toString(variantElement(data, 'JSON')), '$.kind')Performance: ~2.55s for 1M records
JSONExtractString(toString(data.JSON), 'kind')Performance: ~2.48s for 1M records
According to the ClickHouse documentation:
"Variant type supports reading a single nested type from a Variant column using the type name as a subcolumn. So, if you have column
variant Variant(T1, T2, T3)you can read a subcolumn of typeT2using syntaxvariant.T2"
- ✅ Cleaner syntax:
data.JSONinstead ofvariantElement(data, 'JSON') - ✅ Same performance: ~2.48s (identical to variantElement method)
- ✅ More readable: Familiar dot notation
- ✅ Official approach: Documented ClickHouse feature
-- Basic field extraction (cleanest syntax)
SELECT JSONExtractString(toString(data.JSON), 'kind')
FROM bluesky_minimal_1m.bluesky_data;
-- Nested field extraction
SELECT JSONExtractString(toString(data.JSON), 'commit', 'collection')
FROM bluesky_minimal_1m.bluesky_data;
-- With JSON_VALUE (JSONPath)
SELECT JSON_VALUE(toString(data.JSON), '$.kind')
FROM bluesky_minimal_1m.bluesky_data;
-- Complex aggregation
SELECT
JSONExtractString(toString(data.JSON), 'kind') as kind,
count()
FROM bluesky_minimal_1m.bluesky_data
GROUP BY kind;| Method | Syntax | Performance | Ranking |
|---|---|---|---|
| Subcolumn + JSONExtractString | JSONExtractString(toString(data.JSON), 'field') |
~2.48s | 🥇 Best |
| variantElement + CAST | JSONExtractString(CAST(variantElement(data, 'JSON') AS String), 'field') |
~2.45s | 🥈 |
| variantElement + toString | JSONExtractString(toString(variantElement(data, 'JSON')), 'field') |
~2.48s | 🥉 |
| variantElement + JSON_VALUE | JSON_VALUE(toString(variantElement(data, 'JSON')), '$.field') |
~2.55s | 4th |
| Subcolumn + JSON_VALUE | JSON_VALUE(toString(data.JSON), '$.field') |
~3.03s | 5th |
JSONExtractString(toString(data.JSON), 'field')Why: Cleanest syntax, excellent performance, official ClickHouse feature
JSON_VALUE(toString(data.JSON), '$.field') Why: JSONPath flexibility with clean subcolumn syntax
- More verbose:
variantElement(data, 'JSON')vsdata.JSON - Same performance but uglier syntax
- Not the intended/documented way
From variantElement to Subcolumn:
-- OLD (verbose)
JSONExtractString(toString(variantElement(data, 'JSON')), 'kind')
-- NEW (clean) ⭐
JSONExtractString(toString(data.JSON), 'kind')
-- OLD (JSONPath)
JSON_VALUE(toString(variantElement(data, 'JSON')), '$.kind')
-- NEW (JSONPath) ⭐
JSON_VALUE(toString(data.JSON), '$.kind')The Subcolumn Access method is the official, documented way to query Variant columns in ClickHouse. It provides:
- Cleaner syntax with familiar dot notation
- Same performance as variantElement methods
- Better readability and maintainability
- Official support from ClickHouse documentation
Bottom line: Use data.JSON instead of variantElement(data, 'JSON') for cleaner, more maintainable code!
Note: All Variant methods are still ~70x slower than native JSON Object type (toString(data.field) at ~0.12s), so consider native JSON for pure JSON use cases.