feat: Dictionary page pruning for row filter predicates#9574
Dandandan wants to merge 7 commits into apache:main
Conversation
When evaluating row filter predicates on dictionary-encoded columns, evaluate the predicate against dictionary values before decoding data pages. If no dictionary values match (AllFalse), skip the entire column chunk. If all dictionary values match (AllTrue), skip per-row predicate evaluation entirely.

This optimization is most effective for selective equality filters (e.g. `CounterID = 62`) on dictionary-encoded columns where the value doesn't exist in some row groups' dictionaries.

Benchmark results on ClickBench (async_object_store):
- Q19 (CounterID=62, 3 predicates): -35% (2.57ms → 1.66ms)
- Q42 (CounterID=62, 2 predicates): -8% (3.63ms → 3.35ms)
- No regressions on other queries

Supports BYTE_ARRAY (strings), INT32, and INT64 physical types. Only applies when all data pages are dictionary-encoded (no fallback). Currently implemented for the async push decoder path only.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
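The pruning decision described above can be sketched roughly as follows. This is an illustrative sketch with hypothetical names (`DictionaryPruning`, `classify`), not the actual arrow-rs API: the predicate is evaluated once against the dictionary values, and the result is classified before any data page is decoded.

```rust
// Hypothetical sketch: classify the predicate result over dictionary
// values before touching data pages.
#[derive(Debug, PartialEq)]
enum DictionaryPruning {
    AllFalse, // no dictionary value matches: skip the whole column chunk
    AllTrue,  // every dictionary value matches: skip per-row evaluation
    Mixed,    // some values match: fall back to normal per-row filtering
}

fn classify(dictionary: &[i64], predicate: impl Fn(i64) -> bool) -> DictionaryPruning {
    let matches = dictionary.iter().filter(|&&v| predicate(v)).count();
    if matches == 0 {
        DictionaryPruning::AllFalse
    } else if matches == dictionary.len() {
        DictionaryPruning::AllTrue
    } else {
        DictionaryPruning::Mixed
    }
}

fn main() {
    // e.g. CounterID = 62 against a row group whose dictionary lacks 62
    let dict = [10, 20, 30];
    assert_eq!(classify(&dict, |v| v == 62), DictionaryPruning::AllFalse);
    assert_eq!(classify(&dict, |v| v > 0), DictionaryPruning::AllTrue);
    assert_eq!(classify(&dict, |v| v == 20), DictionaryPruning::Mixed);
    println!("ok");
}
```

The dictionary typically holds far fewer distinct values than the data pages hold rows, which is why one pass over it is cheap relative to decompressing and decoding the pages.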
run benchmark arrow_reader_clickbench

🤖 Arrow criterion benchmark running (GKE)

Benchmark for this request failed. Last 20 lines of output: (click to expand)

run benchmark arrow_reader_clickbench

🤖 Arrow criterion benchmark running (GKE)

YAAAS

🤖 Arrow criterion benchmark completed (GKE)
🎉 We could (at the cost of an extra IO request, but a saving for some) also do this before loading the column chunks, I think. Perhaps it makes sense for other row filters as well to disable making IO requests small/sequential (especially on object storage): don't try to save IO (for small/medium sized columns) but still try to prune to save CPU.
I agree
- Use arrow type from ParquetField tree instead of hardcoded Utf8View
- Support Utf8, LargeUtf8, BinaryView string types
- Support Timestamp types for INT64 dictionary columns
- Skip nested/struct columns (only prune top-level primitives)
- Update snapshot for changed I/O pattern (AllTrue skips filter eval)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
" Row Group 1, column 'b': DictionaryPage (1617 bytes, 1 requests) [data]",
" Row Group 1, column 'b': DataPage(0) (113 bytes, 1 requests) [data]",
" Row Group 1, column 'b': DataPage(1) (126 bytes, 1 requests) [data]",
" Row Group 1, column 'b': MultiPage(dictionary_page: true, data_pages: [0, 1]) (1856 bytes, 1 requests) [data]",
Was a bit surprised to see this changed (it's more optimal).
Seems when all values pass it creates 3 requests?
For anyone following along, I think @Dandandan fixed it here:
Yes :)
I think for object storage it will still coalesce the ranges afterwards, but on the local filesystem it will do a number of syscalls (which, since they run one after another, should be less efficient than a single request anyway).
The test_row_numbers_with_multiple_row_groups_and_filter test used a stateful position-based predicate that broke when evaluate_dictionary called evaluate on dictionary values, advancing the internal offset incorrectly. Replace with a stateless value-based filter (value % 2 != 0). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Is this similar to
No - this PR applies the predicate to the dictionary (which is small and thus fast to evaluate) and avoids decompressing/decoding the data pages. The linked PR #9464, as far as I can see, tries to reuse the booleans from the predicate by gathering them back onto the rows.
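The distinction between the two approaches can be sketched as follows. Both function names here are hypothetical, not code from either PR: this PR short-circuits on the dictionary-level mask alone, while the gathering approach maps the dictionary mask back onto rows through the per-row keys.

```rust
// This PR's idea: if the dictionary-level mask is all-false or all-true,
// the data pages never need to be decompressed and decoded at all.
fn short_circuit(dict_mask: &[bool]) -> Option<bool> {
    if dict_mask.iter().all(|&m| !m) {
        Some(false) // AllFalse: skip the column chunk
    } else if dict_mask.iter().all(|&m| m) {
        Some(true) // AllTrue: skip per-row predicate evaluation
    } else {
        None // mixed: rows must still be decoded and filtered
    }
}

// The gathering idea (as in #9464): map the dictionary mask back onto
// rows via the decoded dictionary keys, reusing the per-value booleans.
fn gather(dict_mask: &[bool], keys: &[usize]) -> Vec<bool> {
    keys.iter().map(|&k| dict_mask[k]).collect()
}

fn main() {
    let dict_mask = vec![false, true, false]; // predicate result per dictionary value
    assert_eq!(short_circuit(&dict_mask), None); // mixed: no short circuit
    assert_eq!(short_circuit(&[false, false]), Some(false));

    let keys = vec![0, 1, 1, 2, 0]; // per-row dictionary indices
    assert_eq!(gather(&dict_mask, &keys), vec![false, true, true, false, false]);
    println!("ok");
}
```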
Add Method 1a to is_all_dictionary_encoded that checks col_meta.page_encoding_stats() (the full Vec<PageEncodingStats>) when the mask form wasn't used, covering the case where ParquetMetaDataOptions::with_encoding_stats_as_mask was set to false. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
// These are used for definition/repetition levels, not data
#[allow(deprecated)]
Encoding::RLE | Encoding::BIT_PACKED => {}
// PLAIN is ambiguous - used for def/rep levels in V1 pages AND
I'm not so sure about this one. So the way the encodings work is:
- levels are RLE or BIT_PACKED (as noted above)
- V1 dictionary uses the PLAIN_DICTIONARY variant for both the dictionary and data
- V2 dictionary uses PLAIN for dictionary and RLE_DICTIONARY for data
- all other encodings are for data
So if we see PLAIN_DICTIONARY and PLAIN, we can know that fallback has definitely occurred. In the case of RLE_DICTIONARY, it's a coin flip, but I'd err on the side of caution and return false in that case. In my experience fallback is quite common.
let mut has_plain_dict_encoding = false;
let mut has_plain_encoding = false;
for enc in col_meta.encodings() {
    match enc {
        Encoding::PLAIN_DICTIONARY => has_plain_dict_encoding = true,
        // for RLE_DICT we can't know if fallback has occurred...be pessimistic
        Encoding::RLE_DICTIONARY => return false,
        Encoding::PLAIN => has_plain_encoding = true,
        // These are used for definition/repetition levels, not data
        #[allow(deprecated)]
        Encoding::RLE | Encoding::BIT_PACKED => {}
        // Any other encoding (DELTA_*, etc.) means non-dictionary data
        _ => return false,
    }
}
has_plain_dict_encoding && !has_plain_encoding
or more simply:
for enc in col_meta.encodings() {
    match enc {
        #[allow(deprecated)]
        // Either V1 dict encoded or level data
        Encoding::PLAIN_DICTIONARY | Encoding::RLE | Encoding::BIT_PACKED => {}
        // Any other encoding means non-dictionary data
        _ => return false,
    }
}
true
etseidl left a comment
Thanks @Dandandan, this looks really cool. I'm happy to see the encodings mask being used :)
I'm not super up on the filtering bits, but they look correct to me. I just have a few questions, and think the all_dict test should be more conservative.
let physical_type = schema_descr.column(col_idx).physical_type();

// Only support BYTE_ARRAY and INT32/INT64 columns
Just curious why not other physical types?
)),
    }
}
_ => Ok(Arc::new(arrow_array::Int64Array::from(values))),
Closes: #9588
Summary
- If no dictionary values match (AllFalse): skip the entire column chunk
- If all dictionary values match (AllTrue): skip per-row predicate evaluation
- Adds an evaluate_dictionary method to the ArrowPredicate trait with a default implementation that delegates to evaluate

Details
Benchmark Results (ClickBench async_object_store)
- CounterID=62 — prunes 1 of 3 row groups

The optimization is most effective for selective equality filters on dictionary-encoded columns where the target value doesn't appear in some row groups' dictionaries.
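The trait extension mentioned in the summary can be sketched like this. The signatures are simplified to plain `Vec` types (the real ArrowPredicate trait works on Arrow record batches and boolean arrays): evaluate_dictionary gets a default implementation that delegates to evaluate, so existing predicate implementations keep working unchanged.

```rust
// Simplified sketch of the trait extension; types are illustrative,
// not the actual arrow-rs signatures.
trait ArrowPredicate {
    fn evaluate(&mut self, values: Vec<i64>) -> Vec<bool>;

    // Default implementation delegates to evaluate, so existing
    // implementations opt in to dictionary pruning for free.
    fn evaluate_dictionary(&mut self, dictionary: Vec<i64>) -> Vec<bool> {
        self.evaluate(dictionary)
    }
}

struct Equals(i64);

impl ArrowPredicate for Equals {
    fn evaluate(&mut self, values: Vec<i64>) -> Vec<bool> {
        values.iter().map(|&v| v == self.0).collect()
    }
}

fn main() {
    let mut pred = Equals(62);
    // A dictionary that lacks 62 yields an all-false mask (AllFalse):
    // the whole column chunk could be skipped.
    let mask = pred.evaluate_dictionary(vec![10, 20, 30]);
    assert!(mask.iter().all(|&m| !m));
    println!("ok");
}
```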
Test plan
🤖 Generated with Claude Code