Use branchless index clamping and add get_batch_direct to RleDecoder#9585

Open
Dandandan wants to merge 5 commits into apache:main from Dandandan:pr/branchless-clamping
@Dandandan
Contributor

Which issue does this PR close?

Closes #9581

Rationale

The RLE dictionary decoding path uses if/else branching to select between checked and unchecked indexing. Clamping every index with a branchless .min(max_idx) is simpler and prevents UB from out-of-range indices in corrupt data.
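
A minimal sketch of the clamping idea, not the code in this PR: `gather_clamped` and its signature are hypothetical, and the real decoder works on its own index buffer. It assumes a non-empty dictionary and `u32` indices.

```rust
/// Hypothetical helper illustrating a branchless clamp before an
/// unchecked dictionary lookup. Not the parquet crate's actual API.
fn gather_clamped<T: Copy>(dict: &[T], indices: &[u32], out: &mut [T]) {
    assert!(!dict.is_empty(), "dictionary must be non-empty");
    assert_eq!(indices.len(), out.len());
    let max_idx = dict.len() - 1;
    for (dst, &idx) in out.iter_mut().zip(indices) {
        // Clamp instead of branching: an out-of-range index (corrupt data)
        // is mapped to the last dictionary entry.
        let i = (idx as usize).min(max_idx);
        // SAFETY: i <= max_idx, and max_idx < dict.len() because dict is non-empty.
        *dst = unsafe { *dict.get_unchecked(i) };
    }
}
```

With `dict = [10, 20, 30]` and indices `[0, 2, 7, 1]`, the out-of-range `7` is clamped to index 2, so the output is `[10, 30, 30, 20]`. Note that in this sketch a corrupt index yields a valid but arbitrary value rather than an error.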

What changes are included in this PR?

  • Replace if/else checked/unchecked branching with a single branchless .min(max_idx) clamp in get_batch_with_dict
  • Add RleDecodedBatch enum and get_batch_direct method that exposes RLE vs bit-packed batches via callback
  • Gate RleDecodedBatch and get_batch_direct behind arrow feature flag

Are there any user-facing changes?

No.

🤖 Generated with Claude Code

Dandandan and others added 4 commits March 19, 2026 20:56
When bit_width guarantees all possible indices fit within the dictionary,
use unchecked indexing to allow LLVM to unroll the dict gather loop 4x
with paired loads/stores instead of scalar with per-element bounds checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
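
The condition in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: both function names are hypothetical, and it assumes each index was unpacked from exactly `bit_width` bits, so its largest possible value is `2^bit_width - 1`.

```rust
/// Hypothetical predicate: true when every value representable in
/// `bit_width` bits is a valid index into a dictionary of `dict_len` entries.
fn indices_always_in_bounds(bit_width: u8, dict_len: usize) -> bool {
    // `bit_width` bits encode 2^bit_width distinct values (0 ..= 2^bit_width - 1).
    // checked_shl returns None when the shift is too wide for usize.
    match 1usize.checked_shl(bit_width as u32) {
        Some(distinct_values) => distinct_values <= dict_len,
        None => false,
    }
}

/// Hypothetical fast path without per-element bounds checks.
///
/// # Safety
/// Every element of `indices` must be less than `dict.len()`.
unsafe fn gather_unchecked<T: Copy>(dict: &[T], indices: &[u32], out: &mut Vec<T>) {
    out.extend(indices.iter().map(|&i| {
        // SAFETY: guaranteed by the caller's contract above.
        unsafe { *dict.get_unchecked(i as usize) }
    }));
}
```

A caller would take the unchecked path only when the predicate holds and fall back to checked indexing otherwise. For example, a 2-bit width is safe for a 4-entry dictionary but not for a 3-entry one.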
Add RleDecodedBatch enum and get_batch_direct method that exposes RLE vs
bit-packed batches via callback, allowing callers to handle each case
optimally without going through the index buffer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
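
One way to picture the callback design described in this commit message is the sketch below. All names here (`Batch`, `Run`, `for_each_batch`, `decode_with_dict`) are hypothetical stand-ins; the actual variants of RleDecodedBatch and the signature of get_batch_direct are not shown on this page.

```rust
/// Hypothetical batch type handed to the callback.
enum Batch<'a> {
    /// A run-length run: `value` repeated `len` times.
    Rle { value: u32, len: usize },
    /// Individually unpacked indices from a bit-packed group.
    BitPacked(&'a [u32]),
}

/// Toy stand-in for the decoder's already-parsed input.
enum Run {
    Repeat(u32, usize),
    Literal(Vec<u32>),
}

/// Calls `f` once per run, exposing which kind of run it is.
fn for_each_batch<F: FnMut(Batch<'_>)>(runs: &[Run], mut f: F) {
    for run in runs {
        match run {
            Run::Repeat(value, len) => f(Batch::Rle { value: *value, len: *len }),
            Run::Literal(indices) => f(Batch::BitPacked(indices.as_slice())),
        }
    }
}

/// Caller that handles each case separately: an RLE run needs a single
/// dictionary lookup followed by a fill, with no intermediate index buffer.
fn decode_with_dict<T: Copy>(runs: &[Run], dict: &[T]) -> Vec<T> {
    let mut out = Vec::new();
    for_each_batch(runs, |batch| match batch {
        Batch::Rle { value, len } => {
            // Checked indexing: panics on an out-of-range index in this sketch.
            let v = dict[value as usize];
            out.extend(std::iter::repeat(v).take(len));
        }
        Batch::BitPacked(indices) => {
            out.extend(indices.iter().map(|&i| dict[i as usize]));
        }
    });
    out
}
```

For instance, a run of index 1 repeated three times followed by the literal indices `[0, 2]`, decoded against `["a", "b", "c"]`, gives `["b", "b", "b", "a", "c"]`.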
Replace if/else checked/unchecked branching with a single branchless
.min(max_idx) clamp. This prevents UB on corrupt parquet files while
avoiding per-element bounds checks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These are only used by the arrow dictionary_index decoder. Without
the arrow feature, they appear as dead code to clippy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the parquet (Changes to the parquet crate) label Mar 19, 2026
@Dandandan
Contributor Author

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4093150287-470-h8x4d 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/branchless-clamping (01320dd) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   pr_branchless-clamping
-----                                             ----                                   ----------------------
arrow_reader_clickbench/async/Q1                  1.00   1089.5±6.31µs        ? ?/sec    1.00   1093.2±5.49µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.00      6.7±0.03ms        ? ?/sec    1.00      6.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.00      7.7±0.04ms        ? ?/sec    1.00      7.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.4±0.05ms        ? ?/sec    1.00     14.4±0.05ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.00     17.1±0.06ms        ? ?/sec    1.00     17.1±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.9±0.04ms        ? ?/sec    1.00     16.0±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.1±0.03ms        ? ?/sec    1.00      3.0±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     76.1±4.12ms        ? ?/sec    1.20     91.0±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.00     83.7±2.51ms        ? ?/sec    1.24    103.5±5.76ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.00    119.1±0.59ms        ? ?/sec    1.10    131.5±4.24ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.01    241.4±2.22ms        ? ?/sec    1.00    239.2±0.82ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.00     19.3±0.07ms        ? ?/sec    1.01     19.6±0.08ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.01     57.6±0.26ms        ? ?/sec    1.00     57.2±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.00     56.9±0.12ms        ? ?/sec    1.01     57.6±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.00     18.5±0.07ms        ? ?/sec    1.00     18.6±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.00     14.8±0.12ms        ? ?/sec    1.01     14.9±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.01      5.5±0.03ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.00     13.1±0.10ms        ? ?/sec    1.00     13.1±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.00     23.5±0.14ms        ? ?/sec    1.02     24.0±0.17ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.02      5.7±0.04ms        ? ?/sec    1.00      5.6±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.03      5.0±0.04ms        ? ?/sec    1.00      4.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1066.0±2.87µs        ? ?/sec    1.01   1076.5±7.84µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.01      6.6±0.03ms        ? ?/sec    1.00      6.5±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.00      7.6±0.06ms        ? ?/sec    1.00      7.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.00     14.3±0.04ms        ? ?/sec    1.01     14.4±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.00     16.8±0.07ms        ? ?/sec    1.01     16.9±0.08ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.8±0.07ms        ? ?/sec    1.00     15.9±0.07ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.03      3.0±0.02ms        ? ?/sec    1.00      2.9±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.01     71.4±0.47ms        ? ?/sec    1.00     70.7±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.01     80.0±0.43ms        ? ?/sec    1.00     79.1±0.22ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.01     97.6±0.48ms        ? ?/sec    1.00     96.7±0.49ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    211.3±0.45ms        ? ?/sec    1.09    230.6±5.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.00     19.0±0.06ms        ? ?/sec    1.01     19.3±0.11ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.02     56.8±0.41ms        ? ?/sec    1.00     55.7±0.18ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.00     56.0±0.18ms        ? ?/sec    1.00     56.1±0.25ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.2±0.05ms        ? ?/sec    1.00     18.2±0.10ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.00     14.2±0.10ms        ? ?/sec    1.01     14.3±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.01      5.4±0.02ms        ? ?/sec    1.00      5.3±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.00     12.5±0.15ms        ? ?/sec    1.01     12.5±0.17ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.00     22.5±0.11ms        ? ?/sec    1.01     22.8±0.14ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.03      5.5±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.02      4.9±0.03ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.01ms        ? ?/sec    1.00      3.4±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    871.0±4.20µs        ? ?/sec    1.01    875.6±2.46µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.01      5.1±0.02ms        ? ?/sec    1.00      5.1±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.01      6.1±0.02ms        ? ?/sec    1.00      6.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.01     21.9±0.60ms        ? ?/sec    1.00     21.8±0.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.2±0.76ms        ? ?/sec    1.05     29.6±0.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     23.0±0.07ms        ? ?/sec    1.17     26.9±0.37ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.05      2.7±0.03ms        ? ?/sec    1.00      2.6±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.02    122.2±0.20ms        ? ?/sec    1.00    119.7±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.04     99.6±1.94ms        ? ?/sec    1.00     95.8±0.96ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.02    143.8±2.85ms        ? ?/sec    1.00    141.1±0.33ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.02   276.7±13.65ms        ? ?/sec    1.00   270.1±14.66ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.00     27.1±0.08ms        ? ?/sec    1.00     27.1±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.03    108.8±0.29ms        ? ?/sec    1.00    105.3±0.24ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.00    103.4±0.14ms        ? ?/sec    1.00    103.5±0.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.01     18.8±0.06ms        ? ?/sec    1.00     18.6±0.06ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.00     22.0±0.07ms        ? ?/sec    1.00     22.1±0.05ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.02ms        ? ?/sec    1.00      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.00     11.3±0.05ms        ? ?/sec    1.00     11.3±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.00     20.7±0.08ms        ? ?/sec    1.01     20.8±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.03      5.2±0.03ms        ? ?/sec    1.00      5.1±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.01      5.6±0.02ms        ? ?/sec    1.00      5.6±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.00      4.4±0.02ms        ? ?/sec    1.00      4.4±0.02ms        ? ?/sec
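
For reading the table above: the leading number in each column appears to be that side's mean time divided by the faster of the two means in the row, so 1.00 marks the faster side. The sketch below reproduces that arithmetic; `relative_to_fastest` is a hypothetical helper, and this interpretation of the ratio column is an assumption based on the rows shown.

```rust
/// Hypothetical helper: ratio of each mean to the faster mean in a row.
/// Both inputs must use the same unit (here, milliseconds).
fn relative_to_fastest(base_ms: f64, pr_ms: f64) -> (f64, f64) {
    let fastest = base_ms.min(pr_ms);
    (base_ms / fastest, pr_ms / fastest)
}
```

For the async/Q20 row, `relative_to_fastest(76.1, 91.0)` gives 1.00 for main and about 1.20 for the branch (91.0 / 76.1 ≈ 1.196), matching the reported values.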

Resource Usage

base (merge-base)

Metric        Value
Wall time     779.2s
Peak memory   3.1 GiB
Avg memory    2.9 GiB
CPU user      712.1s
CPU sys       66.9s
Disk read     0 B
Disk write    2.0 GiB

branch

Metric        Value
Wall time     783.7s
Peak memory   3.2 GiB
Avg memory    3.1 GiB
CPU user      710.2s
CPU sys       73.6s
Disk read     0 B
Disk write    171.4 MiB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use branchless index clamping and add get_batch_direct to RleDecoder

2 participants