Avoid unnecessary buffer zero-fill in Snappy decompression#9583

Open
Dandandan wants to merge 1 commit into apache:main from Dandandan:pr/snappy-zero-fill

@Dandandan (Contributor) commented Mar 19, 2026

Which issue does this PR close?

Closes #9579

Rationale

Currently, Snappy decompression uses resize(len, 0) which zero-fills the buffer before writing. Since Snappy will overwrite the entire region on success, this memset is wasted work.
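For illustration, the pattern described above can be sketched as follows. This is a minimal sketch, not the parquet crate's code: `copy_into` is a hypothetical stand-in for the Snappy decoder, and `append_with_zero_fill` is a hypothetical name for the surrounding function.

```rust
/// Hypothetical stand-in for a decompressor: copies `src` into `dst`
/// and returns the number of bytes written.
fn copy_into(src: &[u8], dst: &mut [u8]) -> usize {
    let n = src.len().min(dst.len());
    dst[..n].copy_from_slice(&src[..n]);
    n
}

/// Appends `src` to `out` using the resize-then-overwrite pattern.
fn append_with_zero_fill(src: &[u8], out: &mut Vec<u8>) -> usize {
    let old_len = out.len();
    // Writes `src.len()` zero bytes into the new region (the memset)...
    out.resize(old_len + src.len(), 0);
    // ...and every one of those bytes is overwritten right here.
    let n = copy_into(src, &mut out[old_len..]);
    // Keep only the bytes that were actually produced.
    out.truncate(old_len + n);
    n
}
```

Calling `append_with_zero_fill(&[1, 2, 3], &mut out)` on `out = vec![7]` returns 3 and leaves `out` as `[7, 1, 2, 3]`; the zeros written by `resize` are never observed.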

This gives a 1-2% win in end-to-end decoding of Snappy-encoded Parquet data.

What changes are included in this PR?

Write directly into spare capacity using reserve() + spare_capacity_mut() + set_len(), eliminating the unnecessary zero-fill.
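A minimal sketch of the `reserve()` + `spare_capacity_mut()` + `set_len()` sequence, under stated assumptions: `copy_into_uninit` is a hypothetical stand-in for the decoder, and unlike `snap`'s `decompress` (which takes `&mut [u8]`) it accepts `&mut [MaybeUninit<u8>]`, so the sketch stays sound without assuming anything about how the real decoder treats its output buffer.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a decompressor: writes `src` into `dst`
/// and returns the number of bytes written. It never reads from `dst`.
fn copy_into_uninit(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> usize {
    let n = src.len().min(dst.len());
    for (slot, byte) in dst[..n].iter_mut().zip(src) {
        slot.write(*byte);
    }
    n
}

/// Appends `src` to `out` without zero-filling the new region first.
fn append_without_zero_fill(src: &[u8], out: &mut Vec<u8>) -> usize {
    let old_len = out.len();
    // Ensure capacity for the new bytes; the length is unchanged.
    out.reserve(src.len());
    // View the unused tail of the allocation as uninitialized slots.
    let spare = out.spare_capacity_mut();
    let n = copy_into_uninit(src, &mut spare[..src.len()]);
    // SAFETY: the first `n` slots of the spare capacity were just
    // initialized, and `old_len + n` is within the reserved capacity.
    unsafe { out.set_len(old_len + n) };
    n
}
```

Calling `append_without_zero_fill(&[3, 4, 5], &mut out)` on `out = vec![1, 2]` returns 3 and leaves `out` as `[1, 2, 3, 4, 5]`, with no intermediate zero-fill of the three new bytes.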

Are there any user-facing changes?

No.

🤖 Generated with Claude Code

Write directly into spare capacity instead of resize+zero-fill,
eliminating unnecessary memset for the decompression output buffer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@Dandandan (Contributor, Author)

run benchmark arrow_reader_clickbench

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4093137898-468-pw7st 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/snappy-zero-fill (eaa3ae4) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader_clickbench
BENCH_FILTER=
Results will be posted here when complete

@adriangbot

🤖 Arrow criterion benchmark completed (GKE) | trigger

Details

group                                             main                                   pr_snappy-zero-fill
-----                                             ----                                   -------------------
arrow_reader_clickbench/async/Q1                  1.01   1093.5±5.62µs        ? ?/sec    1.00   1087.7±3.63µs        ? ?/sec
arrow_reader_clickbench/async/Q10                 1.01      6.7±0.05ms        ? ?/sec    1.00      6.7±0.04ms        ? ?/sec
arrow_reader_clickbench/async/Q11                 1.02      7.8±0.07ms        ? ?/sec    1.00      7.7±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q12                 1.00     14.4±0.07ms        ? ?/sec    1.00     14.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q13                 1.01     17.1±0.09ms        ? ?/sec    1.00     16.9±0.10ms        ? ?/sec
arrow_reader_clickbench/async/Q14                 1.00     15.9±0.07ms        ? ?/sec    1.00     15.9±0.09ms        ? ?/sec
arrow_reader_clickbench/async/Q19                 1.01      3.1±0.03ms        ? ?/sec    1.00      3.1±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q20                 1.00     78.7±0.37ms        ? ?/sec    1.13     88.9±9.93ms        ? ?/sec
arrow_reader_clickbench/async/Q21                 1.22     97.0±0.55ms        ? ?/sec    1.00     79.4±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q22                 1.11    131.6±5.00ms        ? ?/sec    1.00    118.2±6.31ms        ? ?/sec
arrow_reader_clickbench/async/Q23                 1.02    245.9±0.84ms        ? ?/sec    1.00    240.6±1.16ms        ? ?/sec
arrow_reader_clickbench/async/Q24                 1.04     20.0±0.14ms        ? ?/sec    1.00     19.2±0.14ms        ? ?/sec
arrow_reader_clickbench/async/Q27                 1.04     58.7±0.55ms        ? ?/sec    1.00     56.3±0.21ms        ? ?/sec
arrow_reader_clickbench/async/Q28                 1.03     57.9±0.36ms        ? ?/sec    1.00     56.3±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q30                 1.01     18.6±0.07ms        ? ?/sec    1.00     18.4±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q36                 1.02     15.3±0.28ms        ? ?/sec    1.00     14.9±0.12ms        ? ?/sec
arrow_reader_clickbench/async/Q37                 1.00      5.4±0.03ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q38                 1.03     13.6±0.26ms        ? ?/sec    1.00     13.1±0.16ms        ? ?/sec
arrow_reader_clickbench/async/Q39                 1.03     24.4±0.31ms        ? ?/sec    1.00     23.8±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q40                 1.01      5.8±0.06ms        ? ?/sec    1.00      5.7±0.03ms        ? ?/sec
arrow_reader_clickbench/async/Q41                 1.01      5.0±0.03ms        ? ?/sec    1.00      4.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async/Q42                 1.00      3.5±0.02ms        ? ?/sec    1.00      3.5±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q1     1.00   1062.7±2.48µs        ? ?/sec    1.00   1067.4±2.72µs        ? ?/sec
arrow_reader_clickbench/async_object_store/Q10    1.02      6.6±0.06ms        ? ?/sec    1.00      6.5±0.05ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q11    1.01      7.6±0.06ms        ? ?/sec    1.00      7.5±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q12    1.01     14.3±0.08ms        ? ?/sec    1.00     14.2±0.08ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q13    1.02     17.1±0.24ms        ? ?/sec    1.00     16.8±0.15ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q14    1.00     15.9±0.11ms        ? ?/sec    1.00     15.8±0.12ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q19    1.01      3.0±0.03ms        ? ?/sec    1.00      2.9±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q20    1.03     72.2±0.64ms        ? ?/sec    1.00     70.0±0.28ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q21    1.03     80.8±0.54ms        ? ?/sec    1.00     78.5±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q22    1.04     99.1±0.77ms        ? ?/sec    1.00     95.4±0.26ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q23    1.00    213.3±0.80ms        ? ?/sec    1.12    238.7±1.23ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q24    1.01     19.4±0.14ms        ? ?/sec    1.00     19.2±0.09ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q27    1.03     57.2±0.63ms        ? ?/sec    1.00     55.4±0.27ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q28    1.03     56.9±0.45ms        ? ?/sec    1.00     55.5±0.24ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q30    1.00     18.3±0.08ms        ? ?/sec    1.00     18.3±0.06ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q36    1.01     14.5±0.23ms        ? ?/sec    1.00     14.4±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q37    1.00      5.3±0.03ms        ? ?/sec    1.00      5.4±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q38    1.01     12.8±0.20ms        ? ?/sec    1.00     12.6±0.20ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q39    1.02     23.3±0.28ms        ? ?/sec    1.00     22.7±0.19ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q40    1.01      5.5±0.04ms        ? ?/sec    1.00      5.4±0.03ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q41    1.01      4.8±0.02ms        ? ?/sec    1.00      4.8±0.02ms        ? ?/sec
arrow_reader_clickbench/async_object_store/Q42    1.01      3.5±0.02ms        ? ?/sec    1.00      3.4±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q1                   1.00    868.7±1.80µs        ? ?/sec    1.01    873.1±1.90µs        ? ?/sec
arrow_reader_clickbench/sync/Q10                  1.00      5.2±0.04ms        ? ?/sec    1.00      5.1±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q11                  1.00      6.1±0.04ms        ? ?/sec    1.00      6.1±0.04ms        ? ?/sec
arrow_reader_clickbench/sync/Q12                  1.02     22.1±0.67ms        ? ?/sec    1.00     21.6±0.15ms        ? ?/sec
arrow_reader_clickbench/sync/Q13                  1.00     28.7±0.88ms        ? ?/sec    1.05     30.2±0.25ms        ? ?/sec
arrow_reader_clickbench/sync/Q14                  1.00     23.1±0.12ms        ? ?/sec    1.19     27.4±0.48ms        ? ?/sec
arrow_reader_clickbench/sync/Q19                  1.04      2.8±0.03ms        ? ?/sec    1.00      2.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20                  1.03    125.7±0.35ms        ? ?/sec    1.00    122.0±0.34ms        ? ?/sec
arrow_reader_clickbench/sync/Q21                  1.03     99.3±0.19ms        ? ?/sec    1.00     96.4±0.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q22                  1.01    145.7±0.50ms        ? ?/sec    1.00    144.6±0.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q23                  1.01   282.2±14.62ms        ? ?/sec    1.00   280.6±16.88ms        ? ?/sec
arrow_reader_clickbench/sync/Q24                  1.02     27.4±0.13ms        ? ?/sec    1.00     26.9±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q27                  1.05    109.9±0.24ms        ? ?/sec    1.00    104.6±0.13ms        ? ?/sec
arrow_reader_clickbench/sync/Q28                  1.04    105.7±0.18ms        ? ?/sec    1.00    101.9±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q30                  1.02     18.9±0.08ms        ? ?/sec    1.00     18.5±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q36                  1.02     22.3±0.13ms        ? ?/sec    1.00     21.9±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q37                  1.00      6.9±0.01ms        ? ?/sec    1.00      6.9±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q38                  1.02     11.5±0.08ms        ? ?/sec    1.00     11.2±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q39                  1.03     21.1±0.12ms        ? ?/sec    1.00     20.5±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q40                  1.00      5.2±0.02ms        ? ?/sec    1.00      5.2±0.03ms        ? ?/sec
arrow_reader_clickbench/sync/Q41                  1.00      5.6±0.02ms        ? ?/sec    1.00      5.7±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q42                  1.01      4.4±0.02ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec

Resource Usage

base (merge-base)

Metric Value
Wall time 784.1s
Peak memory 3.1 GiB
Avg memory 2.9 GiB
CPU user 707.4s
CPU sys 76.4s
Disk read 0 B
Disk write 758.4 MiB

branch

Metric Value
Wall time 781.9s
Peak memory 3.2 GiB
Avg memory 3.1 GiB
CPU user 707.9s
CPU sys 74.1s
Disk read 0 B
Disk write 171.3 MiB

@alamb (Contributor) left a comment

Very exciting @Dandandan

    let n = self
        .decoder
        .decompress(input_buf, &mut spare_bytes[..len])
        .map_err(|e| -> ParquetError { e.into() })?;
Contributor

If this returns on error before setting len, will the buffer be left in an inconsistent state?

I think the use of the mut slice ensures that the call to decompress won't overwrite the newly allocated bytes.

However, this also basically passes in uninitialized bytes to decompress -- how do we know that the decompress doesn't read them? Maybe we should add a SAFETY warning to the decompress API that says it can't rely on initialized bytes 🤔
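On the first question, a small sketch may help. `Vec` tracks its length separately from its capacity, so an early return before `set_len()` leaves the vector at its old length, and anything written into spare capacity is simply not part of the vector. The names below (`try_fill`, `append`, `claimed_len`) are hypothetical, and `try_fill` is a stand-in that takes `MaybeUninit` slots rather than the `&mut [u8]` that the real decoder accepts.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fallible decoder: fails if `dst` is too small,
/// otherwise writes all of `src` and returns the byte count.
fn try_fill(src: &[u8], dst: &mut [MaybeUninit<u8>]) -> Result<usize, String> {
    if dst.len() < src.len() {
        return Err(format!("output too small: {} < {}", dst.len(), src.len()));
    }
    for (slot, byte) in dst.iter_mut().zip(src) {
        slot.write(*byte);
    }
    Ok(src.len())
}

/// Appends to `out`, offering `claimed_len` bytes of spare capacity.
fn append(src: &[u8], claimed_len: usize, out: &mut Vec<u8>) -> Result<usize, String> {
    let old_len = out.len();
    out.reserve(claimed_len);
    let spare = out.spare_capacity_mut();
    // `?` returns early on error, before `set_len` runs, so `out`
    // still has length `old_len` and its visible contents are unchanged.
    let n = try_fill(src, &mut spare[..claimed_len])?;
    // SAFETY: `try_fill` initialized the first `n` spare slots and
    // `n <= claimed_len`, which is within the reserved capacity.
    unsafe { out.set_len(old_len + n) };
    Ok(n)
}
```

With `out = vec![9]`, `append(&[1, 2, 3], 2, &mut out)` returns an error and `out` is still `[9]`; a following `append(&[1, 2, 3], 3, &mut out)` returns `Ok(3)` and `out` becomes `[9, 1, 2, 3]`. The sketch does not address the second question, which depends on the decoder never reading from its output buffer.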

Contributor Author

Effectively we rely on this:

https://docs.rs/snap/latest/snap/raw/struct.Decoder.html#errors

  • output has length less than decompress_len(input).

To not use unsafe we would need to have this feature:
BurntSushi/rust-snappy#62
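To make the documented error condition concrete: a raw Snappy stream begins with the uncompressed length encoded as a little-endian varint, which is how the required output size is known before any data is written. The sketch below is an illustrative reimplementation, not the `snap` crate's code; `read_uncompressed_len` and `check_output_len` are hypothetical names, and the format's 2^32 - 1 upper bound is not enforced here.

```rust
/// Hypothetical helper: decodes the little-endian varint length header
/// at the start of a raw Snappy stream. Returns `None` if the header is
/// truncated or longer than 5 bytes.
fn read_uncompressed_len(input: &[u8]) -> Option<usize> {
    let mut value: u64 = 0;
    for (i, &byte) in input.iter().take(5).enumerate() {
        // Low 7 bits carry data; the high bit means "more bytes follow".
        value |= u64::from(byte & 0x7f) << (7 * i);
        if byte & 0x80 == 0 {
            return usize::try_from(value).ok();
        }
    }
    None
}

/// Hypothetical check mirroring the documented condition: reject an
/// output buffer shorter than the length declared in the header.
fn check_output_len(input: &[u8], output_len: usize) -> Result<usize, String> {
    let needed = read_uncompressed_len(input)
        .ok_or_else(|| "invalid length header".to_string())?;
    if output_len < needed {
        return Err(format!("output has length {output_len}, need {needed}"));
    }
    Ok(needed)
}
```

For the header bytes `[0xAC, 0x02]` (which encode 300), `check_output_len` fails for an output length of 299 and returns `Ok(300)` for an output length of 300.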

@alamb (Contributor) commented Mar 20, 2026

run benchmark arrow_reader

@adriangbot

🤖 Arrow criterion benchmark running (GKE) | trigger
Linux bench-c4098640020-479-gzgft 6.12.55+ #1 SMP Sun Feb 1 08:59:41 UTC 2026 aarch64 GNU/Linux
Comparing pr/snappy-zero-fill (eaa3ae4) to 88422cb (merge-base) diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental,object_store --bench arrow_reader
BENCH_FILTER=
Results will be posted here when complete

Labels

parquet Changes to the parquet crate

Development

Successfully merging this pull request may close these issues.

Avoid unnecessary buffer zero-fill in Snappy decompression

3 participants