
Conversation

@AntoinePrv (Contributor) commented Oct 21, 2025

Rationale for this change

  • Simplify the use of unpack
  • Reduce code spread for unpacking integers

What changes are included in this PR?

  • epilog: unpack now extracts exactly the required number of values -> change the return type to void.
  • prolog: unpack can now handle non-byte-aligned data -> include bit_offset in the input parameters (sketched below).
  • Include prolog/epilog cases in the unpack tests.
  • Simplify a roundtrip test from packed -> unpacked -> packed to unpacked -> packed -> unpacked.

Decoder benchmarks should remain the same (tested on Linux x86-64).
I have not benchmarked the unpack functions themselves, but I don't believe that is relevant since they now do more work.
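
For context, the prolog/epilog boil down to a scalar path that can start at an arbitrary bit offset. A minimal sketch of that idea (not the PR's actual kernel; the name is illustrative, and it assumes a little-endian machine, 1 <= bit_width <= 32, bit_offset < 8, and 8 readable bytes past the last packed value):

#include <cstdint>
#include <cstring>

void UnpackExactSketch(const uint8_t* in, uint32_t* out, int count,
                       int bit_width, int bit_offset) {
  const uint64_t mask = (uint64_t{1} << bit_width) - 1;
  for (int i = 0; i < count; ++i) {
    const int bit_start = bit_offset + i * bit_width;
    uint64_t word;
    // Load the 8 bytes containing the value, then shift and mask it out.
    std::memcpy(&word, in + bit_start / 8, sizeof(word));
    out[i] = static_cast<uint32_t>((word >> (bit_start % 8)) & mask);
  }
}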

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions (bot)

⚠️ GitHub issue #47895 has been automatically assigned in GitHub to PR creator.

@AntoinePrv AntoinePrv changed the title GH-47895: [C++][Parquet] Add prolog and eiplog in unpack GH-47895: [C++][Parquet] Add prolog and epilog in unpack Oct 21, 2025
@AntoinePrv (Contributor, Author)

@pitrou this is ready for review (waiting for CI to finish here).

With this we could also investigate removing the BitReader from the BitPackedRunDecoder, reducing the general complexity seen by the compiler (number of member variables, pointer and offset bookkeeping, ...).

@pitrou (Member) commented Oct 21, 2025

There's a sanitizer failure that needs fixing here:
https://github.com/apache/arrow/actions/runs/18688262707/job/53286800716?pr=47896#step:7:8798

(I suppose it happens when length == 0...)
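
If so, a minimal guard along these lines (a sketch; the parameter name is assumed from context) would avoid any pointer arithmetic on an empty input:

if (length == 0) {
  return;  // nothing to unpack, do not touch in/out at all
}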

@pitrou (Member) commented Oct 21, 2025

With this we could also investigate removing the BitReader from the BitPackedRunDecoder, reducing the general complexity seen by the compiler (number of member variables, pointer and offset bookkeeping, ...).

Perhaps that can even be done in this PR? It doesn't sound very complicated...

@pitrou (Member) commented Oct 21, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit 600696c. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 2 benchmarking runs that have been run so far on PR commit 600696c.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

@pitrou (Member) commented Oct 22, 2025

@ursabot please benchmark lang=C++

@voltrondatabot

Benchmark runs are scheduled for commit 287f136. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@conbench-apache-arrow

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 287f136.

There were 7 benchmark results indicating a performance regression.

The full Conbench report has more details.

@pitrou (Member) left a comment

Some comments on the implementation. I haven't looked at the bpacking tests.

const int spread = byte_end - byte_start + 1;
max = spread > max ? spread : max;
start += width;
} while (start % 8 != bit_offset);
@pitrou (Member):

Note that this will be an infinite loop if bit_offset >= 8 (hence the DCHECK suggestion below)
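
For instance, at the top of the function (using the same ARROW_DCHECK macro family that appears elsewhere in this PR):

ARROW_DCHECK_LT(bit_offset, 8);  // an offset of 8 or more never re-aligns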

@AntoinePrv (Contributor, Author):

Indeed, though that function is never used at runtime, only at compile time.

@github-actions bot added the "Awaiting committer review" label and removed the "Awaiting review" label Oct 22, 2025
AntoinePrv and others added 2 commits October 23, 2025 09:38
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@AntoinePrv (Contributor, Author)

@pitrou removing the MaxSpread constexpr logic did not perform well: up to -20% on decoding benchmarks.

For reference, that MaxSpread metric is central to the shuffle SIMD algorithm I'm working on (see the sketch after this list):

  • If small: we can "pack" multiple values in the shuffle and reuse it with multiple rshifts
  • If very large: we have to do something radically different for packed values that spread over >8 bytes
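
A self-contained reconstruction of that metric, following the loop quoted in the review above (the exact signature in the PR may differ):

// Maximum number of bytes a single `width`-bit packed value can straddle
// over one alignment period. Requires bit_offset < 8 (see the infinite-loop
// note above) and width > 0.
constexpr int MaxSpread(int width, int bit_offset = 0) {
  int max = 0;
  int start = bit_offset;
  do {
    const int byte_start = start / 8;
    const int byte_end = (start + width - 1) / 8;
    const int spread = byte_end - byte_start + 1;
    max = spread > max ? spread : max;
    start += width;
  } while (start % 8 != bit_offset);
  return max;
}

static_assert(MaxSpread(3) == 2);   // 3-bit values touch at most 2 bytes
static_assert(MaxSpread(10) == 2);  // start offsets stay even, so 2 bytes max
static_assert(MaxSpread(57) == 8);  // wide values can straddle a full 8 bytes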

@pitrou (Member) commented Oct 23, 2025

Ah, sorry! Let's just restore it then :)

@AntoinePrv (Contributor, Author)

I never pushed it; this was from a local benchmark.

// Easy case to handle, simply setting memory to zero.
return unpack_null(in, out, batch_size);
} else {
// In case of misalignment, we need to run the prolog until aligned.
@pitrou (Member):

As a TODO, if batch_size is large enough, we can perhaps rewind to the last byte-aligned packed value and SIMD-unpack kValuesUnpacked into a local buffer, instead of going through unpack_exact.

(this seems lower-priority than SIMD shuffling, though)
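
A rough sketch of that TODO (names like UnpackAlignedFull and kValuesUnpacked are stand-ins for the real SIMD kernel and its block size, and the run is assumed to have started byte-aligned so the rewind below terminates):

#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kValuesUnpacked = 32;  // assumed SIMD block size
// Assumed byte-aligned SIMD kernel producing kValuesUnpacked values at once.
void UnpackAlignedFull(const uint8_t* in, uint32_t* out, int bit_width);

void UnpackPrologViaSimd(const uint8_t* in, uint32_t* out, int num_values,
                         int bit_width, int bit_offset) {
  // Smallest rewind (in values) landing on a byte-aligned value start, i.e.
  // the smallest r with (r * bit_width) % 8 == bit_offset.
  int rewind = 0;
  while ((rewind * bit_width) % 8 != bit_offset) ++rewind;
  assert(rewind + num_values <= kValuesUnpacked);

  // (rewind * bit_width - bit_offset) is a whole number of bytes by construction.
  const uint8_t* aligned_in = in - (rewind * bit_width - bit_offset) / 8;
  uint32_t scratch[kValuesUnpacked];
  UnpackAlignedFull(aligned_in, scratch, bit_width);
  std::memcpy(out, scratch + rewind, num_values * sizeof(uint32_t));
}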

ARROW_DCHECK_GE(batch_size, 0);
ARROW_COMPILER_ASSUME(batch_size < kValuesUnpacked);
ARROW_COMPILER_ASSUME(batch_size >= 0);
unpack_exact<kPackedBitWidth, false>(in, out, batch_size, /* bit_offset= */ 0);
@pitrou (Member):

Similarly, if there's enough padding at the end of the input, we could SIMD-unpack a full kValuesUnpacked into a local buffer.

switch (num_bits) {
  case 0:
-   return unpack_null(in, out, batch_size);
+   return unpack_width<0, Unpacker>(in, out, batch_size, bit_offset);
@pitrou (Member):

Ok, macros are not pretty, but we could have a macro here to minimize diffs when changing these function signatures :-)

Such as:

#define CASE_UNPACK_WIDTH(_width) \
        return unpack_width<_width, Unpacker>(in, out, batch_size, bit_offset)

  if constexpr (std::is_same_v<UnpackedUint, bool>) {
    switch (num_bits) {
      case 0:
        CASE_UNPACK_WIDTH(0);
  // etc.

#undef CASE_UNPACK_WIDTH

As you prefer, though.

if constexpr (std::is_same_v<Uint, bool>) {
random_is_valid(num_values, 0.5, &out, kSeed);
} else {
const uint64_t max = (uint64_t{1} << (static_cast<uint64_t>(bit_width) - 1)) - 1;
@pitrou (Member):

Shouldn't this be

Suggested change:
- const uint64_t max = (uint64_t{1} << (static_cast<uint64_t>(bit_width) - 1)) - 1;
+ const uint64_t max = (uint64_t{1} << static_cast<uint64_t>(bit_width)) - 1;

(e.g. you want 2**10 - 1 for 10-bit packing, not 2**9 - 1)

Comment on lines +101 to 103
if (!written) {
throw std::runtime_error("Cannot write more values");
}
@pitrou (Member):

Let's avoid exceptions; you could, for example, have this function return a Result<std::vector<uint8_t>>.
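
For instance, a sketch of that shape, assuming the helper wraps bit_util::BitWriter (header paths and the helper's real signature may differ):

#include <cstdint>
#include <vector>

#include "arrow/result.h"
#include "arrow/util/bit_stream_utils.h"
#include "arrow/util/bit_util.h"

arrow::Result<std::vector<uint8_t>> PackValues(const std::vector<uint64_t>& values,
                                               int bit_width) {
  std::vector<uint8_t> packed(
      arrow::bit_util::BytesForBits(static_cast<int64_t>(values.size()) * bit_width));
  arrow::bit_util::BitWriter writer(packed.data(), static_cast<int>(packed.size()));
  for (const uint64_t v : values) {
    if (!writer.PutValue(v, bit_width)) {
      return arrow::Status::Invalid("Cannot write more values");
    }
  }
  writer.Flush();
  return packed;
}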
