GH-47895: [C++][Parquet] Add prolog and epilog in unpack #47896
Conversation
@pitrou this is ready for review (waiting for CI to finish here). With this we could also investigate removing the
There's a sanitizer failure that needs fixing here (I suppose it happens when length == 0...)
Perhaps that can even be done in this PR? It doesn't sound very complicated...
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit 600696c. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Thanks for your patience. Conbench analyzed the 2 benchmarking runs that have been run so far on PR commit 600696c. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details.
@ursabot please benchmark lang=C++
Benchmark runs are scheduled for commit 287f136. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 287f136. There were 7 benchmark results indicating a performance regression:
The full Conbench report has more details.
Some comments on the implementation. I haven't looked at the bpacking tests.
```cpp
    const int spread = byte_end - byte_start + 1;
    max = spread > max ? spread : max;
    start += width;
  } while (start % 8 != bit_offset);
```
Note that this will be an infinite loop if bit_offset >= 8 (hence the DCHECK suggestion below)
Indeed, though that function is never used at runtime, only at compile time.
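As an illustration of that guard, here is a minimal self-contained sketch assuming a hypothetical constexpr helper with the quoted loop shape; MaxByteSpread and its parameters are invented names, and assert stands in for the suggested ARROW_DCHECK:

```cpp
#include <cassert>

// Hypothetical helper mirroring the quoted loop: the maximum number of bytes a
// single packed value can span when unpacking width-bit values whose first
// value starts bit_offset bits into a byte.
constexpr int MaxByteSpread(int width, int bit_offset) {
  assert(bit_offset >= 0 && bit_offset < 8);  // with bit_offset >= 8 the loop below never exits
  int max = 0;
  int start = bit_offset;
  do {
    const int byte_start = start / 8;
    const int byte_end = (start + width - 1) / 8;
    const int spread = byte_end - byte_start + 1;
    max = spread > max ? spread : max;
    start += width;
  } while (start % 8 != bit_offset);
  return max;
}

// As noted above, the helper is only evaluated at compile time:
static_assert(MaxByteSpread(8, 0) == 1);
static_assert(MaxByteSpread(3, 0) == 2);
```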
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
@pitrou removing the For reference: that
Ah, sorry! Let's just restore it then :)
I never pushed it, this was on a local benchmark.
```cpp
    // Easy case to handle, simply setting memory to zero.
    return unpack_null(in, out, batch_size);
  } else {
    // In case of misalignment, we need to run the prolog until aligned.
```
As a TODO, if batch_size is large enough, we can perhaps rewind to the last byte-aligned packed and SIMD-unpack kValuesUnpacked into a local buffer, instead of going through unpack_exact.
(this seems lower-priority than SIMD shuffling, though)
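As a rough illustration of the rewind idea, here is the alignment arithmetic only, with an invented helper name; it is a sketch under the assumption that a packed value is byte-aligned every 8 / gcd(bit_width, 8) values, not code from this PR:

```cpp
#include <cstdint>
#include <numeric>  // std::gcd

// Illustrative helper: given a value index into a bit-packed stream, return the
// closest earlier index at which a packed value starts on a byte boundary, i.e.
// where a full SIMD batch could be re-unpacked into a local buffer.
constexpr int64_t PreviousByteAlignedIndex(int64_t value_index, int bit_width) {
  const int period = 8 / std::gcd(bit_width, 8);  // values per byte-aligned step
  return (value_index / period) * period;
}

static_assert(PreviousByteAlignedIndex(13, 5) == 8);   // 8 values * 5 bits = 5 bytes
static_assert(PreviousByteAlignedIndex(13, 6) == 12);  // 4 values * 6 bits = 3 bytes
```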
```cpp
  ARROW_DCHECK_GE(batch_size, 0);
  ARROW_COMPILER_ASSUME(batch_size < kValuesUnpacked);
  ARROW_COMPILER_ASSUME(batch_size >= 0);
  unpack_exact<kPackedBitWidth, false>(in, out, batch_size, /* bit_offset= */ 0);
```
Similarly, if there's enough padding at the end of the input, we could SIMD-unpack a full kValuesUnpacked into a local buffer.
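A compilable sketch of that epilog idea, using a scalar stand-in for the SIMD kernel and illustrative names (unpack_full_aligned, and example values for kValuesUnpacked and kPackedBitWidth); it assumes little-endian, LSB-first packing and a few readable padding bytes past the packed data, and is not the PR's code:

```cpp
#include <cstdint>
#include <cstring>

constexpr int kValuesUnpacked = 32;  // assumed full-batch size
constexpr int kPackedBitWidth = 5;   // example width

// Stand-in for the byte-aligned SIMD kernel: always unpacks kValuesUnpacked
// values. Reads up to 3 bytes past the packed data, hence the padding requirement.
void unpack_full_aligned(const uint8_t* in, uint32_t* out) {
  for (int i = 0; i < kValuesUnpacked; ++i) {
    const int bit_pos = i * kPackedBitWidth;
    uint32_t word = 0;
    std::memcpy(&word, in + bit_pos / 8, sizeof(word));  // little-endian load
    out[i] = (word >> (bit_pos % 8)) & ((uint32_t{1} << kPackedBitWidth) - 1);
  }
}

// Epilog fast path: unpack a full batch into a scratch buffer, then copy only
// the batch_size values the caller asked for (batch_size < kValuesUnpacked).
void unpack_epilog_with_padding(const uint8_t* in, uint32_t* out, int batch_size) {
  uint32_t scratch[kValuesUnpacked];
  unpack_full_aligned(in, scratch);
  std::memcpy(out, scratch, batch_size * sizeof(uint32_t));
}
```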
```diff
  switch (num_bits) {
    case 0:
-     return unpack_null(in, out, batch_size);
+     return unpack_width<0, Unpacker>(in, out, batch_size, bit_offset);
```
Ok, macros are not pretty, but we could have a macro here to minimize diffs when changing these function signatures :-)
Such as:
```cpp
#define CASE_UNPACK_WIDTH(_width) \
  return unpack_width<_width, Unpacker>(in, out, batch_size, bit_offset)

  if constexpr (std::is_same_v<UnpackedUint, bool>) {
    switch (num_bits) {
      case 0:
        CASE_UNPACK_WIDTH(0);
      // etc.

#undef CASE_UNPACK_WIDTH
```

As you prefer, though.
```cpp
  if constexpr (std::is_same_v<Uint, bool>) {
    random_is_valid(num_values, 0.5, &out, kSeed);
  } else {
    const uint64_t max = (uint64_t{1} << (static_cast<uint64_t>(bit_width) - 1)) - 1;
```
Shouldn't this be
```diff
-    const uint64_t max = (uint64_t{1} << (static_cast<uint64_t>(bit_width) - 1)) - 1;
+    const uint64_t max = (uint64_t{1} << static_cast<uint64_t>(bit_width)) - 1;
```
(e.g. you want 2**10 - 1 for 10-bit packing, not 2**9 - 1)
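For reference, a quick compile-time check of the corrected formula (the helper name is invented):

```cpp
#include <cstdint>

// Largest value representable in bit_width bits is 2**bit_width - 1
// (valid for 1 <= bit_width <= 63 with this expression).
constexpr uint64_t MaxPackedValue(uint64_t bit_width) {
  return (uint64_t{1} << bit_width) - 1;
}

static_assert(MaxPackedValue(10) == 1023);  // 2**10 - 1, as in the example above
static_assert(MaxPackedValue(1) == 1);
```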
```cpp
  if (!written) {
    throw std::runtime_error("Cannot write move values");
  }
```
Let's avoid exceptions, you could for example have this function return a Result<std::vector<uint8_t>>.
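A hedged sketch of that suggestion, with an invented helper (PackValues) and a simple scalar packing loop standing in for the test writer; the only point illustrated is returning arrow::Result instead of throwing:

```cpp
#include <cstdint>
#include <vector>

#include "arrow/result.h"
#include "arrow/status.h"

// Packs LSB-first, little-endian, bit_width bits per value. On failure the
// error is reported through arrow::Result rather than an exception.
arrow::Result<std::vector<uint8_t>> PackValues(const std::vector<uint64_t>& values,
                                               int bit_width) {
  if (bit_width <= 0 || bit_width > 64) {
    return arrow::Status::Invalid("Cannot write values with bit width ", bit_width);
  }
  std::vector<uint8_t> out((values.size() * bit_width + 7) / 8, 0);
  uint64_t bit_pos = 0;
  for (uint64_t v : values) {
    for (int b = 0; b < bit_width; ++b, ++bit_pos) {
      out[bit_pos / 8] |= static_cast<uint8_t>(((v >> b) & 1u) << (bit_pos % 8));
    }
  }
  return out;
}

// In a test, the caller can then use e.g.:
//   ASSERT_OK_AND_ASSIGN(auto packed, PackValues(values, bit_width));
```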
Rationale for this change
Add a prolog and epilog to unpack (GH-47895).

What changes are included in this PR?
- unpack extracts exactly the required number of values -> change return type to void (see the sketch below).
- unpack can handle non-aligned data -> include bit_offset in input parameters.
- unpack tests updated accordingly.
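An illustrative before/after of the signature change described above; the parameter list is assumed for illustration, not copied from the PR:

```cpp
// Before (assumed shape): returned the number of values actually unpacked and
// required byte-aligned input.
int unpack(const uint8_t* in, uint32_t* out, int batch_size, int num_bits);

// After: unpacks exactly batch_size values (return type void) and accepts a
// bit_offset so it can start on a non-byte-aligned value.
void unpack(const uint8_t* in, uint32_t* out, int batch_size, int num_bits,
            int bit_offset);
```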
Decoder benchmarks should remain the same (tested on Linux x86-64). I have not benchmarked the unpack functions themselves, but I don't believe it's relevant since they now do more work.

Are these changes tested?
Yes
Are there any user-facing changes?
No