-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47859: [C++] Fix creating union types without type_codes for fields.size() == 128 #47815
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
Hi, @kou! Do you know someone who can review this PR? It looks like a bug rather than a "feature" |
raulcd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
This doesn't fit our definition of MINOR, could you please open an issue for this change? See our definition for MINOR:
https://github.com/apache/arrow/blob/main/CONTRIBUTING.md#minor-fixes
Thanks!
|
|
|
Hi, @kou! Can you take a look? |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. It seems that we can't create a union type that has fields.size() == 128:
https://arrow.apache.org/docs/format/Columnar.html#dense-union
A union with more than 127 possible types can be modeled as a union of unions.
cpp/src/arrow/util/range.h
Outdated
| template <typename T> | ||
| std::vector<T> Iota(T start, size_t length) { | ||
| std::vector<T> result(length); | ||
| std::iota(result.begin(), result.end(), start); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this in
arrow/cpp/src/arrow/util/range.h
Lines 36 to 39 in 8c16a08
| std::vector<T> result(static_cast<size_t>(stop - start)); | |
| std::iota(result.begin(), result.end(), start); | |
| return result; | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call this in
The stop border not included, so for fields.size() == 128, Iota<int8_t>(0, 128) is called and 128 is converted to -128.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to confirm whether we can do the following:
diff --git a/cpp/src/arrow/util/range.h b/cpp/src/arrow/util/range.h
index 2055328798..7ccf3d154d 100644
--- a/cpp/src/arrow/util/range.h
+++ b/cpp/src/arrow/util/range.h
@@ -33,9 +33,7 @@ std::vector<T> Iota(T start, T stop) {
if (start > stop) {
return {};
}
- std::vector<T> result(static_cast<size_t>(stop - start));
- std::iota(result.begin(), result.end(), start);
- return result;
+ return Iota(start, static_cast<size_t>(stop - start));
}
/// Create a vector containing the values from 0 up to length
There are no restrictions on creating a union type that has Could there be other limitations? Current designs allow this if you build a vector of type codes yourself. |
This check is for each type code value check. It's not a check for the number of type codes. @bkietz You added the following checks in #4781: The former change limits the number of type codes to 127 but the latter change limits the number of type codes to 128. Do you have any opinion for this report? @pitrou You may have an opinion for this report because you introduced |
|
|
@pitrou If the maximum number of type ids is 128, it is correct to create a union type with |
Yes, it should be. |
Should we update the "127" in the document to "128" too? |
Done, is it alright? |
|
@kou is there anything to do to complete this PR? |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix this error?
https://github.com/apache/arrow/actions/runs/19468399277/job/56026880992?pr=47815#step:11:1701
FAILED: [code=2] src/arrow/CMakeFiles/arrow-type-test.dir/Unity/unity_0_cxx.cxx.obj
"C:\Program Files\Git\usr\bin\ccache.exe" C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1444~1.352\bin\Hostx64\x64\cl.exe /nologo /TP -DARROW_HAVE_AVX2 -DARROW_HAVE_BMI2 -DARROW_HAVE_RUNTIME_AVX2 -DARROW_HAVE_RUNTIME_AVX512 -DARROW_HAVE_RUNTIME_BMI2 -DARROW_HAVE_RUNTIME_SSE4_2 -DARROW_HAVE_SSE4_2 -DARROW_WITH_BENCHMARKS_REFERENCE -DARROW_WITH_TIMING_TESTS -D_CRT_SECURE_NO_WARNINGS -D__SSE2__ -D__SSE4_1__ -D__SSE4_2__ -ID:\a\arrow\arrow\build\cpp\src -ID:\a\arrow\arrow\cpp\src -ID:\a\arrow\arrow\cpp\src\generated -external:ID:\a\arrow\arrow\build\cpp\_deps\googletest-src\googlemock\include -external:ID:\a\arrow\arrow\build\cpp\_deps\googletest-src\googletest\include -external:ID:\a\arrow\arrow\cpp\thirdparty\flatbuffers\include -external:ID:\a\arrow\arrow\build\cpp\_deps\googletest-src\googletest -external:ID:\a\arrow\arrow\build\cpp\_deps\googletest-src\googlemock -external:W0 /D_SILENCE_TR1_NAMESPACE_DEPRECATION_WARNING /W3 /EHsc /wd5105 /bigobj /utf-8 /W3 /wd4365 /wd4267 /wd4838 /wd4800 /wd4996 /wd4065 /arch:AVX2 /Ob0 /Od /RTC1 /WX -std:c++17 -MDd -Zi /showIncludes /Fosrc\arrow\CMakeFiles\arrow-type-test.dir\Unity\unity_0_cxx.cxx.obj /Fdsrc\arrow\CMakeFiles\arrow-type-test.dir\ /FS -c D:\a\arrow\arrow\build\cpp\src\arrow\CMakeFiles\arrow-type-test.dir\Unity\unity_0_cxx.cxx
D:/a/arrow/arrow/cpp/src/arrow/type_test.cc(2201): error C2039: 'iota': is not a member of 'std'
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.44.35207\include\unordered_set(23): note: see declaration of 'std'
D:/a/arrow/arrow/cpp/src/arrow/type_test.cc(2201): error C3861: 'iota': identifier not found
This?
diff --git a/cpp/src/arrow/type_test.cc b/cpp/src/arrow/type_test.cc
index cb17f1ac3f..4c796ceee7 100644
--- a/cpp/src/arrow/type_test.cc
+++ b/cpp/src/arrow/type_test.cc
@@ -22,6 +22,7 @@
#include <cstdint>
#include <functional>
#include <memory>
+#include <numeric>
#include <string>
#include <unordered_set>
#include <vector>
cpp/src/arrow/util/range.h
Outdated
| template <typename T> | ||
| std::vector<T> Iota(T start, size_t length) { | ||
| std::vector<T> result(length); | ||
| std::iota(result.begin(), result.end(), start); | ||
| return result; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to confirm whether we can do the following:
diff --git a/cpp/src/arrow/util/range.h b/cpp/src/arrow/util/range.h
index 2055328798..7ccf3d154d 100644
--- a/cpp/src/arrow/util/range.h
+++ b/cpp/src/arrow/util/range.h
@@ -33,9 +33,7 @@ std::vector<T> Iota(T start, T stop) {
if (start > stop) {
return {};
}
- std::vector<T> result(static_cast<size_t>(stop - start));
- std::iota(result.begin(), result.end(), start);
- return result;
+ return Iota(start, static_cast<size_t>(stop - start));
}
/// Create a vector containing the values from 0 up to length
FIxed, can you approve workflows? |
|
Could you rebase on main to confirm whether the CI failures are unrelated to this change? |
OK, can you approve workflows? |
kou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Failures are unrelated: #48230
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 55587ef. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 1 possible false positive for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
There is a bug for creating union types with empty
type_codes. Iffields.size()== 128 (kMaxTypeCode+ 1) andtype_codesis empty, static_cast<int8_t> returns -128 andinternal::Iotagenerates an empty vector of type codes, but the expected vector is [0, 1, 2, ..., 127], where 127 iskMaxTypeCode.What changes are included in this PR?
internal::Iotafunction to generate vectors of size =lengthwith values fromstart.internal::Iotacall from old parameters to new for creatingdense_unionandsparse_uniontypes.Are these changes tested?
Yes, there is a new test.
Are there any user-facing changes?
No.
This PR contains a "Critical Fix".
(b) a bug that caused incorrect or invalid data to be produced
If
fields.size()== 128 (kMaxTypeCode+ 1),internal::Iotareturns an empty vector that cannot validate here:https://github.com/apache/arrow/blob/main/cpp/src/arrow/type.cc#L1232