fix: re-enable Comet abs
#2595
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2595      +/-   ##
============================================
+ Coverage     56.12%   59.33%    +3.20%
- Complexity      976     1444      +468
============================================
  Files           119      146       +27
  Lines         11743    13758     +2015
  Branches       2251     2353      +102
============================================
+ Hits           6591     8163     +1572
- Misses         4012     4373      +361
- Partials       1140     1222       +82

☔ View full report in Codecov by Sentry.
Seq(2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 17).foreach { col =>
  checkSparkAnswerAndOperator(s"SELECT abs(_${col}) FROM tbl")
test("abs") {
  Seq(true, false).foreach { ansi_enabled =>
This is the diff; the test now runs with ANSI mode both on and off.
optional int32 _10(UINT_16);
optional int32 _11(UINT_32);
optional int64 _12(UINT_64);
optional int32 _9(INT_8);
We store negative values in these columns, so I think the schema should not use unsigned int types.
// CometTestBase.scala
record.add(8, (-i).toByte)
record.add(9, (-i).toShort)
record.add(10, -i)
record.add(11, (-i).toLong)
We have UINT here to make sure we cover all types that parquet has. The data files created here are specifically designed to test whether parquet readers can handle all types correctly. Negative values stored in a UINT parquet type test the values around the boundary of allowed values.
To illustrate with an example, when you store the value -1 in a UINT_8 field, what gets stored is the bit pattern 0xff. On reading, this comes back as the value 255, which is the maximum value for a UINT_8.
This is both correct and desirable.
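A minimal Rust sketch of that reinterpretation, using plain casts rather than the actual parquet reader:

```rust
fn main() {
    // The i8 value -1 is stored as the bit pattern 0xff.
    let signed: i8 = -1;
    assert_eq!(signed.to_ne_bytes(), [0xff]);

    // Reading those same bits back as an unsigned byte yields 255,
    // the maximum value representable by a UINT_8 column.
    let unsigned: u8 = signed as u8;
    assert_eq!(unsigned, u8::MAX); // 255
}
```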
I commented in https://github.com/apache/datafusion-comet/pull/2595/files#r2445168046 but I don't think that we should continue writing new tests that rely on makeParquetFileAllPrimitiveTypes because it doesn't explicitly generate edge case values.
Thanks @hsiang-c. WDYT of implementing abs with Spark flavor in DF, like I did recently for concat in apache/datafusion#18128?
withParquetTable(path.toString, "tbl") {
  Seq(2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 15, 16, 17).foreach { col =>
    checkSparkAnswerAndOperator(s"SELECT abs(_${col}) FROM tbl")
test("abs") {
I'd prefer to see this test added to CometFuzzMathSuite because the generated Parquet file there tests for specific edge cases such as negative floating point zero and min/max values for all types. The data generated by makeParquetFileAllPrimitiveTypes does not explicitly add these edge cases.
test("abs") {
  Seq(true, false).foreach { ansi_enabled =>
    Seq(true, false).foreach { dictionaryEnabled =>
      withSQLConf(SQLConf.ANSI_ENABLED.key -> ansi_enabled.toString) {
The test isn't currently testing any scenarios where ANSI mode would cause an exception to be thrown.
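For reference, the only integral inputs where ANSI mode would change the outcome are the type minimums, since their absolute value does not fit in the same type. A small illustration with plain Rust integer methods (not the Comet kernel):

```rust
fn main() {
    // ANSI semantics: abs(i32::MIN) overflows, so it must surface an error.
    assert_eq!(i32::MIN.checked_abs(), None);

    // Non-ANSI (legacy) semantics: the value wraps back to i32::MIN,
    // matching Spark's behavior when spark.sql.ansi.enabled=false.
    assert_eq!(i32::MIN.wrapping_abs(), i32::MIN);
}
```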
}
}
})
.unwrap(),
What should the behavior be here if the argument is ColumnarValue::Scalar(ScalarValue::Int8(None))?
Currently the .unwrap() would panic.
IMO it should just return ColumnarValue::Scalar(ScalarValue::Int8(None)).
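A minimal sketch of the null passthrough being suggested, assuming the datafusion crate's ColumnarValue/ScalarValue re-exports and a hypothetical helper name (this is not the code in the PR):

```rust
use datafusion::error::Result;
use datafusion::logical_expr::ColumnarValue;

// Hypothetical helper: short-circuit a null scalar argument instead of
// unwrapping it. abs(NULL) is NULL in Spark, so a null scalar such as
// ScalarValue::Int8(None) is returned unchanged.
fn abs_null_scalar_passthrough(arg: &ColumnarValue) -> Option<Result<ColumnarValue>> {
    match arg {
        ColumnarValue::Scalar(s) if s.is_null() => Some(Ok(ColumnarValue::Scalar(s.clone()))),
        _ => None, // fall through to the regular abs computation
    }
}
```

Returning the scalar unchanged also keeps its data type, so the caller gets a typed null rather than a panic.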
…mode (#18205)

## Which issue does this PR close?

- Part of #15914

## Rationale for this change

- Apache Spark's `abs()` behaves differently from DataFusion's.
- Apache Spark's [ANSI-compliant](https://spark.apache.org/docs/latest/sql-ref-ansi-compliance.html#ansi-compliance) dialect can be toggled by the SparkConf `spark.sql.ansi.enabled`. When ANSI mode is off, arithmetic overflow does not throw an exception the way DataFusion does.
- DataFusion Comet can leverage it in apache/datafusion-comet#2595.

## What changes are included in this PR?

- This is the 1st PR to support the non-ANSI-mode, Spark-compatible `abs` math function.
- Mimics Apache Spark `v4.0.1`'s [abs expression](https://github.com/apache/spark/blob/v4.0.1/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L148) for numeric types only and non-ANSI mode, i.e. `spark.sql.ansi.enabled=false`.

### Tasks breakdown

| Non-ANSI mode | ANSI mode | ANSI Interval Types |
| - | - | - |
| this PR | hsiang-c#1 (will change base branch) | TODO |

## Are these changes tested?

- unit tests
- sqllogictest: `test_files/spark/math/abs.slt`

## Are there any user-facing changes?

Yes, the abs function can be specified in SQL.

- An exception will NOT be thrown on arithmetic overflow.

---------

Co-authored-by: Oleks V <comphead@users.noreply.github.com>
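As a rough illustration of the non-ANSI semantics described above, here is a hedged sketch of a legacy-mode abs over an Int32 column, assuming arrow-rs's arity::unary kernel; it is a simplified stand-in, not the actual DataFusion spark-function implementation:

```rust
use arrow::array::{Array, Int32Array};
use arrow::compute::kernels::arity::unary;
use arrow::datatypes::Int32Type;

// Legacy (non-ANSI) abs: i32::MIN wraps back to i32::MIN instead of erroring.
fn abs_int32_legacy(input: &Int32Array) -> Int32Array {
    unary::<Int32Type, _, Int32Type>(input, |v| v.wrapping_abs())
}

fn main() {
    let input = Int32Array::from(vec![Some(-3), None, Some(i32::MIN)]);
    let result = abs_int32_legacy(&input);

    assert_eq!(result.value(0), 3);
    assert!(result.is_null(1)); // nulls propagate through the kernel
    assert_eq!(result.value(2), i32::MIN); // matches Spark with spark.sql.ansi.enabled=false
}
```

An ANSI-mode variant would presumably use a fallible kernel (e.g. checked_abs with an error on None) so that overflow surfaces as Spark's arithmetic exception instead of wrapping.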
Which issue does this PR close?
Closes #1890
Partially closes #2314
Rationale for this change
The abs implementation returns unexpected results and was turned off.

What changes are included in this PR?

- Throw org.apache.spark.SparkArithmeticException on the MIN_VALUE of Spark's IntegralType, see doc.
- In CometTestBase, changed the types of columns _9, _10, _11 and _12 from UINT_8/16/32/64 to INT_8/16/32/64 because we actually have negative values in the test data.

How are these changes tested?

- Tests cover MIN_VALUE and decimal values with different precision and scale.