Skip to content

Conversation

@JaroslavTulach
Copy link
Member

@JaroslavTulach JaroslavTulach commented Dec 1, 2025

Pull Request Description

Checklist

Please ensure that the following checklist has been satisfied before submitting the PR:

@JaroslavTulach JaroslavTulach self-assigned this Dec 1, 2025
@JaroslavTulach JaroslavTulach added the CI: No changelog needed Do not require a changelog entry for this PR. label Dec 1, 2025
@JaroslavTulach JaroslavTulach added this to the 2026.1 Release milestone Dec 2, 2025
@JaroslavTulach JaroslavTulach moved this to 🔧 Implementation in Issues Board Dec 2, 2025
Copy link
Member

@Akirathan Akirathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@JaroslavTulach
Copy link
Member Author

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Dec 2, 2025

Down to 27 ❌ failed tests. Still a lot of failures like:

    - [FAILED] should be able to replace multiple columns selected by type [19ms]
        Reason: An unexpected panic was thrown: java.lang.AssertionError: Null values found in a column, so a null key should be found
        at <Unknown Location> related to com.oracle.truffle.host.HostObject.doInvoke
        at <enso> In_Memory_Table_Implementation.merge.handle_java_errors(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:538:30-129)
        at <enso> Java_Problems.with_problem_aggregator(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Java_Problems.enso:118:14-25)
        at <enso> In_Memory_Table_Implementation.merge<arg-1>(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:537-539)
        at <enso> In_Memory_Table_Implementation.merge.handle_java_errors<arg-1>(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:535:25-30)
        at <enso> Panic.catch(Internal)
        at <enso> In_Memory_Table_Implementation.merge.handle_java_errors<arg-1>(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:534-535)
        at <enso> Panic.catch(Internal)
        at <enso> In_Memory_Table_Implementation.merge.handle_java_errors<arg-1>(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:533-535)
        at <enso> Panic.catch(Internal)
        at <enso> In_Memory_Table_Implementation.merge.handle_java_errors(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:532-535)
        at <enso> In_Memory_Table_Implementation.merge(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:536-539)
        at <enso> Table.merge(distribution/lib/Standard/Table/0.0.0-dev/src/Table.enso:3052:9-123)
        at <enso> case_branch<arg-1>(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:60:34-199)
        at <enso> Any.if_not_error(distribution/lib/Standard/Base/0.0.0-dev/src/Error.enso:364:32-36)
        at <enso> case_branch<arg-1>(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:42-62)
        at <enso> Any.if_not_error(distribution/lib/Standard/Base/0.0.0-dev/src/Error.enso:364:32-36)
        at <enso> case_branch(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:42-62)
        at <enso> case_branch(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:23-62)
        at <enso> Replace_Helpers.replace(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:16-62)
        at <enso> case_branch.replace_columns(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:20:69-166)
        at <enso> Array_Like_Helpers.fold.f(distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso:286:22-48)
        at <enso> if_then_else(distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso:399:31-50)
        at <enso> if_then_else.go(distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso:398-400)
        at <enso> if_then_else(distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso:401:13-30)
        at <enso> Range.fold(distribution/lib/Standard/Base/0.0.0-dev/src/Data/Range.enso:395-401)
        at <enso> Array_Like_Helpers.fold(distribution/lib/Standard/Base/0.0.0-dev/src/Internal/Array_Like_Helpers.enso:287:5-39)
        at <enso> Vector.fold(distribution/lib/Standard/Base/0.0.0-dev/src/Data/Vector.enso:372:9-50)
        at <enso> case_branch(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:20:13-167)
        at <enso> Replace_Helpers.replace(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/Replace_Helpers.enso:16-62)
        at <enso> In_Memory_Table_Implementation.replace(distribution/lib/Standard/Table/0.0.0-dev/src/Internal/In_Memory_Table_Implementation.enso:542:9-118)
        at <enso> Table.replace(distribution/lib/Standard/Table/0.0.0-dev/src/Table.enso:3138:9-127)
        at <enso> Replace_Spec.add_replace_specs.prefix<arg-0>(test/Table_Tests/src/Common_Table_Operations/Join/Replace_Spec.enso:86:22-77)
        at <enso> Replace_Spec.add_replace_specs.prefix<arg-1>(test/Table_Tests/src/Common_Table_Operations/Join/Replace_Spec.enso:86:22-95)

@JaroslavTulach
Copy link
Member Author

Down to 2 failures:

sbt:enso> runEngineDistribution --run test/Table_Tests
5531 tests succeeded.
2 tests failed.
118 tests skipped.
34 groups skipped.

Failed tests: 'will.fail.if.key.columns.of.the.lookup.table.contain.Nothing|should.fail.on.null.key.values.in.lookup.table'

one failure is at

    - [FAILED] will fail if key columns of the lookup table contain Nothing [70ms]
        Reason: Expected an error Null_Values_In_Key_Columns but no error occurred, instead got: (Table.Value (In_Memory_Table.Value org.enso.table.data.table.Table@acc8cbf) In_Memory_Table_Implementation) (at test/Table_Tests/src/Common_Table_Operations/Join/Lookup_Spec.enso:312:13-58).

and second failure is at

[FAILED] [In-Memory] Table.replace: [21/22, 510ms]                    
  | => enso / runEngineDistribution 347s
    - [FAILED] should fail on null key values in lookup table [21ms]
        Reason: Expected an error Null_Values_In_Key_Columns but no error occurred, instead got: (Table.Value (In_Memory_Table.Value org.enso.table.data.table.Table@4403ddef) In_Memory_Table_Implementation) (at test/Table_Tests/src/Common_Table_Operations/Join/Replace_Spec.enso:136:13-88).

JaroslavTulach and others added 3 commits December 2, 2025 17:54
Co-authored-by: Pavel Marek <pavel.marek@enso.org>
Co-authored-by: Pavel Marek <pavel.marek@enso.org>
@JaroslavTulach JaroslavTulach changed the title Using validity bit map as arrow does Using validity bitmap as arrow does Dec 2, 2025
@enso-bot
Copy link

enso-bot bot commented Dec 3, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-12-02):

Progress: .

@JaroslavTulach
Copy link
Member Author

  • org_enso_benchmarks_generated_Column_Arithmetic_1000000_Multiply_Constant is 2x slower
Column_Arithmetic_1000000_Multiply_Constant
  • org_enso_benchmarks_generated_Column_Arithmetic_1000000_Plus_Constant is 2x slower
obrazek
  • org_enso_benchmarks_generated_Column_Arithmetic_1000000_Plus_Overflowing also 2x slower
  • org_enso_benchmarks_generated_Column_Cast_1000000_Float_To_Integer is up to 2x slower
obrazek
  • org_enso_benchmarks_generated_Column_Cast_1000000_Integer_To_Float is up to 2x slower
  • org_enso_benchmarks_generated_Column_Cast_1000000_Integer_To_Smaller_Integer_Fitting is up to 2x slower
  • org_enso_benchmarks_generated_Column_Cast_1000000_Integer_To_Text few percent slower
Int2Text
  • many org_enso_benchmarks_generated_Column_Comparisons are slightly slower
column comparision

Copy link
Member

@jdunkerley jdunkerley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - a couple of small fixes please

@github-project-automation github-project-automation bot moved this from 🔧 Implementation to 🌟 Q/A review in Issues Board Dec 3, 2025
Copy link
Collaborator

@hubertp hubertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If increased memory is an issue then why not use something like RoaringBitmap?

Comment on lines 302 to 316
if (keepValue) {
var newIsNothing =
boolStorage.isNegated() ? or(isNothing, values) : orNot(isNothing, values, checkedSize);
boolStorage.isNegated()
? or(isNothing, values, checkedSize)
: orNot(isNothing, values, checkedSize);
newIsNothing.flip(0, checkedSize);
return new BoolStorage(values, newIsNothing, checkedSize, boolStorage.isNegated());
} else {
var newIsNothing =
boolStorage.isNegated() ? orNot(isNothing, values, checkedSize) : or(isNothing, values);
boolStorage.isNegated()
? orNot(isNothing, values, checkedSize)
: or(isNothing, values, checkedSize);
newIsNothing.flip(0, checkedSize);
return new BoolStorage(values, newIsNothing, checkedSize, !boolStorage.isNegated());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DRY (although it has been like that before)

Copy link
Member Author

@JaroslavTulach JaroslavTulach Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I had touched the code in previous commits
  • and then I greatly regretted that
  • in bfa1c62 I had to return the original code back
  • it is probably more important to keep the semantics correct
  • than to fulfill any artificial code metrics
  • but the current version is also slower (because of calling flip two times)
  • thus I may give it one more try...

@JaroslavTulach
Copy link
Member Author

JaroslavTulach commented Dec 3, 2025

If increased memory is an issue then why not use something like RoaringBitmap?

  • the primary goal here is to adhere to Arrow format
  • if the increased memory usage is a necessary to be Arrow compatible, then @jdunkerley agreed to to pay the price
  • if there is a way to be Arrow compatible and avoid the overhead
    • 1M of LongStorage non-null long items used to take 8MB of memory
    • now we need additional 1/8MB for BitSet full of 1
  • then I happily implement such an improvement

Validity bitmaps

Any value in an array may be semantically null, whether primitive or nested type.

All array types, with the exception of union types (more on these later), utilize a dedicated memory buffer, known as the validity (or “null”) bitmap, to encode the nullness or non-nullness of each value slot. The validity bitmap must be large enough to have at least 1 bit for each array slot.

  • reading this section from arrow columnar spec doesn't leave much hope we could do without the 125KB of 0xff

@JaroslavTulach JaroslavTulach changed the title Using validity bitmap as arrow does Using validity bitmap as Arrow does Dec 3, 2025
@JaroslavTulach JaroslavTulach force-pushed the wip/jtulach/ValidityBitMap13851 branch from a9592b4 to 0c07d60 Compare December 3, 2025 18:30
@enso-bot
Copy link

enso-bot bot commented Dec 4, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-12-03):

Progress: .

@JaroslavTulach
Copy link
Member Author

  • org_enso_benchmarks_generated_Column_Arithmetic_1000000_Multiply_Constant is 2x slower
  • with recent changes the benchmarks got faster
Column_Arithmetic_1000000_Multiply_Constant Plus_Constant cast
  • otherwise we are green
  • let's integrate and move towards Arrow

@JaroslavTulach JaroslavTulach merged commit e03ad84 into develop Dec 4, 2025
78 of 80 checks passed
@JaroslavTulach JaroslavTulach deleted the wip/jtulach/ValidityBitMap13851 branch December 4, 2025 05:10
@github-project-automation github-project-automation bot moved this from 🌟 Q/A review to 🟢 Accepted in Issues Board Dec 4, 2025
@enso-bot
Copy link

enso-bot bot commented Dec 5, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-12-04):

Progress: .

@enso-bot
Copy link

enso-bot bot commented Dec 6, 2025

Jaroslav Tulach reports a new STANDUP for yesterday (2025-12-05):

Progress: .

GitHub
Pull Request Description

#14357 has introduced a "mean benchmark"

that benchmark was slow in dual JVM mode
the goal of this pull request is to speed it up

the idea is to reuse work do...

GitHub
Pull Request Description

#14357 has introduced a "mean benchmark"

that benchmark was slow in dual JVM mode
the goal of this pull request is to speed it up

the idea is to reuse work do...

GitHub
Pull Request Description

#14357 has introduced a "mean benchmark"

that benchmark was slow in dual JVM mode
the goal of this pull request is to speed it up

the idea is to reuse work do...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

-compiler CI: No changelog needed Do not require a changelog entry for this PR.

Projects

Status: 🟢 Accepted

Development

Successfully merging this pull request may close these issues.

5 participants