@ohadzeliger
Contributor

The Lucene partitioner is not thread-safe when records are updated concurrently.
The partition count and boundaries are not updated in a thread-safe way and can become skewed under multi-threaded updates.
This change protects partition metadata updates with a keyspace lock.
The LuceneIndexMaintenanceTest.concurrent* tests have been extended to run in a partitioned setup and now pass.
In addition, this work uncovered a bug where empty partitions were not removed; it is fixed here, with accompanying tests. Empty partitions (first, middle, and last) are now removed along with their index data, while ensuring that at least one partition is always left behind.
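
To illustrate the idea of serializing partition metadata updates behind a lock, here is a minimal, self-contained sketch. It is not the code in this PR and does not use the Record Layer's lock classes: `ChainedAsyncLock`, `PartitionMeta`, and `runLockedIncrements` are hypothetical names, and the "metadata" is just an in-memory counter whose read-modify-write is split across two async stages so that unguarded callers could interleave.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

class PartitionLockSketch {
    /** Hypothetical minimal async lock: each task starts only after the previous one finishes. */
    static final class ChainedAsyncLock {
        private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

        synchronized <T> CompletableFuture<T> doWithLock(Supplier<CompletableFuture<T>> task) {
            CompletableFuture<T> result = tail.thenCompose(ignored -> task.get());
            // Release the lock whether the task succeeded or failed.
            tail = result.handle((value, error) -> null);
            return result;
        }
    }

    /** Toy partition metadata (an assumption for illustration); the counter is a plain int on purpose. */
    static final class PartitionMeta {
        private int count;

        // Read and write happen in separate async stages, so unguarded callers could lose updates.
        CompletableFuture<Integer> incrementAsync(ExecutorService pool) {
            return CompletableFuture.supplyAsync(() -> count, pool)
                    .thenApplyAsync(read -> {
                        count = read + 1;
                        return count;
                    }, pool);
        }
    }

    /** Runs the given number of increments through the lock and returns the final count. */
    static int runLockedIncrements(int updates, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            ChainedAsyncLock lock = new ChainedAsyncLock();
            PartitionMeta meta = new PartitionMeta();
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < updates; i++) {
                pending.add(lock.doWithLock(() -> meta.incrementAsync(pool)));
            }
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return meta.count;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runLockedIncrements(1000, 8));
    }
}
```

Because every increment runs inside `doWithLock`, no two read-modify-write sequences overlap and no update is lost. Calling `meta.incrementAsync(pool)` directly from the loop instead would allow interleaving and could leave the count short.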

Resolves #2990

ScottDugas and others added 4 commits October 23, 2025 15:28
This changes the concurrentUpdate to run with a single partition,
and adds usages of AsyncLock to ensure that updates to the partition
metadata work correctly.
Before calling the issue complete an additional test should be added
that does concurrent inserts, and one for concurrent deletes.
@ohadzeliger ohadzeliger self-assigned this Oct 28, 2025
@ohadzeliger ohadzeliger added the bug fix Change that fixes a bug label Oct 28, 2025
ohadzeliger and others added 2 commits November 4, 2025 13:28
…ord/RecordCoreInternalException.java

Co-authored-by: Scott Dugas <scott.dugas@gmail.com>
Code restructuring to make more readable
Verify that partition is empty before removal
Added tests
assertThat(partitionCounts, Matchers.contains(5, 3, 4)));
}

static Stream<Arguments> removeEmptyPartitions() {
Collaborator

I think the new assertions there are good, but that is different from the test I was describing.
concurrentDelete only does one merge, after deleting all the records, and I was suggesting a loop:

while (records.size() > 0) {
    delete(random.nextInt(records.size()));
    commit
    merge
    validate
}

This will give better coverage of how merge interacts with the various partially emptied states, as opposed to covering only the state where all partitions are gone.
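
The loop suggested above could look roughly like the following self-contained sketch. It runs against a toy in-memory stand-in rather than the real index: `ToyPartitionedIndex`, `merge`, `validate`, and `run` are hypothetical names, the commit step is omitted because there is no transaction here, and "merge" is assumed to mean dropping empty partitions while keeping at least one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class DeleteMergeLoopSketch {
    /** Hypothetical stand-in for a partitioned index: each inner list holds one partition's record ids. */
    static final class ToyPartitionedIndex {
        final List<List<Integer>> partitions = new ArrayList<>();

        ToyPartitionedIndex(int recordCount, int partitionSize) {
            for (int id = 0; id < recordCount; id++) {
                if (id % partitionSize == 0) {
                    partitions.add(new ArrayList<>());
                }
                partitions.get(partitions.size() - 1).add(id);
            }
            if (partitions.isEmpty()) {
                partitions.add(new ArrayList<>());
            }
        }

        void delete(int recordId) {
            for (List<Integer> partition : partitions) {
                if (partition.remove(Integer.valueOf(recordId))) {
                    return;
                }
            }
        }

        /** Assumed merge behavior: drop empty partitions, but always leave at least one behind. */
        void merge() {
            for (int i = partitions.size() - 1; i >= 0 && partitions.size() > 1; i--) {
                if (partitions.get(i).isEmpty()) {
                    partitions.remove(i);
                }
            }
        }

        void validate(int expectedRecords) {
            int total = 0;
            for (List<Integer> partition : partitions) {
                if (partition.isEmpty() && partitions.size() > 1) {
                    throw new AssertionError("empty partition left behind");
                }
                total += partition.size();
            }
            if (partitions.isEmpty() || total != expectedRecords) {
                throw new AssertionError("partition metadata out of sync");
            }
        }
    }

    /** Deletes records one at a time in random order, merging and validating after every delete. */
    static int run(long seed, int recordCount, int partitionSize) {
        Random random = new Random(seed);
        ToyPartitionedIndex index = new ToyPartitionedIndex(recordCount, partitionSize);
        List<Integer> records = new ArrayList<>();
        for (int id = 0; id < recordCount; id++) {
            records.add(id);
        }
        while (!records.isEmpty()) {
            int victim = records.remove(random.nextInt(records.size()));
            index.delete(victim);
            index.merge();
            index.validate(records.size());
        }
        return index.partitions.size();
    }

    public static void main(String[] args) {
        System.out.println(run(42L, 12, 4)); // one empty partition remains
    }
}
```

Validating after every single delete exercises the states where the first, a middle, or the last partition empties out while others still hold records, which a single merge at the end never sees.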

} else {
timerSnapshot = null;
}
final StoreTimerSnapshot timerSnapshot = getStoreTimerSnapshot();
Collaborator

I think we should mark this Teamscale warning as tolerated.
The line at which we call getStoreTimerSnapshot is important: calling it later would miss the work we want to time and would not give us timing information.
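
The point about snapshot placement can be shown with a toy counter. This is only a sketch under assumptions: `ToyTimer`, `snapshot`, `countSince`, and `mergesObserved` are hypothetical names and do not reflect the actual store timer API. It only demonstrates that a snapshot taken after the work yields an empty difference.

```java
import java.util.HashMap;
import java.util.Map;

class TimerSnapshotSketch {
    /** Toy event counter standing in for a store timer (hypothetical, not the Record Layer API). */
    static final class ToyTimer {
        private final Map<String, Long> counts = new HashMap<>();

        void record(String event) {
            counts.merge(event, 1L, Long::sum);
        }

        /** Copies the current counts so later events do not affect the snapshot. */
        Map<String, Long> snapshot() {
            return new HashMap<>(counts);
        }

        /** Number of times the event was recorded after the snapshot was taken. */
        long countSince(Map<String, Long> snapshot, String event) {
            return counts.getOrDefault(event, 0L) - snapshot.getOrDefault(event, 0L);
        }
    }

    static long mergesObserved(boolean snapshotBeforeWork) {
        ToyTimer timer = new ToyTimer();
        Map<String, Long> snapshot = snapshotBeforeWork ? timer.snapshot() : null;
        for (int i = 0; i < 3; i++) {
            timer.record("merge"); // the work we want to measure
        }
        if (snapshot == null) {
            snapshot = timer.snapshot(); // too late: the work is already included
        }
        return timer.countSince(snapshot, "merge");
    }

    public static void main(String[] args) {
        System.out.println(mergesObserved(true));  // 3
        System.out.println(mergesObserved(false)); // 0
    }
}
```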

ohadzeliger and others added 10 commits November 6, 2025 16:51
added test to randomly remove records and assert empty partitions.
This changes the concurrentUpdate to run with a single partition,
and adds usages of AsyncLock to ensure that updates to the partition
metadata work correctly.
Before calling the issue complete an additional test should be added
that does concurrent inserts, and one for concurrent deletes.
…ord/RecordCoreInternalException.java

Co-authored-by: Scott Dugas <scott.dugas@gmail.com>
Code restructuring to make more readable
Verify that partition is empty before removal
Added tests
added test to randomly remove records and assert empty partitions.
…read-safe-partitioner

# Conflicts:
#	fdb-record-layer-lucene/src/test/java/com/apple/foundationdb/record/lucene/LuceneIndexMaintenanceTest.java
@github-actions

github-actions bot commented Nov 7, 2025

📊 Metrics Diff Analysis Report

Summary

  • New queries: 0
  • Dropped queries: 0
  • Plan changed + metrics changed: 0
  • Plan unchanged + metrics changed: 0
ℹ️ About this analysis

This automated analysis compares query planner metrics between the base branch and this PR. It categorizes changes into:

  • New queries: Queries added in this PR
  • Dropped queries: Queries removed in this PR. These should be reviewed to ensure we are not losing coverage.
  • Plan changed + metrics changed: The query plan has changed along with planner metrics.
  • Plan unchanged + metrics changed: Same plan but different metrics

The last category in particular may indicate planner regressions that should be investigated.

@Tag(Tags.RequiresFDB)
public class LuceneRepartitionPlannerTest {

private static Stream<Arguments> luceneRepartitionPlannerTest() {
Collaborator

I'm surprised this can be private and JUnit can still handle it, but it seems like it can.

@ohadzeliger ohadzeliger merged commit f032a91 into FoundationDB:main Nov 7, 2025
8 checks passed
@ohadzeliger ohadzeliger deleted the thread-safe-partitioner branch November 7, 2025 21:34
Development

Successfully merging this pull request may close these issues.

LucenePartitioner is not thread safe if multiple independent records are updated concurrently

2 participants