Fix data loss by adding checkpoint after vector index creation #422

Merged
jfrench9 merged 1 commit into main from bugfix/checkpoint-vector-index
Mar 3, 2026

jfrench9 commented Mar 3, 2026

Summary

Adds a checkpoint step to the materialization process immediately after vector index creation to ensure the index is fully persisted to disk before proceeding. This fixes a bug where vector index data could be lost during subsequent database cleanup operations.

Key Accomplishments

  • Prevents vector index data loss: Without an explicit checkpoint after creating the vector index, the index existed only in memory/WAL and could be discarded when the database performed cleanup or compaction operations. The added checkpoint forces a flush to persistent storage.
  • Improves materialization reliability: Ensures the full materialization pipeline produces durable results at each critical stage, not just at the end of the process.

Root Cause

During the materialization flow, the vector index was created but not checkpointed to disk right away, so its data lived only in memory or the WAL. If a database cleanup or other disk-syncing operation ran afterward, that uncheckpointed index data could be discarded, effectively undoing that portion of the materialization work.
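
The failure mode and the fix can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDatabase`, `materialize`, and the `checkpoint_after_index` flag are hypothetical names invented for illustration, and the "volatile vs. durable" sets are a simplification of the behavior described above, not the real database engine or the actual code in this PR.

```python
from __future__ import annotations


class ToyDatabase:
    """Toy model (hypothetical): objects stay volatile until a checkpoint."""

    def __init__(self) -> None:
        self.volatile: set[str] = set()  # stands in for memory/WAL state
        self.durable: set[str] = set()   # stands in for data persisted on disk

    def create(self, name: str) -> None:
        # Newly created objects are not yet durable.
        self.volatile.add(name)

    def checkpoint(self) -> None:
        # Flush everything volatile to durable storage.
        self.durable |= self.volatile
        self.volatile.clear()

    def cleanup(self) -> None:
        # Assumption: cleanup discards anything that was never checkpointed.
        self.volatile.clear()


def materialize(db: ToyDatabase, checkpoint_after_index: bool) -> set[str]:
    """Run the simplified pipeline and return what survived cleanup."""
    db.create("tables")
    db.checkpoint()
    db.create("vector_index")
    if checkpoint_after_index:
        db.checkpoint()  # the step this PR adds, in spirit
    db.cleanup()
    return set(db.durable)


before_fix = materialize(ToyDatabase(), checkpoint_after_index=False)
after_fix = materialize(ToyDatabase(), checkpoint_after_index=True)
print(sorted(before_fix))
print(sorted(after_fix))
```

In this model the index survives cleanup only when the checkpoint runs between index creation and cleanup, which is the ordering the fix enforces.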

Breaking Changes

None. This is a purely additive fix that introduces a checkpoint call at the appropriate point in the existing materialization workflow.

Testing Notes

  • Verify that materialized tables with vector indexes retain their index data after the full materialization pipeline completes, including any cleanup phases.
  • Confirm that the additional checkpoint does not introduce significant performance regression for the materialization process.
  • Test with scenarios where database cleanup or compaction is triggered shortly after vector index creation to validate the fix addresses the original data loss condition.

Infrastructure Considerations

  • The additional checkpoint introduces a disk I/O synchronization point during materialization. For large datasets, this may slightly increase total materialization time but is necessary to guarantee data integrity.
  • No changes to deployment configuration or dependencies are required.
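
To quantify the extra I/O cost, the checkpoint call could be wrapped in a small timer. The helper below is a hedged sketch: `timed_checkpoint` is a hypothetical name, and the `checkpoint` argument stands in for whatever callable actually issues the checkpoint in the materialization code, which this description does not show.

```python
import time
from typing import Callable


def timed_checkpoint(checkpoint: Callable[[], None]) -> float:
    """Run the given checkpoint callable and return elapsed seconds."""
    start = time.perf_counter()
    checkpoint()
    return time.perf_counter() - start


# Usage with a stand-in callable; a real run would pass the actual checkpoint call.
calls: list[str] = []
elapsed = timed_checkpoint(lambda: calls.append("checkpoint"))
print(f"checkpoint took {elapsed:.6f}s")
```

Logging this value for representative dataset sizes would give a baseline for judging whether the added synchronization point is a meaningful share of total materialization time.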

🤖 Generated with Claude Code

Branch Info:

  • Source: bugfix/checkpoint-vector-index
  • Target: main
  • Type: bugfix

Co-Authored-By: Claude <noreply@anthropic.com>

…ex creation to ensure persistence on disk. This prevents data loss during database cleanup.
jfrench9 merged commit c1ebf49 into main on Mar 3, 2026
7 checks passed