Skip to content

db_stress: fix prefix scan correctness and crashtest stability#14512

Open
xingbowang wants to merge 6 commits intofacebook:mainfrom
xingbowang:2026_03_25_stress_prefix_fix2
Open

db_stress: fix prefix scan correctness and crashtest stability#14512
xingbowang wants to merge 6 commits intofacebook:mainfrom
xingbowang:2026_03_25_stress_prefix_fix2

Conversation

@xingbowang
Copy link
Copy Markdown
Contributor

@xingbowang xingbowang commented Mar 26, 2026

Summary

This PR fixes several correctness and stability issues in the stress test and crash test infrastructure, plus adds regression coverage for trie UDI iterators.

db_stress: constrain batched prefix scans

BatchedOpsStressTest::TestPrefixScan opens 10 iterators with different prefixes and compares results in lockstep. Without prefix_same_as_start, unconstrained iterators could return keys beyond their seek prefix, producing more entries than bounded iterators and causing spurious assertion failures. Fix: set prefix_same_as_start = true on all iterators so every iterator stops at the seek prefix boundary.

db_stress: skip invalid cross-prefix iterator checks

When prefix iteration is enabled, ReadOptions requires that the seek key and iterate_lower_bound share the same prefix. Previously, stress test verification could run against configurations where they don't (e.g. when lower_bound spans a prefix boundary), producing false positives. Fix: detect this invalid configuration and mark the check as diverged (skip verification) rather than asserting.

db_crashtest: disable BlobDB in best-efforts recovery

BlobDB is not compatible with best-efforts recovery mode. Fix: explicitly disable blob file options when best_efforts_recovery=True in db_crashtest.py to avoid spurious failures.

db_crashtest: preserve expected-state dirs across restarts

The expected-state directory was being wiped on certain restart paths, causing verification failures on the next run. Fix: preserve expected-state dirs across db_crashtest restarts. Adds a new db_crashtest_test.py test to verify the behavior, runnable via make db_crashtest_test.

Add trie UDI iterator regression coverage

Adds regression tests for trie-based UserDefinedIndex (UDI) iterators covering snapshot-based reads, lower/upper bound iteration, auto_refresh_iterator_with_snapshot, and multi-version key handling. Also fixes an MSVC C4244 warning (int→char implicit narrowing) in the test.

Batched snapshot mode does not verify the full DB state against the
expected-state oracle. Instead, it relies on local invariants over the
10-key family written atomically by BatchedOpsStressTest::TestPut().

For a base key K and value body V, TestPut() writes:
  "0" + K -> V + "0"
  ...
  "9" + K -> V + "9"
in one write batch.

TestPrefixScan() then picks one base key, derives 10 seek prefixes
0+P .. 9+P where P is the first FLAGS_prefix_size - 1 bytes of K, and
opens 10 iterators on the same snapshot. The verifier advances those
iterators in lockstep and checks that:

* each iterator stays in its own seek prefix;
* no iterator exhausts its prefix before the others;
* at position n, all values match after stripping the trailing digit;
* the trailing digit still matches the corresponding prefix stream; and
* value() and columns() remain mutually consistent for wide-column /
  entity reads.

That lockstep oracle assumes every iterator is explicitly bounded to the
seek prefix before we assert same-length, same-index progress across all
10 streams. Before this change, the batched test only installed an
iterate_upper_bound to the next prefix half of the time. In the other
half, Seek(prefix) could legitimately continue into the next prefix,
while the verifier still assumed the iterator would stop at the prefix
boundary. That made the verifier depend on behavior it had not actually
requested from RocksDB.

Set ReadOptions::prefix_same_as_start = true for every iterator used by
this batched prefix verifier. This matches the semantics the verifier
needs: RocksDB records the seek prefix and invalidates the iterator once
Next() / Prev() leave that prefix, and the memtable path also switches
to dynamic prefix iteration under the same condition.

Relevant option interactions for this verifier:

* test_batches_snapshots requires prefix_size > 0.
* db_stress installs NewFixedPrefixTransform(prefix_size) whenever
  prefix_size >= 0, so the semantic precondition here is the configured
  prefix_extractor, not specifically memtablerep=prefix_hash.
* prefixpercent only controls how often this verifier runs.
* auto_refresh_iterator_with_snapshot is irrelevant here because the
  batched verifier pins one explicit snapshot for all 10 iterators.
* When a next-prefix upper bound is installed, the test still exercises
  iterate_upper_bound and SQFC range-query filtering.
* allow_unprepared_value is still handled via PrepareValue().
* use_put_entity_one_in, use_attribute_group, use_timed_put_one_in, and
  use_merge only change how the values/entities are encoded; the
  verifier still checks trailing-digit agreement and
  value()/columns() consistency.
* memtablerep=prefix_hash, memtable_prefix_bloom_size_ratio,
  memtable_whole_key_filtering, and
  enable_memtable_insert_with_hint_prefix_extractor affect prefix-aware
  filtering/performance paths but do not change the verifier's lockstep
  invariant.
* user_timestamp_size is out of scope because batched snapshot mode does
  not support timestamps.

This intentionally makes the batched prefix-scan oracle validate
explicitly prefix-bounded iteration every time, while the non-batched
stress path continues to provide separate randomized coverage of
iterate_upper_bound-only behavior.
@meta-cla meta-cla bot added the CLA Signed label Mar 26, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 26, 2026

⚠️ clang-tidy: 2 warning(s) on changed lines

Completed in 1461.9s.

Summary by check

Check Count
performance-unnecessary-copy-initialization 2
Total 2

Details

db/blob/db_blob_basic_test.cc (2 warning(s))
db/blob/db_blob_basic_test.cc:2612:17: warning: local copy 'r1_end' of the variable 'mid' is never modified; consider avoiding the copy [performance-unnecessary-copy-initialization]
db/blob/db_blob_basic_test.cc:2613:17: warning: local copy 'r2_start' of the variable 'mid' is never modified; consider avoiding the copy [performance-unnecessary-copy-initialization]

@github-actions
Copy link
Copy Markdown

✅ Claude Code Review

Auto-triggered after CI passed — reviewing commit 5514e9d


I now have a thorough understanding of the change. Let me write the review.


Summary

Correct, minimal fix for a real assertion failure in BatchedOpsStressTest::TestPrefixScan. The lockstep comparison loop assumes all 10 iterators produce the same number of entries per prefix, but iterators without iterate_upper_bound could scan past their prefix boundary, violating this invariant. Setting prefix_same_as_start = true on all iterators constrains them properly.

Issues Found

No critical or significant issues.

🟢 Nitpick: The while-loop at line 606 already manually checks iters[0]->key().starts_with(prefix_slices[0]), which is redundant when prefix_same_as_start is now always true (the iterator itself will become invalid at the prefix boundary). This is harmless — the manual check is now a belt-and-suspenders defense — but worth noting that the outer while-loop condition could be simplified to just iters[0]->Valid(). However, keeping the explicit prefix check is arguably more defensive and readable, so no change is necessary.

🟢 Nitpick: The comment (lines 588-590 in the post-change file) is good and explains the "why." No issues there.

Cross-Component Analysis

Context Assessment
Prefix seek / total order seek prefix_same_as_start requires a prefix extractor. The function is gated by FLAGS_prefix_size > 0 (line 565) and BatchedOpsStressTest requires prefix mode, so the precondition is satisfied.
iterate_upper_bound interaction When both prefix_same_as_start and iterate_upper_bound are set, the iterator stops at whichever boundary comes first. Since iterate_upper_bound is set to "next prefix" via GetNextPrefix, and prefix_same_as_start also stops at the prefix boundary, these are semantically equivalent — setting both is safe and consistent.
Comparison with no_batched_ops_stress.cc The sibling implementation (line 1628-1629) uses prefix_same_as_start randomly and independently per single iterator. The batched version correctly makes it unconditional because it needs all 10 iterators to agree. This is the right design choice.
Performance This is stress test code, not a hot path. No performance concerns.

Positive Observations

  • The fix is surgical: one line of functional change, well-targeted at the root cause.
  • The comment clearly explains why the setting is needed (lockstep comparison), not just what it does.
  • The approach is consistent with how no_batched_ops_stress.cc already uses the same option.

ℹ️ About this response

Generated by Claude Code.
Review methodology: claude_md/code_review.md

Limitations:

  • Claude may miss context from files not in the diff
  • Large PRs may be truncated
  • Always apply human judgment to AI suggestions

Commands:

  • /claude-review [context] — Request a code review
  • /claude-query <question> — Ask about the PR or codebase

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Mar 26, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D98302018.

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Mar 28, 2026

@xingbowang has imported this pull request. If you are a Meta employee, you can view this in D98302018.

@xingbowang xingbowang changed the title db_stress: constrain batched prefix scans db_stress: fix prefix scan correctness and crashtest stability Mar 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant