Conversation

@sumeerbhola
Collaborator

@sumeerbhola sumeerbhola commented Oct 28, 2025

…y check

This capability is enabled by changing the ComputeStatsVisitors callbacks to specify a resumeSoon bool return value. This is used in computeStatsForIterWithVisitors to return before finishing all the work and the returned resumeKey is used in ComputeStatsWithVisitors to resume using a new iterator. Similarly, computeLockTableStatsWithVisitors uses the resumeSoon to construct new iterators when doing the lock table iteration.

To ensure correctness, the resumeSoon=true feature must only be used when the storage.Reader returns true from Reader.ConsistentIterators, which makes the promise that the different Iterators returned by the Reader see the same underlying Engine state. The replica consistency check uses a Pebble snapshot, for which the iterators are consistent.

Fixes #154533

Epic: none

Release note: None
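
To make the resumption flow in the description concrete, here is a minimal sketch of a visitor that can request `resumeSoon` and a driver that reopens an iterator at the returned resume key. All names here (`snapshot`, `iter`, `scanWithIter`, `scanAll`) are hypothetical stand-ins written for illustration; they are not the actual `storage` package API, and the real code computes MVCC stats rather than just visiting string keys.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// snapshot is a hypothetical stand-in for a storage.Reader whose iterators
// are consistent: every iterator sees the same immutable, sorted key set.
type snapshot struct {
	keys []string
}

// ConsistentIterators mirrors the promise described above. In this toy
// model it is always true because the key slice never changes.
func (s *snapshot) ConsistentIterators() bool { return true }

// iter is a toy forward iterator over the snapshot.
type iter struct {
	s   *snapshot
	pos int
}

// newIter opens an iterator positioned at the first key >= start.
func (s *snapshot) newIter(start string) *iter {
	return &iter{s: s, pos: sort.SearchStrings(s.keys, start)}
}

func (it *iter) valid() bool { return it.pos < len(it.s.keys) }
func (it *iter) key() string { return it.s.keys[it.pos] }
func (it *iter) next()       { it.pos++ }

// visitor processes one key and reports whether the scan should soon
// continue on a fresh iterator.
type visitor func(key string) (resumeSoon bool)

// scanWithIter visits keys until the iterator is exhausted or the visitor
// asks to resume. more is true when work remains, starting at resumeKey.
func scanWithIter(it *iter, visit visitor) (resumeKey string, more bool) {
	for ; it.valid(); it.next() {
		if visit(it.key()) {
			it.next()
			if it.valid() {
				return it.key(), true
			}
			return "", false
		}
	}
	return "", false
}

// scanAll drives the scan to completion, opening a new iterator at each
// resume key. It returns how many iterators were opened.
func scanAll(s *snapshot, visit visitor) (itersOpened int, err error) {
	start := ""
	for {
		it := s.newIter(start)
		itersOpened++
		resumeKey, more := scanWithIter(it, visit)
		if !more {
			return itersOpened, nil
		}
		// Resuming is only correct if the next iterator sees the same state.
		if !s.ConsistentIterators() {
			return itersOpened, errors.New("resumeSoon requires consistent iterators")
		}
		start = resumeKey
	}
}

func main() {
	s := &snapshot{keys: []string{"a", "b", "c", "d", "e"}}
	visited := 0
	n, err := scanAll(s, func(k string) bool {
		visited++
		return visited%2 == 0 // ask for a fresh iterator every two keys
	})
	fmt.Println(visited, n, err)
}
```

The point of the sketch is the split of responsibilities: the visitor only expresses a wish to resume, while the driver owns iterator lifetimes and refuses to resume unless iterators are consistent.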

@sumeerbhola sumeerbhola requested review from a team as code owners October 28, 2025 18:34
@sumeerbhola sumeerbhola requested a review from jbowens October 28, 2025 18:34
@blathers-crl

blathers-crl bot commented Oct 28, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@pav-kv
Collaborator

pav-kv commented Oct 28, 2025

Instead, a long-lived iterator should be closed and recreated.

How does this achieve consistency? Is the assumption that the replica data does not change between consecutive iterators? Which means the iterators should be consistent, or we should stall the Replica (hold its raftMu) while the computation takes place. The latter isn't true today: the checksum computation is async and takes advantage of the fact that the iterator is over an immutable snapshot.

What is the plan/mechanism to overcome this? Could you please cover this in the PR description, as a bit of a "design" summary?

@cockroach-teamcity cockroach-teamcity added the X-perf-gain Microbenchmarks CI: Added if a performance gain is detected label Oct 28, 2025
Collaborator Author

@sumeerbhola sumeerbhola left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)


pkg/storage/mvcc.go line 7270 at r1 (raw file):
Regarding

How does this achieve consistency? Is the assumption that the replica data does not change between consecutive iterators?

See this comment and the preceding comment (with the declaration of ComputeStatsVisitors).

I've updated the PR description.

@cockroach-teamcity
Member

⚪ Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 10.94m ±2% | 10.96m ±1% | ~ | p=0.202 n=15 |
| allocs/op | 8.075k ±1% | 8.097k ±2% | ~ | p=0.406 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/89f9293/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/bin/pkg_sql_tests benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=c6539d4 --new=89f9293 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 3.520m ±1% | 3.502m ±1% | ~ | p=0.041 n=15 |
| allocs/op | 2.061k ±0% | 2.059k ±0% | ~ | p=0.078 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/89f9293/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/bin/pkg_sql_tests benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=c6539d4 --new=89f9293 ./pkg/sql/tests
🔴 Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 3.685m ±3% | 3.730m ±3% | ~ | p=0.806 n=15 |
| 🔴 allocs/op | 4.196k ±0% | 4.210k ±0% | +0.33% | p=0.000 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/89f9293/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/bin/pkg_sql_tests benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=c6539d4 --new=89f9293 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/18972450869-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/c6539d4f2519f6d24214b4d9f284b64743345b8c/18972450869-1/\* old/

built with commit: 89f929347cf0b2be3a56e17bcbb0c4291e70c8be

@cockroach-teamcity cockroach-teamcity added the X-perf-check Microbenchmarks CI: Added to a PR if a performance regression is detected and should be checked label Oct 31, 2025
Collaborator

@pav-kv pav-kv left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @jbowens)


pkg/storage/mvcc.go line 7270 at r1 (raw file):

Previously, sumeerbhola wrote…

Regarding

How does this achieve consistency? Is the assumption that the replica data does not change between consecutive iterators?

See this comment and the preceding comment (with the declaration of ComputeStatsVisitors).

I've updated the PR description.

SGTM, thanks. I copied the updated commit description to the PR desc.

@github-actions

github-actions bot commented Nov 3, 2025

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

@github-actions github-actions bot added the o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. label Nov 3, 2025
@cockroach-teamcity
Member

🔴 Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 11.51m ±3% | 11.53m ±3% | ~ | p=0.713 n=15 |
| 🔴 allocs/op | 8.260k ±0% | 8.288k ±0% | +0.34% | p=0.000 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/c42c7cb/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/bin/pkg_sql_tests benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=c6539d4 --new=c42c7cb ./pkg/sql/tests
🟢 Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 3.318m ±1% | 3.311m ±1% | ~ | p=0.902 n=15 |
| 🟢 allocs/op | 2.080k ±0% | 2.059k ±0% | -1.01% | p=0.000 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/c42c7cb/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/bin/pkg_sql_tests benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=c6539d4 --new=c42c7cb ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 3.621m ±1% | 3.625m ±1% | ~ | p=0.713 n=15 |
| allocs/op | 4.201k ±0% | 4.213k ±0% | +0.29% | p=0.000 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/c42c7cb/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/bin/pkg_sql_tests benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=c6539d4 --new=c42c7cb ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/19038365263-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/c6539d4f2519f6d24214b4d9f284b64743345b8c/19038365263-1/\* old/

built with commit: c42c7cb954bbff462d6b9f7ba174e36751d1c2d2

@sumeerbhola
Collaborator Author

The claude code "bugs" are due to the TODOs that will be removed before merging, and exist to ensure CI is thoroughly exercising iterator recreation.

@tbg tbg added the O-AI-Review-Not-Helpful AI reviewer produced result which was incorrect or unhelpful label Nov 4, 2025
@tbg
Member

tbg commented Nov 4, 2025

FYI there's the O-AI-Review-Not-Helpful label. I added it.

Collaborator

@jbowens jbowens left a comment

:lgtm:

@jbowens reviewed 2 of 3 files at r1, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)


pkg/kv/kvserver/replica_consistency.go line 515 at r3 (raw file):

		}
		now := crtime.NowMono()
		if now.Sub(lastResumeSoonTime) > storage.SnapshotRecreateIterDuration.Get(&settings.SV) ||

nit: could inline the crtime.NowMono() call


pkg/storage/mvcc.go line 7305 at r3 (raw file):

			return err
		}
		if firstIter {

nit: is this firstIter conditional necessary, or could we unconditionally Add and check for consistent iterators if resumeKey is present? I think the code is fine even if it's not necessary, but was curious if we're dependent on it in a way I'm not seeing


pkg/storage/mvcc.go line 7370 at r3 (raw file):

//
// isConsistentItersForRaceEnabled is used for RaceEnabled builds to stress
// the resumption code paths.

i'm probably just being dense, but I don't understand this parameter. where does it get set to true? can the RaceEnabled part be handled at the usage and the parameter just be isConsistentIters?

Collaborator Author

@sumeerbhola sumeerbhola left a comment

TFTR!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens)


pkg/kv/kvserver/replica_consistency.go line 515 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

nit: could inline the crtime.NowMono() call

Done


pkg/storage/mvcc.go line 7305 at r3 (raw file):

is this firstIter conditional necessary, or could we unconditionally Add

I think it is fine. Since the common case is only a single iterator, I didn't want to do anything that shows up as a regression on some microbenchmark. And I have a dislike for exercising all that AgeTo stuff.


pkg/storage/mvcc.go line 7370 at r3 (raw file):

Previously, jbowens (Jackson Owens) wrote…

i'm probably just being dense, but I don't understand this parameter. where does it get set to true? can the RaceEnabled part be handled at the usage and the parameter just be isConsistentIters?

The isConsistentIters is insufficient since the caller may not even have a Reader so can't create another iter. See ComputeStatsForIter.
The name of this parameter was confusing. I've changed it to allowResumptionForTesting and added a comment.

// We desire to stress test this resumption behavior even when the callbacks
// don't return resumeSoon=true, when the caller knows that such testing is
// permissible from a correctness perspective (the caller has the capability
// to create another iterator and call this function again). The
// allowResumptionForTesting parameter is set to true by the caller in that case.
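
As a rough illustration of why forcing resumption is a useful stress test, the sketch below sums a slice in one pass and again with a forced stop after every element, and the totals must agree. The names `sumFrom`, `total`, and `forceResume` are hypothetical; `forceResume` plays the role that `allowResumptionForTesting` plays in the comment above, under the assumption that the caller is able to start another pass.

```go
package main

import "fmt"

// sumFrom adds up values starting at index start. It stops early when the
// visitor returns resumeSoon, or when forceResume is set, and returns the
// index to resume from. done is true once every value has been consumed.
func sumFrom(vals []int, start int, visit func(int) bool, forceResume bool) (sum, resumeIdx int, done bool) {
	for i := start; i < len(vals); i++ {
		sum += vals[i]
		resumeSoon := visit(vals[i])
		if (resumeSoon || forceResume) && i+1 < len(vals) {
			return sum, i + 1, false
		}
	}
	return sum, len(vals), true
}

// total drives sumFrom to completion. Each loop iteration models opening a
// new iterator at the resume position. It returns the total and the number
// of passes that were needed.
func total(vals []int, forceResume bool) (sum, passes int) {
	never := func(int) bool { return false } // visitor that never asks to resume
	start := 0
	for {
		part, next, done := sumFrom(vals, start, never, forceResume)
		sum += part
		passes++
		if done {
			return sum, passes
		}
		start = next
	}
}

func main() {
	vals := []int{3, 1, 4, 1, 5}
	plainSum, plainPasses := total(vals, false)
	forcedSum, forcedPasses := total(vals, true)
	fmt.Println(plainSum, plainPasses, forcedSum, forcedPasses)
}
```

If the forced run ever produced a different total, the resumption path would be dropping or double-counting work, which is the class of bug this kind of testing is meant to surface.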

@github-actions

Potential Bug(s) Detected

The three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation.

Next Steps:
Please review the detailed findings in the workflow run.

Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary.

After you review the findings, please tag the issue as follows:

  • If the detected issue is real or was helpful in any way, please tag the issue with O-AI-Review-Real-Issue-Found
  • If the detected issue was not helpful in any way, please tag the issue with O-AI-Review-Not-Helpful

Collaborator

@pav-kv pav-kv left a comment

Could this iterator reopening happen transparently inside Pebble or storage package? With some read setting instructing to do so. Or is there a good reason to pull it into the user space?

The Reader interface does a bunch of tricks with caching iterators etc. Wonder if it could do this one more trick.

Intuitively, MVCCIterator (or Pebble iterator) might internally know how to restart itself more efficiently (maybe it needs to only unpin flushed memtables or something?) than close+open done from the very top. Plus it would be nice not to force the user to worry about this (more than, say, setting a parameter in IterOptions, or calling some simpler method like Reopen).

Upd: I think this is partially already the case (most of the logic is in storage), just wondering if there is no way to push this further down. Do we need the resumeSoon logic up in the consistency checker? Is it because it already has rate limiting, so we partly piggyback on it?

@sumeerbhola
Collaborator Author

Could this iterator reopening happen transparently inside Pebble or storage package?
...
I think this is partially already the case (most of the logic is in storage), just wondering if there is no way to push this further down. Do we need the resumeSoon logic up in the consistency checker? Is it because it already has rate limiting, so we partly piggyback on it?

As you realized, the code here is almost wholly in the storage package. It is only guided by the bool return value, which is set to true in replica_consistency.go. That guidance from a higher layer is important: the higher layer owns the time policy, it is itself being throttled, and it knows it is using a snapshot, so there is a benefit to recreating the iterator.

Also, the fact that resumption here is in-between range keys is important because of how stats are computed which is not something we expect a very low layer to understand.

Finally, we move things down into Pebble (or any lower layer) when there is a good reason to from a performance or significant code simplification perspective. One could think of this as a generalization of the end-to-end argument (https://web.mit.edu/saltzer/www/publications/endtoend/endtoend.pdf). It isn't obvious to me that moving code from replica_consistency.go to the storage package would benefit anything. If/when we have multiple callers above the storage package, and we see a similar pattern across them, we can see if there is some more generalization possible.

@cockroach-teamcity
Member

⚪ Sysbench [SQL, 3node, oltp_read_write]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 11.27m ±2% | 11.48m ±3% | ~ | p=0.345 n=15 |
| allocs/op | 8.196k ±2% | 8.206k ±0% | ~ | p=0.616 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/6b92909/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/6b92909aad9ec4653c83d5ee8b3de89247db6b56/bin/pkg_sql_tests benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/076a595/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/bin/pkg_sql_tests benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=076a595 --new=6b92909 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_read_only]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 3.314m ±1% | 3.297m ±1% | ~ | p=0.174 n=15 |
| allocs/op | 2.081k ±0% | 2.081k ±0% | ~ | p=0.512 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/6b92909/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/6b92909aad9ec4653c83d5ee8b3de89247db6b56/bin/pkg_sql_tests benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/076a595/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/bin/pkg_sql_tests benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=076a595 --new=6b92909 ./pkg/sql/tests
⚪ Sysbench [KV, 3node, oltp_write_only]
| Metric | Old Commit | New Commit | Delta | Note |
| --- | --- | --- | --- | --- |
| sec/op | 3.684m ±4% | 3.648m ±6% | ~ | p=0.902 n=15 |
| allocs/op | 4.180k ±1% | 4.186k ±1% | ~ | p=0.814 n=15 |
Reproduce

benchdiff binaries:

mkdir -p benchdiff/6b92909/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/6b92909aad9ec4653c83d5ee8b3de89247db6b56/bin/pkg_sql_tests benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/076a595/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/bin/pkg_sql_tests benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests

benchdiff command:

benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=076a595 --new=6b92909 ./pkg/sql/tests
Artifacts

download:

mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/6b92909aad9ec4653c83d5ee8b3de89247db6b56/19675355901-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/19675355901-1/\* old/

built with commit: 6b92909aad9ec4653c83d5ee8b3de89247db6b56

@sumeerbhola
Collaborator Author

TFTRs!
I am going to merge this and we can discuss further improvements afterwards, since this has been open for almost 1 month.

@sumeerbhola
Collaborator Author

bors r+

craig bot pushed a commit that referenced this pull request Nov 25, 2025
156395: kvserver,storage: recreate long-lived iterators in replica consistenc… r=sumeerbhola a=sumeerbhola

…y check

This capability is enabled by changing the `ComputeStatsVisitors` callbacks to specify a `resumeSoon` bool return value. This is used in `computeStatsForIterWithVisitors` to return before finishing all the work and the returned resumeKey is used in `ComputeStatsWithVisitors` to resume using a new iterator. Similarly, `computeLockTableStatsWithVisitors` uses the resumeSoon to construct new iterators when doing the lock table iteration.

To ensure correctness, the `resumeSoon=true` feature must only be used when the `storage.Reader` returns true from `Reader.ConsistentIterators`, which makes the promise that the different Iterators returned by the `Reader` see the same underlying `Engine` state. The replica consistency check uses a Pebble snapshot, for which the iterators are consistent.

Fixes #154533

Epic: none

Release note: None

158330: kvcoord: add traces to TestUnexpectedCommitOnTxnRecovery r=miraradeva a=stevendanna

We've recently observed a failure in this test that was hard to debug because it appears that the test was running for a long time and then hit an assertion failure inside a goroutine.

Here, ensure that we don't fail the test from inside a goroutine and also try to ensure that the test fails a bit more promptly by setting a context timeout.

We've also added traces that should be printed in the case of a failure to help debug what happened.

Informs #158194

Release note: None

158343: mma: remove accidental duplicate comment r=wenyihu6 a=sumeerbhola

Epic: CRDB-55052

Release note: None

Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
@craig
Contributor

craig bot commented Nov 25, 2025

Build failed (retrying...):

  • unit_tests

@craig
Contributor

craig bot commented Nov 25, 2025

@craig craig bot merged commit 1d3dc6f into cockroachdb:master Nov 25, 2025
24 checks passed

Labels

O-AI-Review-Not-Helpful AI reviewer produced result which was incorrect or unhelpful o-AI-Review-Potential-Issue-Detected AI reviewer found potential issue. Never assign manually—auto-applied by GH action only. target-release-26.1.0 X-perf-check Microbenchmarks CI: Added to a PR if a performance regression is detected and should be checked X-perf-gain Microbenchmarks CI: Added if a performance gain is detected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kvserver: replica consistency checker should recreate long-lived iterators

5 participants