-
Notifications
You must be signed in to change notification settings - Fork 4k
kvserver,storage: recreate long-lived iterators in replica consistenc… #156395
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
How does this achieve consistency? Is the assumption that the replica data does not change between consecutive iterators? Which means the iterators should be consistent, or we should stall the What is the plan/mechanism to overcome this? Could you please cover this in the PR description, as a bit of a "design" summary? |
d9c7790 to
89f9293
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jbowens)
pkg/storage/mvcc.go line 7270 at r1 (raw file):
Regarding
How does this achieve consistency? Is the assumption that the replica data does not change between consecutive iterators?
See this comment and the preceding comment (with the declaration of ComputeStatsVisitors).
I've updated the PR description.
⚪ Sysbench [SQL, 3node, oltp_read_write]
Reproducebenchdiff binaries: mkdir -p benchdiff/89f9293/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/bin/pkg_sql_tests benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=c6539d4 --new=89f9293 ./pkg/sql/tests⚪ Sysbench [KV, 3node, oltp_read_only]
Reproducebenchdiff binaries: mkdir -p benchdiff/89f9293/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/bin/pkg_sql_tests benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=c6539d4 --new=89f9293 ./pkg/sql/tests🔴 Sysbench [KV, 3node, oltp_write_only]
Reproducebenchdiff binaries: mkdir -p benchdiff/89f9293/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/bin/pkg_sql_tests benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/89f9293/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=c6539d4 --new=89f9293 ./pkg/sql/testsArtifactsdownload: mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/89f929347cf0b2be3a56e17bcbb0c4291e70c8be/18972450869-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/c6539d4f2519f6d24214b4d9f284b64743345b8c/18972450869-1/\* old/built with commit: 89f929347cf0b2be3a56e17bcbb0c4291e70c8be |
pav-kv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @jbowens)
pkg/storage/mvcc.go line 7270 at r1 (raw file):
Previously, sumeerbhola wrote…
Regarding
How does this achieve consistency? Is the assumption that the replica data does not change between consecutive iterators?
See this comment and the preceding comment (with the declaration of
ComputeStatsVisitors).I've updated the PR description.
SGTM, thanks. I copied the updated commit description to the PR desc.
89f9293 to
c42c7cb
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
🔴 Sysbench [SQL, 3node, oltp_read_write]
Reproducebenchdiff binaries: mkdir -p benchdiff/c42c7cb/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/bin/pkg_sql_tests benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=c6539d4 --new=c42c7cb ./pkg/sql/tests🟢 Sysbench [KV, 3node, oltp_read_only]
Reproducebenchdiff binaries: mkdir -p benchdiff/c42c7cb/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/bin/pkg_sql_tests benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=c6539d4 --new=c42c7cb ./pkg/sql/tests⚪ Sysbench [KV, 3node, oltp_write_only]
Reproducebenchdiff binaries: mkdir -p benchdiff/c42c7cb/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/bin/pkg_sql_tests benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c42c7cb/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/c6539d4/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/c6539d4f2519f6d24214b4d9f284b64743345b8c/bin/pkg_sql_tests benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/c6539d4/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=c6539d4 --new=c42c7cb ./pkg/sql/testsArtifactsdownload: mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/c42c7cb954bbff462d6b9f7ba174e36751d1c2d2/19038365263-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/c6539d4f2519f6d24214b4d9f284b64743345b8c/19038365263-1/\* old/built with commit: c42c7cb954bbff462d6b9f7ba174e36751d1c2d2 |
|
The claude code "bugs" are due to the TODOs that will be removed before merging, and exist to ensure CI is thoroughly exercising iterator recreation. |
|
FYI theres the O-AI-Review-Not-Helpful label. I added it. |
jbowens
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbowens reviewed 2 of 3 files at r1, 1 of 1 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @sumeerbhola)
pkg/kv/kvserver/replica_consistency.go line 515 at r3 (raw file):
} now := crtime.NowMono() if now.Sub(lastResumeSoonTime) > storage.SnapshotRecreateIterDuration.Get(&settings.SV) ||
nit: could inline the crtime.NowMono() call
pkg/storage/mvcc.go line 7305 at r3 (raw file):
return err } if firstIter {
nit: is this firstIter conditional necessary, or could we unconditionally Add and check for consistent iterators if resumeKey is present? I think the code is fine even if it's not necessary, but was curious if we're dependent on it in a way I'm not seeing
pkg/storage/mvcc.go line 7370 at r3 (raw file):
// // isConsistentItersForRaceEnabled is used for RaceEnabled builds to stress // the resumption code paths.
i'm probably just being dense, but I don't understand this parameter. where does it get set to true? can the RaceEnabled part be handled at the usage and the parameter just be isConsistentIters?
c42c7cb to
43c05d6
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jbowens)
pkg/kv/kvserver/replica_consistency.go line 515 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
nit: could inline the
crtime.NowMono()call
Done
pkg/storage/mvcc.go line 7305 at r3 (raw file):
is this
firstIterconditional necessary, or could we unconditionallyAdd
I think it is fine. Since the common case is only a single iterator, I didn't want to do anything that shows up as a regression on some microbenchmark. And I have a dislike for exercising all that AgeTo stuff.
pkg/storage/mvcc.go line 7370 at r3 (raw file):
Previously, jbowens (Jackson Owens) wrote…
i'm probably just being dense, but I don't understand this parameter. where does it get set to true? can the
RaceEnabledpart be handled at the usage and the parameter just beisConsistentIters?
The isConsistentIters is insufficient since the caller may not even have a Reader so can't create another iter. See ComputeStatsForIter.
The name of this parameter was confusing. I've changed it to allowResumptionForTesting and added a comment.
// We desire to stress test this resumption behavior even when the callbacks
// don't return resumeSoon=true, when the caller knows that such testing is
// permissible from a correctness perspective (the caller has the capability
// to create another iterator and call this function again). The
// allowResumptionForTesting is set to true by the caller in that case.
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
43c05d6 to
8b7ad07
Compare
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this iterator reopening happen transparently inside Pebble or storage package? With some read setting instructing to do so. Or there is a good reason to pull it into the user space?
The Reader interface does a bunch of tricks with caching iterators etc. Wonder if it could do this one more trick.
Intuitively, MVCCIterator (or Pebble iterator) might internally know how to restart itself more efficiently (maybe it needs to only unpin flushed memtables or something?) than close+open done from the very top. Plus it would be nice not to force the user to worry about this (more than, say, setting a parameter in IterOptions, or calling some simpler method like Reopen).
Upd: I think this is partially already the case (most of the logic is in storage), just wondering if it there is no way to push this further down. Do we need the resumeSoon logic up in the consistency checker? Is it because it already has rate limiting, so we partly piggyback on it?
As you realized, the code here is almost wholly in the storage package. It is only guided by the bool return set to true in replica_consistency.go. That guidance from a higher layer is important, for the time policy and because it itself is being throttled, and because it know it is using a snapshot so there is benefit for recreating. Also, the fact that resumption here is in-between range keys is important because of how stats are computed which is not something we expect a very low layer to understand. Finally, we move things down into Pebble (or any lower layer) when there is a good reason to from a performance or significant code simplification perspective. One could think of this as a generalization of the end-to-end argument (https://web.mit.edu/saltzer/www/publications/endtoend/endtoend.pdf). It isn't obvious to me that moving code from replica_consistency.go to the storage package would benefit anything. If/when we have multiple callers above the storage package, and we see a similar pattern across them, we can see if there is some more generalization possible. |
…y check This capability is enabled by changing the ComputeStatsVisitors callbacks to specify a resumeSoon bool return value. This is used in computeStatsForIterWithVisitors to return before finishing all the work and the returned resumeKey is used in ComputeStatsWithVisitors to resume using a new iterator. Similarly, computeLockTableStatsWithVisitors uses the resumeSoon to construct new iterators when doing the lock table iteration. To ensure correctness, the resumeSoon=true feature must only be used when the storage.Reader returns true from Reader.ConsistentIterators, which makes the promise that the different Iterators returned by the Reader see the same underlying Engine state. The replica consistency check uses a Pebble snapshot, for which the iterators are consistent. Fixes cockroachdb#154533 Epic: none Release note: None
8b7ad07 to
6b92909
Compare
⚪ Sysbench [SQL, 3node, oltp_read_write]
Reproducebenchdiff binaries: mkdir -p benchdiff/6b92909/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/6b92909aad9ec4653c83d5ee8b3de89247db6b56/bin/pkg_sql_tests benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/076a595/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/bin/pkg_sql_tests benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/SQL/3node/oltp_read_write$ --old=076a595 --new=6b92909 ./pkg/sql/tests⚪ Sysbench [KV, 3node, oltp_read_only]
Reproducebenchdiff binaries: mkdir -p benchdiff/6b92909/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/6b92909aad9ec4653c83d5ee8b3de89247db6b56/bin/pkg_sql_tests benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/076a595/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/bin/pkg_sql_tests benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_read_only$ --old=076a595 --new=6b92909 ./pkg/sql/tests⚪ Sysbench [KV, 3node, oltp_write_only]
Reproducebenchdiff binaries: mkdir -p benchdiff/6b92909/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/6b92909aad9ec4653c83d5ee8b3de89247db6b56/bin/pkg_sql_tests benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/6b92909/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
mkdir -p benchdiff/076a595/bin/1058449141
gcloud storage cp gs://cockroach-microbench-ci/builds/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/bin/pkg_sql_tests benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_tests
chmod +x benchdiff/076a595/bin/1058449141/cockroachdb_cockroach_pkg_sql_testsbenchdiff command: benchdiff --run=^BenchmarkSysbench/KV/3node/oltp_write_only$ --old=076a595 --new=6b92909 ./pkg/sql/testsArtifactsdownload: mkdir -p new
gcloud storage cp gs://cockroach-microbench-ci/artifacts/6b92909aad9ec4653c83d5ee8b3de89247db6b56/19675355901-1/\* new/
mkdir -p old
gcloud storage cp gs://cockroach-microbench-ci/artifacts/076a595091d74eb1b1dff8b3ec3ccc6a31c4cc4d/19675355901-1/\* old/built with commit: 6b92909aad9ec4653c83d5ee8b3de89247db6b56 |
|
TFTRs! |
|
bors r+ |
156395: kvserver,storage: recreate long-lived iterators in replica consistenc… r=sumeerbhola a=sumeerbhola …y check This capability is enabled by changing the `ComputeStatsVisitors` callbacks to specify a `resumeSoon` bool return value. This is used in `computeStatsForIterWithVisitors` to return before finishing all the work and the returned resumeKey is used in `ComputeStatsWithVisitors` to resume using a new iterator. Similarly, `computeLockTableStatsWithVisitors` uses the resumeSoon to construct new iterators when doing the lock table iteration. To ensure correctness, the `resumeSoon=true` feature must only be used when the `storage.Reader` returns true from `Reader.ConsistentIterators`, which makes the promise that the different Iterators returned by the `Reader` see the same underlying `Engine` state. The replica consistency check uses a Pebble snapshot, for which the iterators are consistent. Fixes #154533 Epic: none Release note: None 158330: kvcoord: add traces to TestUnexpectedCommitOnTxnRecovery r=miraradeva a=stevendanna We've recently observed a failure in this test that was hard to debug because it appears that the test was running for a long time and then hit an assesrtion failure inside a goroutine. Here, ensure that we don't fail the test from inside a goroutine and also try to ensure that the test fails a bit more promptly by setting a context timeout. We've also added traces that should be printed in the case of a failure to help debug what happened. Informs #158194 Release note: None 158343: mma: remove accidental duplicate comment r=wenyihu6 a=sumeerbhola Epic: CRDB-55052 Release note: None Co-authored-by: sumeerbhola <sumeer@cockroachlabs.com> Co-authored-by: Steven Danna <danna@cockroachlabs.com>
|
Build failed (retrying...):
|
…y check
This capability is enabled by changing the
ComputeStatsVisitorscallbacks to specify aresumeSoonbool return value. This is used incomputeStatsForIterWithVisitorsto return before finishing all the work and the returned resumeKey is used inComputeStatsWithVisitorsto resume using a new iterator. Similarly,computeLockTableStatsWithVisitorsuses the resumeSoon to construct new iterators when doing the lock table iteration.To ensure correctness, the
resumeSoon=truefeature must only be used when thestorage.Readerreturns true fromReader.ConsistentIterators, which makes the promise that the different Iterators returned by theReadersee the same underlyingEnginestate. The replica consistency check uses a Pebble snapshot, for which the iterators are consistent.Fixes #154533
Epic: none
Release note: None