
Use WAL with SQLite cache, fix close #21154

Open
hauntsaninja wants to merge 4 commits into python:master from hauntsaninja:sqlitewal

Conversation

@hauntsaninja
Collaborator

@hauntsaninja hauntsaninja commented Apr 3, 2026

This is the more modern way to manage concurrency with SQLite. Relevant to the current discussion, it means concurrent mypy runs using the cache will wait for each other rather than fail.

SQLite also claims this is significantly faster, but I haven't yet done a good profile (if you are profiling this, note that WAL is a persistent setting, so you will want to delete the cache). This might also allow removing the PRAGMA synchronous=OFF.

Finally, I also explicitly close the connection in main. This is relevant to this change because it forces checkpointing of the WAL, which keeps reads fast, reduces disk space, and means the cache.db remains a single self-contained file under regular use.
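For illustration, here is a minimal standalone sketch of the behaviour described above, using Python's sqlite3 module. The path and table schema are made up for this example and are not mypy's actual cache code:

```python
import os
import sqlite3
import tempfile

# Hypothetical cache location for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "cache.db")

conn = sqlite3.connect(db_path)
# journal_mode=WAL is a persistent, database-level setting: it is
# recorded in the file and survives across connections (hence the note
# above about deleting the cache before profiling).
conn.execute("PRAGMA journal_mode=WAL")
# A busy timeout makes a concurrent writer wait for the lock (up to the
# timeout) instead of failing immediately with "database is locked".
conn.execute("PRAGMA busy_timeout=30000")  # milliseconds

conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", ("a.py", b"cache data"))
conn.commit()

# Closing the last connection checkpoints the WAL and removes the
# cache.db-wal sidecar file, so cache.db stays a single self-contained
# file under regular use.
conn.close()
```

After conn.close(), no cache.db-wal file remains next to cache.db, and any later connection inherits WAL mode automatically.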

Fixes #21136, see also discussion in #13916

For what it's worth, I feel there are many legitimate uses of concurrent mypy. At work, we often share cache between multiple projects. At home, I often end up having parallel runs with a debugger while working on mypy (although this PR just makes those ones hang waiting for the lock lol)

@hauntsaninja hauntsaninja marked this pull request as draft April 3, 2026 01:30
@github-actions
Contributor

github-actions bot commented Apr 3, 2026

According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅

@ilevkivskyi
Member

For what it's worth, I feel there are many legitimate uses of concurrent mypy

Can you give some more concrete examples? Also, the problem is, as I mentioned in the other issue, that even if there are such uses, the current incremental logic was not designed for this, and it would be tricky to guarantee correctness.

@ilevkivskyi
Member

Anyway, I am not strongly against this per se. As I mentioned in #13916 (comment), IMO the key point is to give a loud warning; the exact best-effort semantics are not as important then.

Also, to be clear, although this will make the SQLite cache behave like the FS cache, it will not solve other concurrency-related crashes like #14521 or #18473, while disabling the cache completely would fix those.

@hauntsaninja
Collaborator Author

Sure, I can talk a little more about the work use case. We have a monorepo with lots of first-party projects. These often have similar dependency graphs, and sharing the cache across them helps: e.g. import torch is slow if you analyse it cold, but now it will always be warm, and you avoid ending up with gigabytes of duplicated mypy cache everywhere.

Note this behaviour will still be a little different from the FS cache (and so might reduce the likelihood of those issues). Once a connection has started writing, other connections will block until it commits at the end of the build. If we do want the fallback behaviour to match the FS cache, we should set isolation_level=None so each write is its own transaction.
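As a hedged sketch of the isolation_level=None alternative (the path and schema are made up for illustration, not mypy's cache code):

```python
import os
import sqlite3
import tempfile

# Hypothetical cache location for illustration only.
db_path = os.path.join(tempfile.mkdtemp(), "cache.db")

# isolation_level=None puts sqlite3 in autocommit mode: the module no
# longer issues an implicit BEGIN, so each statement commits on its own
# instead of one transaction holding the write lock until the end of
# the build.
writer = sqlite3.connect(db_path, isolation_level=None)
writer.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)")
writer.execute("INSERT INTO files VALUES (?, ?)", ("a.py", b"cache data"))

# No writer.commit() has been called, yet a second connection can
# already see the row, because the INSERT was its own transaction.
reader = sqlite3.connect(db_path)
row = reader.execute("SELECT data FROM files WHERE path = ?", ("a.py",)).fetchone()
```

With this setting the write lock is held only per statement, which is closer to how the FS cache lets concurrent runs interleave; the trade-off is more commits, which is presumably where the performance concern below comes in.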

@hauntsaninja hauntsaninja marked this pull request as ready for review April 3, 2026 21:06
@ilevkivskyi
Member

These often have similar dependency graphs

OK, in such cases parallel mypy invocations will likely work, unless you use different options. But even in such cases I would give a warning, to make it clear that we can't guarantee correctness in general, and users can do this at their own risk.

Once a connection has started writing, this will block until the connection commits at the end of the build

And how exactly will this work in the case of parallel type checking? In that case we want the workers to be reading/writing ~freely (because the coordinator is already making sure they are scheduled in a way that guarantees correctness).

If we do want the fallback behaviour to match FS cache, we should set isolation_level=None so each write is its own transaction

This seems like a better option, but I vaguely remember trying it at some point and not liking it in terms of performance, though it may well be that I did something wrong. In general, it would be good to have performance measurements for parallel checking (say, on self-check with a cold cache) for both of these options vs the status quo. This is arguably a niche use case, and I don't want to sacrifice performance for everyone else because of it.

@ilevkivskyi
Member

ilevkivskyi commented Apr 4, 2026

Btw couple notes on performance measurements for parallel checking:

  • Always use a compiled version, and run self-check from outside the mypy directory; otherwise Python will find the local (interpreted) version of mypy.build_worker.worker.
  • Do ~twice as many runs as you normally would, since time variance for parallel checking is much higher.
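A hypothetical measurement harness along these lines (not part of mypy; the command is a stand-in for the real compiled self-check invocation) would repeat the run and report median and spread rather than a single number:

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd: list[str], n: int) -> list[float]:
    """Run cmd n times and return wall-clock durations in seconds."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        subprocess.run(cmd, check=True, capture_output=True)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in command; substitute the compiled mypy self-check invocation,
# run from outside the mypy directory. Use more repetitions than usual
# since parallel-checking times are noisy.
runs = time_runs([sys.executable, "-c", "pass"], 5)
print(f"median={statistics.median(runs):.3f}s  stdev={statistics.stdev(runs):.3f}s")
```

Reporting the stdev alongside the median makes it harder to over-interpret a lucky or unlucky parallel run.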



Development

Successfully merging this pull request may close these issues.

Mypy 1.20 crashes on concurrent runs
