-
Notifications
You must be signed in to change notification settings - Fork 4
Additions to Media de-duplication efforts #210
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Additions to Media de-duplication efforts #210
Conversation
888b7eb to
bd9d98e
Compare
|
We have now been held up by the Github Actions runners moving on without us. Tests that check for type checking and linting will not install the project because the version of PyO3 requested won't run with the Python 3.14 that now comes on the runner by default. We have a work around for this, I'll apply it soon |
|
There we go. Cherry-picked from |
itsoyou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for taking care of the dedup topic. I like the parameterized tests, good way to simplify testing backward compatibility
itsoyou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
deletes adjust some testing moved this test case to a place that supports a thread pool, and broke it into several testable components remove unused stage_into_file() function Apply feedback about resolve_path_to_media_file() Lose this TODO and do not recreate the IOBytes object Put back this new line, avoid merge conflicts Abstract actual duplicate media deletion into it's own subroutine Clean up counting how many references there are to a given media object based on sha256 Update test to clarify some comments about behavior Fix copypasta comment without update lint
I guess because of a rust update?
…ing github actions Somehow, PyO3 version 0.24.1 is being used which only has support for Python <= 3.13 even though the version pinned in Cargo.toml is 0.25.1. It is unknown where exactly this version of PyO3 is coming from, but caching oddities are suspected Revert this after Synapse is overall bumped to Python 3.14 in v1.142.0
e249358 to
8fb8e19
Compare
A little bit of clean up. Abstracting how the local path to the file is resolved. Ensure consistency when removing media, that if there are still references to the media that only the metadata is removed until it is the last reference.
Resolving concerns about moving existing media by leaving it in place and only new media is de-duplicated.
Stage incoming remote media into it's legacy file path until the sha256 can be discovered, and only after it is written to the database metadata is it moved into it's new place. This specific set of steps works with the file path resolving, so if the moving process is interrupted it will still be usable.
Always provide
StorageProviderclasses with a consistent relative "path". For a media id of1234567890, it would break down to12/34/567890. Do not use the sha256 for this path.StorageProviders are provided theFileInfoclass as well, if they want to do media de-dupe they can get it from there. Should we provide an enhancement to the bundledStorageProviderto allow for this capability?Depends on #155