Skip to content

Conversation

@jason-famedly
Copy link
Contributor

@jason-famedly jason-famedly commented Nov 18, 2025

A little bit of clean up. Abstracting how the local path to the file is resolved. Ensure consistency when removing media, that if there are still references to the media that only the metadata is removed until it is the last reference.

Resolving concerns about moving existing media by leaving it in place and only new media is de-duplicated.

Stage incoming remote media into it's legacy file path until the sha256 can be discovered, and only after it is written to the database metadata is it moved into it's new place. This specific set of steps works with the file path resolving, so if the moving process is interrupted it will still be usable.

Always provide StorageProvider classes with a consistent relative "path". For a media id of 1234567890, it would break down to 12/34/567890. Do not use the sha256 for this path. StorageProviders are provided the FileInfo class as well, if they want to do media de-dupe they can get it from there. Should we provide an enhancement to the bundled StorageProvider to allow for this capability?

Depends on #155

@jason-famedly jason-famedly force-pushed the jason/amend-file-shas-work branch from 888b7eb to bd9d98e Compare November 21, 2025 20:31
@jason-famedly
Copy link
Contributor Author

We have now been held up by the Github Actions runners moving on without us. Tests that check for type checking and linting will not install the project because the version of PyO3 requested won't run with the Python 3.14 that now comes on the runner by default. We have a work around for this, I'll apply it soon

@jason-famedly
Copy link
Contributor Author

There we go. Cherry-picked from master. Now the tests should run

@jason-famedly jason-famedly marked this pull request as ready for review November 24, 2025 12:20
@jason-famedly jason-famedly requested a review from a team as a code owner November 24, 2025 12:20
Copy link
Contributor

@itsoyou itsoyou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for taking care of the dedup topic. I like the parameterized tests, good way to simplify testing backward compatibility

Copy link
Contributor

@itsoyou itsoyou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

deletes

adjust some testing

moved this test case to a place that supports a thread pool, and broke
it into several testable components

remove unused stage_into_file() function

Apply feedback about resolve_path_to_media_file()

Lose this TODO and do not recreate the IOBytes object

Put back this new line, avoid merge conflicts

Abstract actual duplicate media deletion into it's own subroutine

Clean up counting how many references there are to a given media object
based on sha256

Update test to clarify some comments about behavior

Fix copypasta comment without update

lint
I guess because of a rust update?
…ing github actions

Somehow, PyO3 version 0.24.1 is being used which only has support for
Python <= 3.13 even though the version pinned in Cargo.toml is 0.25.1.

It is unknown where exactly this version of PyO3 is coming from, but
caching oddities are suspected

Revert this after Synapse is overall bumped to Python 3.14 in v1.142.0
@jason-famedly jason-famedly force-pushed the jason/amend-file-shas-work branch from e249358 to 8fb8e19 Compare December 8, 2025 16:50
@jason-famedly jason-famedly merged commit cff725d into syk/file-path-with-sha Dec 8, 2025
10 checks passed
@jason-famedly jason-famedly deleted the jason/amend-file-shas-work branch December 8, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants