Git-Theta Clean #239

@blester125

Currently, when a commit is removed from git, the LFS files for its parameters remain in .git/lfs. We should have a command like git theta clean that removes these dangling parameter files. This is especially useful when a merge is undone or an experimental branch is deleted.

Basic steps would probably be:

  • Iterate through all files that are theta tracked (through all history)
  • Iterate through the history of each file
  • Collect the git lfs oid metadata for each parameter in the model
  • Delete all files from .git/lfs that aren't in the git history
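
The last step above could be sketched roughly as follows. This is only an illustration, not git-theta code: find_dangling and clean are hypothetical names, the set of referenced oids is assumed to have already been collected by the earlier steps, and object files are assumed to be named after their full 64-character sha256 oid.

```python
import re
from pathlib import Path

# Assumption: LFS object files are named after their full sha256 oid.
OID_NAME = re.compile(r"[0-9a-f]{64}")


def find_dangling(lfs_objects_dir, referenced_oids):
    """Return object files whose oid is not in referenced_oids (hypothetical helper)."""
    root = Path(lfs_objects_dir)
    if not root.is_dir():
        return []
    # Skip anything that does not look like an object file (temp files, logs).
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and OID_NAME.fullmatch(p.name)
        and p.name not in referenced_oids
    )


def clean(lfs_objects_dir, referenced_oids, dry_run=True):
    """Delete dangling object files; with dry_run=True only report them."""
    dangling = find_dangling(lfs_objects_dir, referenced_oids)
    if not dry_run:
        for path in dangling:
            path.unlink()
    return dangling
```

Defaulting to a dry run seems like the safer choice here, since a deleted object can only be recovered if it still exists on the remote.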

We might also need to check LFS-tracked files to make sure we don't delete one that is needed. git lfs data seems to be stored in .git/lfs/objects/XX/YY/ dirs, where XX and YY are the first two pairs of characters of the oid.
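
For illustration, a small helper mapping an oid to its expected on-disk location might look like this. It assumes the layout of current Git LFS versions, where objects live under an objects/ subdirectory of .git/lfs; lfs_object_path is a hypothetical name, not an existing git-theta or git-lfs function.

```python
from pathlib import Path


def lfs_object_path(git_dir, oid):
    """Map an LFS oid to its expected path under git_dir (hypothetical helper)."""
    # Pointer files write the oid as "sha256:<hex>"; keep only the hex digest.
    if oid.startswith("sha256:"):
        oid = oid[len("sha256:"):]
    # Assumed layout: <git_dir>/lfs/objects/<first 2 chars>/<next 2 chars>/<full oid>
    return Path(git_dir) / "lfs" / "objects" / oid[:2] / oid[2:4] / oid
```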

We would have to check all of the files above even if the tool were scoped to a single model (e.g., git theta clean my-model.pt): parameters shared between models are stored only once in .git/lfs, so we would need to make sure no other model uses a file before deleting it.
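
A minimal sketch of that check, assuming we already have a mapping from each theta-tracked model to the set of oids it references anywhere in history (safe_to_delete and both argument names are hypothetical):

```python
def safe_to_delete(candidate_oids, referenced_by_model):
    """Return the candidate oids that no model references (hypothetical helper).

    referenced_by_model maps a model path to the set of oids found anywhere
    in that model's history.
    """
    # Union over every model, not only the one being cleaned, because
    # identical parameters share a single object on disk.
    still_needed = set().union(*referenced_by_model.values())
    return set(candidate_oids) - still_needed
```

For example, if candidates {"a", "b", "c"} were gathered for my-model.pt, while my-model.pt still references "c" and another model references "b", only "a" would be deleted.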

As outlined, this would only clean up a local clone of the repo. It is unclear how, or whether, we would need to clean up the remote version.

Labels: enhancement
