Description
Currently, when a commit is removed from git, the LFS files for its parameters remain in `.git/lfs`. We should have a command like `git theta clean` that removes these dangling parameter files. This is especially useful when a merge is undone or an experimental branch is deleted.
Basic steps would probably be:
- Iterate through all files that are theta-tracked (through all history)
- Iterate through the history of each file
- Collect the Git LFS oid metadata for each parameter in the model
- Delete all files from `.git/lfs` that aren't referenced anywhere in the git history
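The steps above amount to a mark-and-sweep. Below is a minimal sketch, assuming the history walk (for example, `git rev-list --all -- <path>` to list the commits touching a file and `git show <commit>:<path>` to read each stored version) has already produced the raw metadata text for every version of every theta-tracked file, and assuming each parameter's LFS object id appears under a key named `oid` somewhere in that JSON. `collect_oids`, `referenced_oids`, and `sweep` are hypothetical helpers, not existing git-theta functions.

```python
import json
from pathlib import Path
from typing import Any, Iterable, Set


def _normalize(oid: str) -> str:
    # LFS pointers write "sha256:<hex>"; the object file on disk is named "<hex>".
    prefix = "sha256:"
    return oid[len(prefix):] if oid.startswith(prefix) else oid


def collect_oids(metadata: Any) -> Set[str]:
    """Recursively gather every string stored under an "oid" key.

    Assumption: git-theta's metadata JSON keeps each parameter's LFS oid
    under a key literally named "oid", at any nesting depth.
    """
    found: Set[str] = set()
    if isinstance(metadata, dict):
        for key, value in metadata.items():
            if key == "oid" and isinstance(value, str):
                found.add(_normalize(value))
            else:
                found |= collect_oids(value)
    elif isinstance(metadata, list):
        for item in metadata:
            found |= collect_oids(item)
    return found


def referenced_oids(metadata_versions: Iterable[str]) -> Set[str]:
    """Mark phase: union of oids across every historical version of every file."""
    marked: Set[str] = set()
    for blob in metadata_versions:
        marked |= collect_oids(json.loads(blob))
    return marked


def sweep(objects_dir: Path, keep: Set[str], dry_run: bool = True) -> Set[str]:
    """Sweep phase: remove object files whose name (the oid) is not in `keep`.

    Returns the oids that were (or, in a dry run, would be) deleted.
    """
    removed: Set[str] = set()
    # Materialize the listing first so we never delete while scanning.
    for path in list(objects_dir.rglob("*")):
        if path.is_file() and path.name not in keep:
            removed.add(path.name)
            if not dry_run:
                path.unlink()
    return removed
```

`sweep` defaults to a dry run so the command could report what it would delete before actually removing anything.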
We might also need to check regular LFS-tracked files to make sure we don't delete one that is still needed. Git LFS data seems to be stored in `.git/lfs/objects/XX/YY/` directories, where XX and YY are the first four characters of the oid.
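For regular LFS-tracked files, the oids can be read from the pointer files git stores in their place. A sketch, assuming the standard v1 pointer format (a `version https://git-lfs.github.com/spec/v1` line followed by `oid sha256:<hex>` and `size <bytes>` lines) and the local object layout `.git/lfs/objects/<oid[0:2]>/<oid[2:4]>/<oid>`. `parse_pointer` and `object_path` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

LFS_SPEC_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_pointer(text: str) -> Optional[str]:
    """Return the hex oid from an LFS pointer file, or None if it isn't one."""
    lines = text.strip().splitlines()
    if not lines or lines[0] != LFS_SPEC_LINE:
        return None
    for line in lines[1:]:
        key, _, value = line.partition(" ")
        if key == "oid" and value.startswith("sha256:"):
            return value[len("sha256:"):]
    return None


def object_path(git_dir: Path, oid: str) -> Path:
    """Where the LFS object for `oid` lives in the local store."""
    return git_dir / "lfs" / "objects" / oid[0:2] / oid[2:4] / oid
```

Oids recovered this way would be added to the keep set alongside the theta parameter oids, so ordinary LFS content is never swept.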
We would have to check all of the files above even if the tool were scoped to a single model (e.g., `git theta clean my-model.pt`): parameters shared between models are also shared in `.git/lfs`, so we need to make sure no other model uses a file before deleting it.
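A sketch of the safety check for a scoped run, assuming the command has already picked a set of candidate oids for the target model (how it picks them is left open here) and has a map from every tracked path to the oids referenced across that path's whole history. `plan_scoped_clean` is a hypothetical helper: it withholds any candidate still used by some file and records which files block it.

```python
from typing import Dict, List, Set, Tuple


def plan_scoped_clean(
    candidates: Set[str], oids_by_file: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, List[str]]]:
    """Split candidate oids into those safe to delete and those still in use.

    `oids_by_file` must cover every tracked file (theta and plain LFS),
    not just the model the user asked to clean.
    """
    blocked: Dict[str, List[str]] = {}
    # Sorted iteration gives a stable, reportable order of blocking files.
    for path in sorted(oids_by_file):
        for oid in candidates & oids_by_file[path]:
            blocked.setdefault(oid, []).append(path)
    deletable = candidates - set(blocked)
    return deletable, blocked
```

Returning the blocking files, rather than only the deletable set, would let the command tell the user why a parameter file it expected to remove was kept.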
As outlined, this would only clean up a local clone of the repo; it is unclear how, or whether, we would need to clean up the remote version.