
push nomic and add sample output #8

Open
Feliren88 wants to merge 1 commit into main from f/dedup-hash-nomic

@Feliren88 (Collaborator)

Image Duplicate Finder using Nomic Vision Embeddings

This PR adds a standalone script for detecting duplicate or highly similar images within a directory. The implementation:

  • Uses the Nomic Vision embedding model (nomic-embed-vision-v1) to generate an embedding for each image
  • Computes pairwise similarity between all image embeddings with PyTorch
  • Provides GPU acceleration for faster processing
  • Includes a configurable similarity threshold to control matching strictness
  • Writes the results to a CSV file
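
The pairwise comparison, threshold, and CSV output described above can be sketched as follows. This is a minimal illustration in plain Python, not the PR's PyTorch code. It assumes each image has already been turned into an embedding vector and uses cosine similarity as the metric (an assumption, since the description does not name the metric). The 0.95 default and the CSV column names are hypothetical. In PyTorch the inner loop would typically collapse into one matrix product of the L2-normalized embedding matrix with its transpose.

```python
import csv
import io
import math
from itertools import combinations
from typing import Dict, List, TextIO, Tuple


def normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def find_duplicate_pairs(
    embeddings: Dict[str, List[float]], threshold: float = 0.95
) -> List[Tuple[str, str, float]]:
    """Return (image_a, image_b, similarity) for every pair at or above `threshold`.

    `embeddings` maps an image filename to its embedding vector.
    The 0.95 default is illustrative, not the PR's value.
    """
    unit = {name: normalize(vec) for name, vec in embeddings.items()}
    pairs = []
    # Compare each unordered pair exactly once
    for a, b in combinations(sorted(unit), 2):
        sim = sum(x * y for x, y in zip(unit[a], unit[b]))
        if sim >= threshold:
            pairs.append((a, b, sim))
    pairs.sort(key=lambda p: p[2], reverse=True)  # most similar pairs first
    return pairs


def write_csv(pairs: List[Tuple[str, str, float]], stream: TextIO) -> None:
    """Write matching pairs as CSV (column names are hypothetical)."""
    writer = csv.writer(stream)
    writer.writerow(["image_a", "image_b", "similarity"])
    for a, b, sim in pairs:
        writer.writerow([a, b, f"{sim:.4f}"])


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real model embeddings
    toy = {"cat.jpg": [1.0, 0.0], "cat_copy.jpg": [0.99, 0.05], "dog.jpg": [0.0, 1.0]}
    buf = io.StringIO()
    write_csv(find_duplicate_pairs(toy, threshold=0.95), buf)
    print(buf.getvalue())
```

With the toy vectors, only `cat.jpg` and `cat_copy.jpg` clear the 0.95 threshold (similarity about 0.9987). Raising the threshold makes matching stricter; lowering it also catches near-duplicates such as crops or re-encodes.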

Key improvements:

  • Single-folder processing - finds duplicates within any given directory
  • Memory-efficient design - stores embeddings on CPU while processing
  • Progress tracking with tqdm
  • Comprehensive documentation in README.md
  • Error handling for corrupted or unreadable images
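
Single-folder processing with error handling can be sketched like this. It assumes a hypothetical `embed_fn` that wraps the Nomic model call for one image path (for example: open the image, run the model, and move the resulting tensor to the CPU, in line with the memory note above). Any image whose loading or embedding raises an exception is skipped and recorded instead of aborting the run. The extension list is also an assumption, not taken from the PR.

```python
from pathlib import Path
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed set of file extensions treated as images (not taken from the PR)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}


def list_images(folder: str) -> List[Path]:
    """Return image files directly inside `folder` (non-recursive), sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def embed_all(
    paths: Sequence,
    embed_fn: Callable,
) -> Tuple[Dict[str, List[float]], List[Tuple[str, str]]]:
    """Embed each image, skipping any that fail.

    `embed_fn` is a hypothetical stand-in for the model call: it takes a path
    and returns an embedding vector.
    Returns (embeddings keyed by path string, list of (path, error) failures).
    """
    embeddings: Dict[str, List[float]] = {}
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            embeddings[str(path)] = embed_fn(path)
        except Exception as exc:  # corrupted or unreadable image: record it and move on
            failures.append((str(path), repr(exc)))
    return embeddings, failures
```

Returning the failures alongside the embeddings lets the script report skipped files at the end rather than dropping them silently.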
