ankitjosh78/VisionText


VisionText

Multi-modal image search using SigLIP embeddings and Qdrant. Upload images with optional tags, then search your collection by text or by a query image.

Stack

  • FastAPI: API and web UI
  • SigLIP ViT-B-16 via OpenCLIP: image and text embeddings
  • Qdrant: vector database
  • Docker: deployment

Project structure

VisionText/
├── database/
│   ├── databaseInterface.py
│   ├── qdrantDB.py
│   └── models.py
├── services/
│   ├── embedding_service.py
│   ├── ingest_service.py
│   └── search_service.py
├── static/
│   ├── index.html
│   ├── styles.css
│   └── app.js
├── main.py
├── Dockerfile
├── docker-compose.yml
├── docker-compose.dev.yml
└── docker-compose.prod.yml

Running

cp .env.template .env
docker compose up -d
  • UI: http://localhost:8000
  • API docs: http://localhost:8000/docs
  • Qdrant dashboard: http://localhost:6333/dashboard

Before running docker compose up, edit the new .env file and set your Hugging Face token, as described below.

Hugging Face token

  1. Sign in to https://huggingface.co
  2. Open Settings → Access Tokens
  3. Create a new token with read access
  4. Copy .env.template to .env
  5. Replace HF_TOKEN=your_huggingface_token with your real token

Example:

cp .env.template .env

Then edit .env:

HF_TOKEN=hf_your_real_token_here

Development mode, with hot reload and the source directory mounted into the container:

docker compose -f docker-compose.dev.yml up --build

Production mode with 4 workers and resource limits:

docker compose -f docker-compose.prod.yml up -d --build

API

Method  Endpoint        Description
POST    /search/text    Search by text query
POST    /search/image   Search by image
POST    /search/batch   Batch text search
POST    /ingest/image   Upload a single image
POST    /ingest/batch   Upload multiple images (background)
GET     /stats          Collection stats
DELETE  /images/{id}    Delete an image
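A minimal Python sketch of calling POST /search/text with only the standard library. It assumes the server is reachable at http://localhost:8000 (the default from the Running section) and that the endpoint accepts the JSON body used in the curl text-search example. The helper names build_text_search_request and text_search are hypothetical, and because the response shape is not documented here, the sketch returns the parsed JSON without interpreting it.

```python
import json
import urllib.request

# Assumed default from the Running section; adjust for your deployment.
BASE_URL = "http://localhost:8000"


def build_text_search_request(query: str, top_k: int = 5,
                              base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /search/text request.

    The body mirrors the curl example: {"query": ..., "top_k": ...}.
    """
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def text_search(query: str, top_k: int = 5, base_url: str = BASE_URL,
                timeout: float = 30.0):
    """Send the request and return the decoded JSON response as-is."""
    request = build_text_search_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting request construction from sending keeps the payload easy to inspect without a running server; text_search("cat on a couch") performs the actual call.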

Examples

# Text search
curl -X POST http://localhost:8000/search/text \
  -H "Content-Type: application/json" \
  -d '{"query": "cat on a couch", "top_k": 5}'

# Upload image with tags
curl -X POST http://localhost:8000/ingest/image \
  -F "file=@photo.jpg" \
  -F 'metadata={"tags": ["cat", "indoor"]}'

# Image search with optional tag hints
curl -X POST "http://localhost:8000/search/image?top_k=5" \
  -F "file=@query.jpg" \
  -F "tags=cat,outdoor"

# Batch upload
curl -X POST http://localhost:8000/ingest/batch \
  -F "files=@image1.jpg" \
  -F "files=@image2.jpg" \
  -F "default_tags=vacation,2024"
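The examples pass tags in two forms: a JSON metadata field with a "tags" array for single uploads, and comma-separated form values (tags, default_tags) for image search and batch upload. The sketch below shows one way a server could normalize both into a list of strings. The function names are hypothetical, and stripping whitespace, dropping empty entries, and de-duplicating are assumptions rather than documented behavior.

```python
import json
from typing import List, Optional


def parse_tag_field(raw: Optional[str]) -> List[str]:
    """Split a comma-separated form value such as "cat,outdoor" into tags.

    Hypothetical helper: trims whitespace, skips empty pieces, and keeps
    only the first occurrence of each tag (order preserved).
    """
    if not raw:
        return []
    tags: List[str] = []
    for part in raw.split(","):
        tag = part.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return tags


def tags_from_metadata(raw_metadata: Optional[str]) -> List[str]:
    """Extract the "tags" array from a JSON metadata form field.

    Hypothetical helper: a missing field or missing "tags" key yields [].
    """
    if not raw_metadata:
        return []
    data = json.loads(raw_metadata)
    return [str(t).strip() for t in data.get("tags", []) if str(t).strip()]
```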

Search ranking

Text and batch searches use Reciprocal Rank Fusion (RRF) to blend vector similarity with tag matching, so neither signal fully overrides the other: a strong vector match without tags still ranks above a weak vector match with a tag hit. Image searches accept optional tag hints through the tags form field and apply the same RRF blend.
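For reference, textbook RRF scores each item as the sum of 1 / (k + rank) over every ranked list it appears in, with ranks starting at 1. The sketch below fuses a vector-similarity ranking with a tag-match ranking under that formula. It is not the project's code: the constant k = 60 (the value from the original RRF paper) and the optional per-list weights are assumptions, since the README does not say which constant or weighting VisionText uses, and those choices determine how far a tag hit can move an image.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k: int = 60,
    weights: Optional[Sequence[float]] = None,
) -> List[Tuple[Hashable, float]]:
    """Fuse best-first ranked lists of ids with Reciprocal Rank Fusion.

    rankings: e.g. [vector_ranking, tag_ranking], each ordered best-first.
    k: damping constant; larger k flattens the gap between top ranks.
    weights: optional per-list multipliers (an assumption; plain RRF uses 1.0).
    Returns (id, score) pairs sorted by fused score, highest first.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += weight / (k + rank)
    # Highest score first; ties broken by id so results are deterministic
    # (this assumes ids of one type that can be compared, e.g. strings).
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

An image that is absent from the tag ranking simply gets no contribution from it, so it is scored on vector similarity alone rather than being filtered out.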

Environment variables

Variable          Default                       Description
QDRANT_HOST       localhost                     Qdrant hostname
QDRANT_PORT       6333                          Qdrant port
COLLECTION_NAME   images                        Collection name
EMBEDDING_DIM     768                           Embedding dimension
EMBEDDING_MODEL   hf-hub:timm/ViT-B-16-SigLIP   OpenCLIP model identifier
HF_TOKEN          none                          Hugging Face access token used for model downloads
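A sketch of reading these variables with the defaults from the table above. The Settings class and load_settings function are hypothetical names, not the project's configuration code. EMBEDDING_DIM is parsed as an integer because a Qdrant collection's vector size must equal the length of the embeddings stored in it, so it should match the output size of the model named in EMBEDDING_MODEL.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented environment variables."""
    qdrant_host: str
    qdrant_port: int
    collection_name: str
    embedding_dim: int
    embedding_model: str
    hf_token: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from `env` (defaults to os.environ), using the README defaults."""
    env = os.environ if env is None else env
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        collection_name=env.get("COLLECTION_NAME", "images"),
        embedding_dim=int(env.get("EMBEDDING_DIM", "768")),
        embedding_model=env.get("EMBEDDING_MODEL", "hf-hub:timm/ViT-B-16-SigLIP"),
        # An unset or empty HF_TOKEN is treated as "no token".
        hf_token=env.get("HF_TOKEN") or None,
    )
```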

Local development (without Docker)

pip install -r requirements.txt
docker run -d -p 6333:6333 qdrant/qdrant
python main.py
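When running main.py directly, the variables from .env still have to reach the process, and the README does not say whether main.py loads the file itself. If it does not, you can export them in your shell, use the python-dotenv package, or use a small loader like the hypothetical load_env_file below. It is a sketch that handles only plain KEY=VALUE lines, comments, and surrounding quotes (no interpolation or multi-line values).

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Load simple KEY=VALUE lines from `path` into os.environ.

    Hypothetical helper. Existing environment variables win unless
    override=True. Returns every parsed pair; a missing file yields {}.
    """
    parsed: Dict[str, str] = {}
    env_file = Path(path)
    if not env_file.exists():
        return parsed
    for raw_line in env_file.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```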
