Multi-modal image search using SigLIP embeddings and Qdrant. Upload images with optional tags, then search your collection by text or by a query image.
- FastAPI: API and web UI
- SigLIP ViT-B-16 via OpenCLIP: image and text embeddings
- Qdrant: vector database
- Docker: deployment
VisionText/
├── database/
│ ├── databaseInterface.py
│ ├── qdrantDB.py
│ └── models.py
├── services/
│ ├── embedding_service.py
│ ├── ingest_service.py
│ └── search_service.py
├── static/
│ ├── index.html
│ ├── styles.css
│ └── app.js
├── main.py
├── Dockerfile
├── docker-compose.yml
├── docker-compose.dev.yml
└── docker-compose.prod.yml
cp .env.template .env
docker compose up -d

- UI: http://localhost:8000
- API docs: http://localhost:8000/docs
- Qdrant dashboard: http://localhost:6333/dashboard
Before starting the app, create a .env file from .env.template and set your Hugging Face token.
- Sign in to https://huggingface.co
- Open Settings → Access Tokens
- Create a new token with read access
- Copy .env.template to .env
- Replace HF_TOKEN=your_huggingface_token with your real token
Example:
cp .env.template .env

Then edit .env:
HF_TOKEN=hf_your_real_token_here

Development mode with hot-reload and source mounted:
docker compose -f docker-compose.dev.yml up --build

Production mode with 4 workers and resource limits:
docker compose -f docker-compose.prod.yml up -d --build

| Method | Endpoint | Description |
|---|---|---|
| POST | /search/text | Search by text query |
| POST | /search/image | Search by image |
| POST | /search/batch | Batch text search |
| POST | /ingest/image | Upload a single image |
| POST | /ingest/batch | Upload multiple images (background) |
| GET | /stats | Collection stats |
| DELETE | /images/{id} | Delete an image |
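As a rough sketch, the text-search endpoint could also be called from Python using only the standard library. The JSON body uses the same `query` and `top_k` fields as the project's curl example for this endpoint. The helper names `build_text_search_request` and `search_text`, the default base URL, and the assumption that the response body is JSON are illustrative and not taken from the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: the local URL from the quick start


def build_text_search_request(query, top_k=5, base_url=BASE_URL):
    """Build (but do not send) a POST request for /search/text."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_text(query, top_k=5, base_url=BASE_URL):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_text_search_request(query, top_k, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `search_text("cat on a couch", top_k=5)` would issue the same request as the curl text-search example. The shape of the returned data is not documented here, so inspect it before relying on specific keys.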
# Text search
curl -X POST http://localhost:8000/search/text \
-H "Content-Type: application/json" \
-d '{"query": "cat on a couch", "top_k": 5}'
# Upload image with tags
curl -X POST http://localhost:8000/ingest/image \
-F "file=@photo.jpg" \
-F 'metadata={"tags": ["cat", "indoor"]}'
# Image search with optional tag hints
curl -X POST "http://localhost:8000/search/image?top_k=5" \
-F "file=@query.jpg" \
-F "tags=cat,outdoor"
# Batch upload
curl -X POST http://localhost:8000/ingest/batch \
-F "files=@image1.jpg" \
-F "files=@image2.jpg" \
-F "default_tags=vacation,2024"

Text and batch searches use Reciprocal Rank Fusion (RRF) to blend vector similarity with tag matching. Neither signal fully overrides the other. A strong vector match without tags still ranks above a weak vector match with a tag hit. Image searches support optional tag hints via the tags form field for the same RRF blend.
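To make the blending concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion over two ranked lists of image IDs, one ordered by vector similarity and one by tag matching. It is not the project's implementation: the function name `rrf_fuse`, the constant `k=60`, the `tag_weight` value, and the rule that an ID missing from a list contributes nothing are all assumptions chosen for illustration.

```python
def rrf_fuse(vector_ranked, tag_ranked, k=60, tag_weight=0.3):
    """Fuse two best-first ID lists with weighted Reciprocal Rank Fusion.

    An ID at 1-based rank r in a list contributes weight / (k + r);
    an ID absent from a list contributes 0 for that list.
    k and tag_weight are illustrative values, not the project's settings.
    """
    scores = {}
    for weight, ranking in ((1.0, vector_ranked), (tag_weight, tag_ranked)):
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + weight / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical IDs: "a" is the best vector match but has no matching tag;
# "z" is a weak vector match (rank 50) that is the only tag match.
vector_ranked = ["a"] + [f"img{i}" for i in range(2, 50)] + ["z"]
tag_ranked = ["z"]
fused = rrf_fuse(vector_ranked, tag_ranked)
```

Under these assumed weights, "a" stays ahead of "z", while the tag hit lifts "z" well above the position that vector similarity alone gave it. With a larger `tag_weight` or a less weak vector match, the tagged item can overtake, so the balance depends on values this README does not specify.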
| Variable | Default | Description |
|---|---|---|
| QDRANT_HOST | localhost | Qdrant hostname |
| QDRANT_PORT | 6333 | Qdrant port |
| COLLECTION_NAME | images | Collection name |
| EMBEDDING_DIM | 768 | Embedding dimension |
| EMBEDDING_MODEL | hf-hub:timm/ViT-B-16-SigLIP | OpenCLIP model identifier |
| HF_TOKEN | none | Hugging Face access token used for model downloads |
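The variables and defaults in the table could be read in Python roughly as follows. This is a sketch, not the project's configuration code: the `Settings` dataclass and the `load_settings` helper are hypothetical names, and only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_port: int
    collection_name: str
    embedding_dim: int
    embedding_model: str
    hf_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment mapping, using the documented defaults."""
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        collection_name=env.get("COLLECTION_NAME", "images"),
        embedding_dim=int(env.get("EMBEDDING_DIM", "768")),
        embedding_model=env.get("EMBEDDING_MODEL", "hf-hub:timm/ViT-B-16-SigLIP"),
        # No default: an unset or empty token is treated as missing.
        hf_token=env.get("HF_TOKEN") or None,
    )
```

Passing a plain dictionary, for example `load_settings({"QDRANT_HOST": "qdrant"})`, makes the defaults easy to check without touching the real environment.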
pip install -r requirements.txt
docker run -p 6333:6333 qdrant/qdrant
python main.py