VisionText

Multi-modal image search using SigLIP embeddings and Qdrant. Upload images with optional tags, then search your collection by text or by a query image.

Stack

FastAPI : API and web UI
SigLIP ViT-B-16 via OpenCLIP : image and text embeddings
Qdrant : vector database
Docker : deployment
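
SigLIP maps both text and images into the same 768-dimensional vector space. A text query and a query image are therefore searched the same way: embed the query, then return the stored image vectors closest to it. Qdrant does this server-side with an approximate (HNSW) index. The brute-force sketch below only illustrates the idea. It assumes cosine similarity as the collection's distance metric, which this README does not state, and `top_k_cosine` is a hypothetical name, not part of VisionText.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_cosine(
    query: Sequence[float],
    collection: Dict[str, Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Hypothetical brute-force search: score every stored vector and keep the best top_k.

    The query vector may come from a text prompt or from an image; SigLIP embeds
    both into the same space, so the scoring is identical either way.
    """
    scored = [(image_id, cosine(query, vec)) for image_id, vec in collection.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 3-d vectors stand in for real 768-d SigLIP embeddings (assumption for illustration).
    images = {
        "cat.jpg": [0.9, 0.1, 0.0],
        "dog.jpg": [0.1, 0.9, 0.0],
        "car.jpg": [0.0, 0.1, 0.9],
    }
    print(top_k_cosine([1.0, 0.0, 0.0], images, top_k=2))
```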

Project structure

VisionText/
├── database/
│   ├── databaseInterface.py
│   ├── qdrantDB.py
│   └── models.py
├── services/
│   ├── embedding_service.py
│   ├── ingest_service.py
│   └── search_service.py
├── static/
│   ├── index.html
│   ├── styles.css
│   └── app.js
├── main.py
├── Dockerfile
├── docker-entrypoint.sh
├── docker-compose.yml
├── docker-compose.dev.yml
├── docker-compose.prod.yml
├── requirements.txt
└── .env.template

Running

Before starting the app, create a .env file from .env.template and set your Hugging Face token (see Hugging Face token below).

cp .env.template .env
docker compose up -d

UI: http://localhost:8000
API docs: http://localhost:8000/docs
Qdrant dashboard: http://localhost:6333/dashboard

Hugging Face token

Sign in to https://huggingface.co
Open Settings → Access Tokens
Create a new token with read access
Copy .env.template to .env
Replace HF_TOKEN=your_huggingface_token with your real token

Example:

cp .env.template .env

Then edit .env:

HF_TOKEN=hf_your_real_token_here
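
Because HF_TOKEN is used for model downloads, it can help to check .env before starting the containers. A common mistake is leaving the placeholder value from .env.template in place. The sketch below parses simple KEY=VALUE lines and reports a missing or placeholder token. It assumes a plain .env format with no quoting and no multi-line values. `parse_env` and `hf_token_problem` are hypothetical helpers, not part of VisionText.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional

# Placeholder value shipped in .env.template, per the steps above.
PLACEHOLDER = "your_huggingface_token"


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def hf_token_problem(values: Mapping[str, str]) -> Optional[str]:
    """Return a description of the problem with HF_TOKEN, or None if it looks set."""
    token = values.get("HF_TOKEN", "")
    if not token:
        return "HF_TOKEN is not set"
    if token == PLACEHOLDER:
        return "HF_TOKEN still has the placeholder value from .env.template"
    return None


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; run: cp .env.template .env")
    else:
        problem = hf_token_problem(parse_env(env_path.read_text(encoding="utf-8")))
        print(problem or "HF_TOKEN looks set")
```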

Development mode, with hot reload and the source directory mounted into the container:

docker compose -f docker-compose.dev.yml up --build

Production mode with 4 workers and resource limits:

docker compose -f docker-compose.prod.yml up -d --build

API

Method	Endpoint	Description
POST	`/search/text`	Search by text query
POST	`/search/image`	Search by image
POST	`/search/batch`	Batch text search
POST	`/ingest/image`	Upload a single image
POST	`/ingest/batch`	Upload multiple images (background)
GET	`/stats`	Collection stats
DELETE	`/images/{id}`	Delete an image

Examples

# Text search
curl -X POST http://localhost:8000/search/text \
  -H "Content-Type: application/json" \
  -d '{"query": "cat on a couch", "top_k": 5}'

# Upload image with tags
curl -X POST http://localhost:8000/ingest/image \
  -F "file=@photo.jpg" \
  -F 'metadata={"tags": ["cat", "indoor"]}'

# Image search with optional tag hints
curl -X POST "http://localhost:8000/search/image?top_k=5" \
  -F "file=@query.jpg" \
  -F "tags=cat,outdoor"

# Batch upload
curl -X POST http://localhost:8000/ingest/batch \
  -F "files=@image1.jpg" \
  -F "files=@image2.jpg" \
  -F "default_tags=vacation,2024"
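
The same calls can be made from Python with only the standard library. The sketch below mirrors the first two curl examples. /search/text gets a JSON body. /ingest/image gets a multipart/form-data body with a `file` part and a JSON `metadata` field. The functions only build the requests; pass the result to `urllib.request.urlopen` against a running instance. The helper names and the manual multipart encoding are illustrative assumptions. The endpoint paths and field names come from the curl examples.

```python
import json
import mimetypes
import urllib.request
import uuid
from typing import List

BASE_URL = "http://localhost:8000"


def text_search_request(query: str, top_k: int = 5, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /search/text request with a JSON body, like the text search curl example."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search/text",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ingest_image_request(
    filename: str, image_bytes: bytes, tags: List[str], base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a multipart POST /ingest/image request with `metadata` (JSON) and `file` parts."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    metadata = json.dumps({"tags": tags})
    parts = [
        # Form field: metadata={"tags": [...]}, as in the curl -F 'metadata=...' example.
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="metadata"\r\n\r\n'
        f"{metadata}\r\n".encode("utf-8"),
        # File part: the raw image bytes under the field name "file".
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode("utf-8"),
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        f"{base_url}/ingest/image",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


if __name__ == "__main__":
    req = text_search_request("cat on a couch", top_k=5)
    # Against a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.loads(resp.read()))
    print(req.get_method(), req.full_url)
```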

Search ranking

Text and batch searches use Reciprocal Rank Fusion (RRF) to blend vector similarity with tag matching. Neither signal fully overrides the other. A strong vector match without tags still ranks above a weak vector match with a tag hit. Image searches support optional tag hints via the tags form field for the same RRF blend.
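
Reciprocal Rank Fusion scores each result by summing 1/(k + rank) over every ranked list it appears in. A result can surface by placing well in one list, and placing well in both lifts it further. The sketch below is a generic RRF over a vector-similarity ranking and a tag-match ranking, with optional per-list weights. The constant k = 60 is the value commonly used with RRF. VisionText's actual constant, any weights, and how it builds the tag ranking are not documented here, so treat them as assumptions. `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence


def rrf_fuse(
    rankings: Sequence[List[str]],
    k: int = 60,
    weights: Optional[Sequence[float]] = None,
) -> List[str]:
    """Fuse several ranked lists of IDs (best first) with Reciprocal Rank Fusion.

    score(d) = sum_i w_i / (k + rank_i(d)), with ranks starting at 1.
    An ID missing from a list simply gets no contribution from that list.
    """
    if weights is None:
        weights = [1.0] * len(rankings)
    scores: Dict[str, float] = defaultdict(float)
    for weight, ranking in zip(weights, rankings):
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += weight / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


if __name__ == "__main__":
    vector_ranking = ["img_a", "img_b", "img_c", "img_d"]  # ordered by embedding similarity
    tag_ranking = ["img_c", "img_d"]  # images whose tags match the query
    print(rrf_fuse([vector_ranking, tag_ranking]))
```

With equal weights, a first-place tag hit contributes as much as a first-place vector hit. An implementation that wants strong vector matches to stay on top may therefore down-weight or restrict the tag list.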

Environment variables

Variable	Default	Description
`QDRANT_HOST`	`localhost`	Qdrant hostname
`QDRANT_PORT`	`6333`	Qdrant port
`COLLECTION_NAME`	`images`	Collection name
`EMBEDDING_DIM`	`768`	Embedding dimension (must match the output size of EMBEDDING_MODEL)
`EMBEDDING_MODEL`	`hf-hub:timm/ViT-B-16-SigLIP`	OpenCLIP model identifier
`HF_TOKEN`	none	Hugging Face access token used for model downloads
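
The variables above can be read into a single settings object at startup. The sketch below is a standard-library illustration that uses the names and defaults from the table. The `Settings` dataclass and `load_settings` function are hypothetical, and VisionText's own configuration code may be structured differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    qdrant_host: str
    qdrant_port: int
    collection_name: str
    embedding_dim: int
    embedding_model: str
    hf_token: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from environment variables, falling back to the documented defaults."""
    return Settings(
        qdrant_host=env.get("QDRANT_HOST", "localhost"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        collection_name=env.get("COLLECTION_NAME", "images"),
        embedding_dim=int(env.get("EMBEDDING_DIM", "768")),
        embedding_model=env.get("EMBEDDING_MODEL", "hf-hub:timm/ViT-B-16-SigLIP"),
        # HF_TOKEN has no default; an empty value is treated as unset.
        hf_token=env.get("HF_TOKEN") or None,
    )


if __name__ == "__main__":
    s = load_settings()
    # Avoid printing the token itself.
    print(f"Qdrant at {s.qdrant_host}:{s.qdrant_port}, collection {s.collection_name!r}")
```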

Local development (without Docker)

pip install -r requirements.txt
docker run -d -p 6333:6333 qdrant/qdrant
python main.py
