Open-source benchmark comparing vector databases on the metrics that matter in production: throughput, latency, recall, and cost.
Transparency note: This benchmark is maintained by Mixpeek, the creators of MVS. Every result is reproducible — clone the repo and run it yourself.
50,000 vectors / 768 dimensions / top_k=10 / Cohere embed-v3
Hardware: Mac Studio — Apple M4 Ultra (28-core), 96 GB RAM, macOS Sequoia
**Low concurrency (single thread)**

| Engine | QPS | p50 (ms) | p95 (ms) | p99 (ms) | Recall@10 |
|---|---|---|---|---|---|
| Milvus | 312.5 | 3.1 | 3.6 | 4.0 | 0.261 |
| Weaviate | 290.6 | 3.3 | 3.9 | 4.2 | 0.176 |
| Chroma | 176.4 | 5.6 | 6.4 | 6.7 | 0.141 |
| Qdrant | 158.0 | 5.5 | 11.1 | 21.7 | 0.166 |
| MVS | 157.2 | 5.5 | 12.2 | 21.8 | 0.171 |
| LanceDB | 135.6 | 7.1 | 8.6 | 11.4 | 0.097 |
| pgvector | 86.7 | 11.4 | 13.1 | 13.9 | 0.210 |
**Moderate concurrency**

| Engine | QPS | p50 (ms) | p95 (ms) | p99 (ms) | Recall@10 |
|---|---|---|---|---|---|
| Weaviate | 1,785.3 | 5.3 | 6.9 | 9.4 | 0.176 |
| Milvus | 1,728.3 | 5.4 | 8.6 | 10.4 | 0.261 |
| MVS | 459.3 | 18.4 | 34.9 | 124.6 | 0.171 |
| Qdrant | 359.8 | 18.6 | 93.2 | 147.4 | 0.166 |
| LanceDB | 307.3 | 30.0 | 55.2 | 70.2 | 0.097 |
| Chroma | 252.0 | 39.5 | 42.3 | 44.2 | 0.141 |
| pgvector | 97.1 | 100.2 | 196.4 | 260.7 | 0.210 |
**High concurrency (32 threads)**

| Engine | QPS | p50 (ms) | p95 (ms) | p99 (ms) | Recall@10 | Errors |
|---|---|---|---|---|---|---|
| Weaviate | 2,245.2 | 12.5 | 19.6 | 22.9 | 0.176 | 0 |
| Milvus | 2,037.4 | 14.4 | 24.2 | 30.3 | 0.261 | 0 |
| Qdrant | 456.3 | 59.9 | 132.4 | 153.3 | 0.166 | 4 |
| LanceDB | 321.3 | 91.3 | 168.9 | 217.1 | 0.097 | 0 |
| Chroma | 256.7 | 124.2 | 136.4 | 211.3 | 0.141 | 0 |
| pgvector | 97.4 | 323.0 | 620.6 | 792.5 | 0.210 | 0 |
| MVS | 8.2 | 58.9 | 132.8 | 178.5 | 0.171 | 4 |
- QPS — Queries per second (higher is better)
- p50 / p95 / p99 — Latency percentiles in milliseconds (lower is better)
- Recall@10 — Fraction of true nearest neighbors found in top-10 results (higher is better)
- Errors — Failed queries during the run
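The two headline metrics can be sketched in a few lines of stdlib Python. This is an illustrative computation, not the benchmark harness's actual code: `recall_at_k` and `latency_percentiles` are hypothetical names, and the percentile method (exclusive, via `statistics.quantiles`) is an assumption about how the tables above were produced.

```python
import statistics

def recall_at_k(true_ids, retrieved_ids, k=10):
    """Recall@k: fraction of the true k nearest neighbors present in the top-k results."""
    truth = set(true_ids[:k])
    return sum(1 for r in retrieved_ids[:k] if r in truth) / k

def latency_percentiles(latencies_ms):
    """p50 / p95 / p99 from per-query latencies, using exclusive-method percentiles."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return cuts[49], cuts[94], cuts[98]

# One query where 6 of the 10 true neighbors were returned
print(recall_at_k(list(range(10)), [0, 1, 2, 3, 4, 5, 90, 91, 92, 93]))  # 0.6
```

QPS is then simply total completed queries divided by wall-clock time for the run.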
- At low concurrency, most engines are competitive. MVS, Qdrant, and Weaviate all handle single-threaded workloads well.
- High concurrency separates architectures. HNSW-based engines (Weaviate, Milvus) scale linearly. MVS's partition-based approach (LIRE) shows degradation at 32 threads on a 50K dataset — this is expected to improve at larger scales where partitioning pays off.
- Recall is capped by dataset size. 50K vectors with 768 dimensions produces low absolute recall across all engines. The relative ordering matters more than the absolute values.
- MVS trades raw QPS for cost. MVS stores vectors on your existing object storage (S3, GCS, B2) instead of RAM — meaning 10–50x lower infrastructure cost at billion-scale, with competitive latency for real-world workloads.
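The cost claim is easy to sanity-check with back-of-envelope arithmetic. The dollar figures below are illustrative assumptions, not vendor quotes; the raw storage-price gap is larger than 10–50x, and index overhead, caching tiers, and serving nodes are what pull realized savings back into that range.

```python
# Raw footprint of 1B vectors at 768 dimensions, float32 (4 bytes each)
n_vectors, dims, bytes_per_float = 1_000_000_000, 768, 4
raw_gb = n_vectors * dims * bytes_per_float / 1e9  # 3072.0 GB, before index overhead

# Illustrative monthly $/GB (assumptions, not quotes):
ram_usd_per_gb_month = 2.50      # memory-optimized instances, amortized
object_usd_per_gb_month = 0.023  # standard object-storage tier

print(f"raw vectors:       {raw_gb:,.0f} GB")
print(f"held in RAM:       ~${raw_gb * ram_usd_per_gb_month:,.0f}/month")
print(f"on object storage: ~${raw_gb * object_usd_per_gb_month:,.0f}/month")
```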
| Engine | Index type | Storage model | Hybrid search |
|---|---|---|---|
| MVS | LIRE partitions + PQ | BYO object storage (S3/GCS/B2) | tantivy BM25 |
| Qdrant | HNSW | On-disk + RAM | Payload indexes |
| Milvus | IVF / HNSW | Distributed, GPU support | Sparse vectors |
| Weaviate | HNSW | In-memory + disk | BM25 modules |
| Chroma | HNSW (hnswlib) | In-memory | Metadata filtering |
| LanceDB | IVF-PQ | Lance columnar format | Full-text search |
| pgvector | IVFFlat / HNSW | Postgres-native | SQL + tsvector |
Each engine runs in Docker with vendor-recommended configurations. PRs with improved configs are welcome.
| # | Scenario | What it tests |
|---|---|---|
| 1 | Steady-state search | Recall vs QPS at scale |
| 2 | Streaming ingest | p99 latency during continuous insert |
| 3 | Memory-constrained | 1B vectors in 32 GB RAM |
| 4 | 12-month TCO | Cost per million queries over time |
| 5 | Hybrid search | BM25 + dense vector NDCG@10 |
| 6 | Filtered search | Recall at varying filter selectivity |
| 7 | Cold-start recovery | Time to serve after process restart |
Results above are from Scenario 1 (steady-state). Other scenarios are in progress.
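Scenario 4's headline metric can be sketched as follows. The function name and the full-utilization assumption are mine, not the benchmark's definition:

```python
def cost_per_million_queries(monthly_infra_usd, sustained_qps):
    """Cost per 1M queries, assuming capacity is fully utilized over a 30-day month."""
    queries_per_month = sustained_qps * 30 * 24 * 3600
    return monthly_infra_usd / (queries_per_month / 1e6)

# e.g. a $1,000/month deployment sustaining 300 QPS
print(round(cost_per_million_queries(1000, 300), 2))  # 1.29
```

Underutilization raises the effective number: a deployment serving 10% of its sustained QPS costs 10x more per million queries.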
| Dataset | Vectors | Dimensions | Source |
|---|---|---|---|
| Cohere-1B-768 | 1,000,000,000 | 768 | Wikipedia via Cohere embed-v3 |
| Cohere-10M-768 | 10,000,000 | 768 | Wikipedia subset |
| DEEP1B | 1,000,000,000 | 96 | Web images |
| BEIR (multiple) | Varies | 768 | IR evaluation |
| YFCC-10M | 10,000,000 | 192 | Flickr + metadata |
```bash
git clone https://github.com/mixpeek/mvs-benchmark.git
cd mvs-benchmark

# Run one scenario against one engine (50K vectors, quick test)
python benchmark/scripts/run.py \
  --scenario steady-state \
  --engine mvs \
  --dataset cohere-10m-768 \
  --output results/

# Run all scenarios against all engines (~24h, requires 64+ GB RAM)
python benchmark/scripts/run_all.py --scale 1b

# Generate results site
python benchmark/site/generate.py --results results/ --output docs/
```

Every published result includes the exact Docker image SHA, engine config, hardware spec, and raw JSON. To reproduce:

```bash
python benchmark/scripts/reproduce.py --result results/2026-04/steady-state-mvs.json
```

We welcome PRs with optimized engine configurations:

```bash
cp benchmark/engines/mvs/config.yaml benchmark/engines/your-engine/config.yaml
# Edit, test locally, submit PR
```

See CONTRIBUTING.md for details. See METHODOLOGY.md for measurement methodology.
Apache 2.0