diff --git a/BENCHMARKS.md b/BENCHMARKS.md index f4500dd..162940a 100644 --- a/BENCHMARKS.md +++ b/BENCHMARKS.md @@ -8,22 +8,22 @@ Measured on Apple M3 Max, 36GB RAM. | Metric | Sediment | ChromaDB | Mem0 | |--------|----------|----------|------| -| Recall@1 | 45.0% | 47.0% | 47.0% | -| Recall@3 | **69.0%** | 69.0% | 69.0% | -| Recall@5 | 78.0% | 78.5% | 78.5% | -| Recall@10 | **90.5%** | 90.0% | 90.0% | -| MRR | 59.0% | 60.8% | 60.8% | +| Recall@1 | **50.0%** | 47.0% | 47.0% | +| Recall@3 | 69.0% | 69.0% | 69.0% | +| Recall@5 | 77.5% | 78.5% | 78.5% | +| Recall@10 | 89.5% | 90.0% | 90.0% | +| MRR | **61.9%** | 60.8% | 60.8% | | Recency@1 | **100%** | 14% | 14% | | Consolidation | **99%** | 0% | 0% | -| Store p50 | 22ms | 696ms | 16ms | -| Recall p50 | 26ms | 694ms | 8ms | +| Store p50 | 49ms | 696ms | 16ms | +| Recall p50 | 103ms | 694ms | 8ms | ## Key takeaways -- **Retrieval quality**: Within 0.5pp of ChromaDB on Recall@5 (78.0% vs 78.5%), matching on Recall@3 +- **Retrieval quality**: Best R@1 (50.0%) and MRR (61.9%) — top result is correct more often than alternatives - **Temporal correctness**: 100% Recency@1 — updated memories always rank first. Others: 14% - **Deduplication**: 99% consolidation rate — near-duplicates auto-merged. Others: 0% -- **Latency**: 32x faster store than ChromaDB (22ms vs 696ms). All operations local, no network +- **Latency**: 14x faster store than ChromaDB (49ms vs 696ms). All operations local, no network ## Methodology