diff --git a/BENCHMARKS.md b/BENCHMARKS.md
index 162940a..27572dd 100644
--- a/BENCHMARKS.md
+++ b/BENCHMARKS.md
@@ -13,6 +13,7 @@ Measured on Apple M3 Max, 36GB RAM.
 | Recall@5 | 77.5% | 78.5% | 78.5% |
 | Recall@10 | 89.5% | 90.0% | 90.0% |
 | MRR | **61.9%** | 60.8% | 60.8% |
+| nDCG@5 | 58.7% | 59.9% | 59.9% |
 | Recency@1 | **100%** | 14% | 14% |
 | Consolidation | **99%** | 0% | 0% |
 | Store p50 | 49ms | 696ms | 16ms |
@@ -25,6 +26,57 @@ Measured on Apple M3 Max, 36GB RAM.
 - **Deduplication**: 99% consolidation rate — near-duplicates auto-merged. Others: 0%
 - **Latency**: 14x faster store than ChromaDB (49ms vs 696ms). All operations local, no network
 
+## Category breakdown
+
+### Recall@5 by category
+
+| Category | Sediment | ChromaDB | Mem0 |
+|----------|----------|----------|------|
+| `architecture` | **82.9%** | 71.4% | 71.4% |
+| `code_patterns` | **88.6%** | **88.6%** | **88.6%** |
+| `cross_project` | 65.6% | **68.8%** | **68.8%** |
+| `project_facts` | 60.6% | **75.8%** | **75.8%** |
+| `troubleshooting` | 78.1% | **81.2%** | **81.2%** |
+| `user_preferences` | **87.9%** | 84.9% | 84.9% |
+
+### MRR by category
+
+| Category | Sediment | ChromaDB | Mem0 |
+|----------|----------|----------|------|
+| `architecture` | **66.8%** | 55.8% | 55.8% |
+| `code_patterns` | 70.4% | **71.1%** | **71.1%** |
+| `cross_project` | **50.7%** | 47.1% | 47.1% |
+| `project_facts` | 51.6% | **59.2%** | **59.2%** |
+| `troubleshooting` | **63.2%** | 62.8% | 62.8% |
+| `user_preferences` | 67.6% | **67.9%** | **67.9%** |
+
+## Temporal correctness
+
+| Metric | Sediment | ChromaDB | Mem0 |
+|--------|----------|----------|------|
+| Recency@1 | **100%** | 14% | 14% |
+| Recency@3 | **100%** | 94% | 94% |
+| MRR | **100%** | 48.8% | 48.8% |
+| Mean Rank | **1.00** | 2.38 | 2.38 |
+
+## Latency
+
+### Store latency
+
+| Metric | Sediment | ChromaDB | Mem0 |
+|--------|----------|----------|------|
+| p50 | 49ms | 696ms | **16ms** |
+| p95 | 62ms | 726ms | **19ms** |
+| p99 | 88ms | 729ms | **20ms** |
+
+### Recall latency
+
+| Metric | Sediment | ChromaDB | Mem0 |
+|--------|----------|----------|------|
+| p50 | 103ms | 694ms | **8ms** |
+| p95 | 109ms | 728ms | **12ms** |
+| p99 | 132ms | 746ms | **12ms** |
+
 ## Methodology
 
 - **Dataset**: 1,000 memories across 6 categories (architecture, code patterns, project facts, troubleshooting, user preferences, cross-project)
diff --git a/CLAUDE.md b/CLAUDE.md
index 83fd0dd..f74d2e9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -39,13 +39,17 @@ Sediment is a semantic memory system for AI agents, running as an MCP (Model Con
 ### Core Components
 
 - **`src/main.rs`** - CLI entry point with subcommands (init, stats, list) and MCP server startup
-- **`src/lib.rs`** - Library root exposing public API, project detection, and scope types
-- **`src/db.rs`** - LanceDB wrapper handling vector storage, search, and CRUD operations
+- **`src/lib.rs`** - Library root exposing public API, project detection, scope types, and project ID migration
+- **`src/db.rs`** - LanceDB wrapper handling vector storage, hybrid search (vector + FTS/BM25), and CRUD operations
 - **`src/embedder.rs`** - Local embeddings using `all-MiniLM-L6-v2` via Candle (384-dim vectors)
 - **`src/chunker.rs`** - Smart content chunking by type (markdown, code, JSON, YAML, text)
+- **`src/document.rs`** - ContentType enum for routing content to the appropriate chunker
+- **`src/item.rs`** - Unified Item, Chunk, SearchResult, StoreResult, and ConflictInfo types
 - **`src/access.rs`** - SQLite-based access tracking, validation counting, and memory decay scoring
 - **`src/graph.rs`** - SQLite graph store: relationship tracking (RELATED, SUPERSEDES, CO_ACCESSED, CLUSTER_SIBLING edges)
 - **`src/consolidation.rs`** - Background consolidation: auto-merging near-duplicates, linking similar items
+- **`src/error.rs`** - SedimentError enum with typed error variants (Database, Embedding, Arrow, etc.)
+- **`src/retry.rs`** - Retry utilities with exponential backoff (3 attempts, 100ms–2s)
 
 ### MCP Server (`src/mcp/`)
 
@@ -67,7 +71,8 @@ Sediment is a semantic memory system for AI agents, running as an MCP (Model Con
 - **Two-database hybrid**: LanceDB for vectors, SQLite for graph relationships + mutable counters
 - **Single central database** at `~/.sediment/data/` stores all projects; graph + access at `~/.sediment/access.db`
 - **Project scoping** via UUID stored in `.sediment/config` per project
-- **Similarity boosting**: Same-project items get 1.15x boost, different projects 0.95x penalty
+- **Similarity boosting**: Same-project items unchanged, different projects get 0.875x penalty (12.5% spread)
+- **Hybrid search**: Vector similarity combined with FTS/BM25 scoring. BM25 boost is additive (max 0.12, power-law gamma 2.0). FTS index rebuilt on each store
 - **Conflict detection**: Items with >=0.85 similarity flagged on store and enqueued for consolidation
 - **Fresh DB connection per tool call** with shared embedder for efficiency
 - **Memory decay scoring**: Recall results re-ranked using freshness (hyperbolic decay, 0.5 at 30 days) and access frequency (log-scaled). Tracked in SQLite sidecar since LanceDB is append-oriented.
@@ -126,6 +131,8 @@ CREATE TABLE graph_edges (
     created_at INTEGER NOT NULL,
     UNIQUE(from_id, to_id, edge_type)
 );
+CREATE INDEX idx_edges_from ON graph_edges(from_id, edge_type);
+CREATE INDEX idx_edges_to ON graph_edges(to_id, edge_type);
 
 -- Access tracking and decay scoring
 CREATE TABLE access_log (
diff --git a/README.md b/README.md
index 79b4267..3e0224b 100644
--- a/README.md
+++ b/README.md
@@ -160,7 +160,8 @@ All local, embedded, zero config:
 
 - **Memory decay**: Results re-ranked by freshness (30-day half-life) and access frequency. Old memories rank lower but are never auto-deleted.
 - **Trust-weighted scoring**: Validated and well-connected memories score higher.
-- **Project scoping**: Automatic context isolation between projects. Same-project items get a similarity boost.
+- **Hybrid search**: Vector similarity combined with FTS/BM25 scoring for better retrieval quality.
+- **Project scoping**: Automatic context isolation between projects. Different-project items receive a similarity penalty.
 - **Relationship graph**: Items linked via RELATED, SUPERSEDES, and CO_ACCESSED edges. Recall expands results with 1-hop graph neighbors and co-access suggestions.
 - **Background consolidation**: Near-duplicates (≥0.95 similarity) auto-merged; similar items (0.85–0.95) linked.
 - **Type-aware chunking**: Intelligent splitting for markdown, code, JSON, YAML, and plain text.
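
Review note: the scoring figures quoted in the CLAUDE.md changes (0.875x cross-project penalty, additive BM25 boost capped at 0.12 with gamma 2.0, hyperbolic freshness of 0.5 at 30 days) can be sanity-checked with a small sketch. This is not taken from `src/db.rs` or `src/access.rs`; every function and constant name below is hypothetical, the BM25 input is assumed to be pre-normalized to [0, 1], and applying the project penalty before the boost is an assumption the diff does not confirm.

```rust
// Illustrative only: names and the order of operations are assumptions,
// not Sediment's actual implementation.
const CROSS_PROJECT_PENALTY: f64 = 0.875; // "different projects get 0.875x penalty"
const BM25_MAX_BOOST: f64 = 0.12; // "additive (max 0.12 ...)"
const BM25_GAMMA: f64 = 2.0; // "power-law gamma 2.0"
const FRESHNESS_HALF_POINT_DAYS: f64 = 30.0; // "0.5 at 30 days"

/// Hyperbolic decay: 1.0 for a brand-new item, 0.5 at 30 days, approaching 0 with age.
fn freshness(age_days: f64) -> f64 {
    1.0 / (1.0 + age_days.max(0.0) / FRESHNESS_HALF_POINT_DAYS)
}

/// Additive boost from a BM25 score assumed to be normalized to [0, 1].
/// The power law suppresses weak lexical matches more than strong ones.
fn bm25_boost(bm25_norm: f64) -> f64 {
    BM25_MAX_BOOST * bm25_norm.clamp(0.0, 1.0).powf(BM25_GAMMA)
}

/// Combine vector similarity with project scoping and the BM25 boost.
/// Same-project items keep their similarity; others are scaled by 0.875.
fn hybrid_score(vector_sim: f64, bm25_norm: f64, same_project: bool) -> f64 {
    let scoped = if same_project {
        vector_sim
    } else {
        vector_sim * CROSS_PROJECT_PENALTY
    };
    scoped + bm25_boost(bm25_norm)
}
```

Under these assumptions, a cross-project item with vector similarity 0.8 and no lexical match scores 0.7, while a same-project item with a perfect lexical match gains the full 0.12. How the freshness factor is folded into the final ranking is not stated in the diff, so it is left as a separate function here.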