Merged
88 changes: 81 additions & 7 deletions README.md
@@ -10,7 +10,7 @@

**Context intelligence layer for AI agents.**

-Deduplicates, compresses, and manages context across sessions - so your agents produce reliable, deterministic outputs. Today: a dedup pipeline with ~12ms overhead. Next: persistent context memory, code change impact graphs, and session-aware context windows.
+Deduplicates, compresses, and manages context across sessions - so your agents produce reliable, deterministic outputs. Includes a dedup pipeline with ~12ms overhead and persistent context memory with write-time dedup and hierarchical decay.

Less redundant data. Lower costs. Faster responses. Deterministic results.

@@ -201,12 +201,82 @@ Add to Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_conf

See [mcp/README.md](mcp/README.md) for more configuration options.

## Context Memory

Persistent memory that accumulates knowledge across agent sessions. Memories are deduplicated on write, ranked by relevance + recency on recall, and compressed over time through hierarchical decay.

Enable with the `--memory` flag on `api` or `mcp` commands.
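To make write-time dedup concrete, here is a minimal sketch of the idea, assuming `dedup_threshold` is interpreted as a maximum cosine distance below which a new entry counts as a duplicate of an existing one (helper names and the exact merge policy are illustrative, not Distill's implementation):

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 minus the cosine similarity of two
// equal-length vectors; 0 means identical direction, 1 means orthogonal.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// isDuplicate reports whether a new embedding falls within the dedup
// threshold of any already-stored embedding (0.15 is the documented default).
func isDuplicate(newEmb []float64, stored [][]float64, threshold float64) bool {
	for _, s := range stored {
		if cosineDistance(newEmb, s) < threshold {
			return true
		}
	}
	return false
}

func main() {
	stored := [][]float64{{1, 0, 0}}
	fmt.Println(isDuplicate([]float64{0.99, 0.01, 0}, stored, 0.15)) // near-identical: true
	fmt.Println(isDuplicate([]float64{0, 1, 0}, stored, 0.15))       // orthogonal: false
}
```

An entry flagged as a duplicate would be merged with (or dropped in favor of) the existing memory rather than stored twice.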

### CLI

```bash
# Store a memory
distill memory store --text "Auth uses JWT with RS256 signing" --tags auth --source docs

# Recall relevant memories
distill memory recall --query "How does authentication work?" --max-results 5

# Remove outdated memories
distill memory forget --tags deprecated

# View statistics
distill memory stats
```

### API

```bash
# Start API with memory enabled
distill api --port 8080 --memory

# Store
curl -X POST http://localhost:8080/v1/memory/store \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "session-1",
    "entries": [{"text": "Auth uses JWT with RS256", "tags": ["auth"], "source": "docs"}]
  }'

# Recall
curl -X POST http://localhost:8080/v1/memory/recall \
  -H "Content-Type: application/json" \
  -d '{"query": "How does auth work?", "max_results": 5}'
```

### MCP

Memory tools are available in Claude Desktop, Cursor, and other MCP clients when `--memory` is enabled:

```bash
distill mcp --memory
```

Tools exposed: `store_memory`, `recall_memory`, `forget_memory`, `memory_stats`.
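As a concrete example, a Claude Desktop entry launching the memory-enabled server could look like this (the server name `distill` and the bare binary path are placeholders; point `command` at your actual install):

```json
{
  "mcpServers": {
    "distill": {
      "command": "distill",
      "args": ["mcp", "--memory"]
    }
  }
}
```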

### How Decay Works

Memories compress over time based on access patterns:

```
Full text → Summary (~20%) → Keywords (~5%) → Evicted
        (24h)            (7 days)         (30 days)
```

Accessing a memory resets its decay clock. Decay ages and other memory settings are configured in `distill.yaml`, for example:

```yaml
memory:
  db_path: distill-memory.db
  dedup_threshold: 0.15
```

## CLI Commands

```bash
distill api      # Start standalone API server
distill serve    # Start server with vector DB connection
distill mcp      # Start MCP server for AI assistants
distill memory   # Store, recall, and manage persistent context memories
distill analyze  # Analyze a file for duplicates
distill sync     # Upload vectors to Pinecone with dedup
distill query    # Test a query from command line
@@ -220,6 +290,10 @@ distill config # Manage configuration files
| POST | `/v1/dedupe` | Deduplicate chunks |
| POST | `/v1/dedupe/stream` | SSE streaming dedup with per-stage progress |
| POST | `/v1/retrieve` | Query vector DB with dedup (requires backend) |
| POST | `/v1/memory/store` | Store memories with write-time dedup (requires `--memory`) |
| POST | `/v1/memory/recall` | Recall memories by relevance + recency (requires `--memory`) |
| POST | `/v1/memory/forget` | Remove memories by ID, tag, or age (requires `--memory`) |
| GET | `/v1/memory/stats` | Memory store statistics (requires `--memory`) |
| GET | `/health` | Health check |
| GET | `/metrics` | Prometheus metrics |

@@ -489,10 +563,10 @@ KV cache for repeated context patterns (system prompts, tool definitions, boiler
│ └─────────┘ └─────────┘ └─────────┘ └──────────┘ └─────────┘ │
│ <1ms 6ms <1ms 2ms 3ms │
│ │
-│ Context Intelligence (planned)
+│ Context Intelligence
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ Memory Store │ │ Impact Graph │ │ Session Context Windows │ │
-│ │ (#29) │ │ (#30) │ │ (#31) │ │
+│ │ (shipped) │ │ (#30) │ │ (#31) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
@@ -527,10 +601,10 @@ Distill is evolving from a dedup utility into a context intelligence layer. Here

### Context Memory

-| Feature | Issue | Description |
-|---------|-------|-------------|
-| **Context Memory Store** | [#29](https://github.com/Siddhant-K-code/distill/issues/29) | Persistent, deduplicated memory across sessions. Write-time dedup, hierarchical decay (full text -> summary -> keywords -> evicted), token-budgeted recall. |
-| **Session Management** | [#31](https://github.com/Siddhant-K-code/distill/issues/31) | Stateful context windows for long-running agents. Push context incrementally, Distill keeps it deduplicated and within budget. |
+| Feature | Issue | Status | Description |
+|---------|-------|--------|-------------|
+| **Context Memory Store** | [#29](https://github.com/Siddhant-K-code/distill/issues/29) | Shipped | Persistent, deduplicated memory across sessions. Write-time dedup, hierarchical decay, token-budgeted recall. See [Context Memory](#context-memory). |
+| **Session Management** | [#31](https://github.com/Siddhant-K-code/distill/issues/31) | Planned | Stateful context windows for long-running agents. Push context incrementally, Distill keeps it deduplicated and within budget. |

### Code Intelligence

23 changes: 23 additions & 0 deletions cmd/api.go
@@ -45,6 +45,7 @@ func init() {
	apiCmd.Flags().String("openai-key", "", "OpenAI API key for embeddings (or use OPENAI_API_KEY)")
	apiCmd.Flags().String("embedding-model", "text-embedding-3-small", "OpenAI embedding model")
	apiCmd.Flags().String("api-keys", "", "Comma-separated list of valid API keys (or use DISTILL_API_KEYS)")
	apiCmd.Flags().Bool("memory", false, "Enable persistent memory store")

	// Bind to viper for config file support
	_ = viper.BindPFlag("server.port", apiCmd.Flags().Lookup("port"))
@@ -177,6 +178,27 @@ func runAPI(cmd *cobra.Command, args []string) error {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/dedupe", m.Middleware("/v1/dedupe", server.handleDedupe))
	mux.HandleFunc("/v1/dedupe/stream", m.Middleware("/v1/dedupe/stream", server.handleDedupeStream))

	// Setup memory store (opt-in)
	enableMemory, _ := cmd.Flags().GetBool("memory")
	if enableMemory {
		memDBPath := viper.GetString("memory.db_path")
		if memDBPath == "" {
			memDBPath = "distill-memory.db"
		}
		memThreshold := viper.GetFloat64("memory.dedup_threshold")
		if memThreshold == 0 {
			memThreshold = 0.15
		}
		memStore, err := memoryStoreFromConfig(memDBPath, memThreshold)
		if err != nil {
			return fmt.Errorf("failed to create memory store: %w", err)
		}
		defer func() { _ = memStore.Close() }()

		memAPI := &MemoryAPI{store: memStore, embedder: embedder}
		memAPI.RegisterMemoryRoutes(mux, m.Middleware)
	}
	mux.HandleFunc("/health", server.handleHealth)
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		m.Handler().ServeHTTP(w, r)
@@ -218,6 +240,7 @@ func runAPI(cmd *cobra.Command, args []string) error {
	fmt.Printf("Distill API server starting on %s\n", addr)
	fmt.Printf(" Embeddings: %v\n", embedder != nil)
	fmt.Printf(" Auth: %v (%d keys)\n", server.hasAuth, len(validKeys))
	fmt.Printf(" Memory: %v\n", enableMemory)
	fmt.Println()
	fmt.Println("Endpoints:")
	fmt.Printf(" POST http://%s/v1/dedupe\n", addr)
155 changes: 155 additions & 0 deletions cmd/api_memory.go
@@ -0,0 +1,155 @@
package cmd

import (
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"

	"github.com/Siddhant-K-code/distill/pkg/memory"
	"github.com/Siddhant-K-code/distill/pkg/retriever"
)

// MemoryAPI handles memory-related HTTP endpoints.
type MemoryAPI struct {
	store    *memory.SQLiteStore
	embedder retriever.EmbeddingProvider
}

// RegisterMemoryRoutes adds memory endpoints to the given mux.
func (m *MemoryAPI) RegisterMemoryRoutes(mux *http.ServeMux, mw func(string, http.HandlerFunc) http.HandlerFunc) {
	mux.HandleFunc("/v1/memory/store", mw("/v1/memory/store", m.handleStore))
	mux.HandleFunc("/v1/memory/recall", mw("/v1/memory/recall", m.handleRecall))
	mux.HandleFunc("/v1/memory/forget", mw("/v1/memory/forget", m.handleForget))
	mux.HandleFunc("/v1/memory/stats", mw("/v1/memory/stats", m.handleStats))
}

func (m *MemoryAPI) handleStore(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}

	var req memory.StoreRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSONError(w, "invalid request body", http.StatusBadRequest)
		return
	}

	// Generate embeddings for entries that don't have them
	if m.embedder != nil {
		var textsToEmbed []string
		var indices []int
		for i, e := range req.Entries {
			if len(e.Embedding) == 0 && e.Text != "" {
				textsToEmbed = append(textsToEmbed, e.Text)
				indices = append(indices, i)
			}
		}
		if len(textsToEmbed) > 0 {
			ctx, cancel := context.WithTimeout(r.Context(), 30*time.Second)
			defer cancel()
			embeddings, err := m.embedder.EmbedBatch(ctx, textsToEmbed)
			if err != nil {
				writeJSONError(w, fmt.Sprintf("embedding error: %v", err), http.StatusInternalServerError)
				return
			}
			for i, idx := range indices {
				req.Entries[idx].Embedding = embeddings[i]
			}
		}
	}

	result, err := m.store.Store(r.Context(), req)
	if err != nil {
		writeJSONError(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(result)
}

func (m *MemoryAPI) handleRecall(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}

	var req memory.RecallRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSONError(w, "invalid request body", http.StatusBadRequest)
		return
	}

	if req.Query == "" && len(req.QueryEmbedding) == 0 {
		writeJSONError(w, "query or query_embedding is required", http.StatusBadRequest)
		return
	}

	// Generate query embedding if not provided
	if len(req.QueryEmbedding) == 0 && m.embedder != nil && req.Query != "" {
		ctx, cancel := context.WithTimeout(r.Context(), 10*time.Second)
		defer cancel()
		emb, err := m.embedder.Embed(ctx, req.Query)
		if err != nil {
			writeJSONError(w, fmt.Sprintf("embedding error: %v", err), http.StatusInternalServerError)
			return
		}
		req.QueryEmbedding = emb
	}

	result, err := m.store.Recall(r.Context(), req)
	if err != nil {
		writeJSONError(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(result)
}

func (m *MemoryAPI) handleForget(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodDelete && r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}

	var req memory.ForgetRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		writeJSONError(w, "invalid request body", http.StatusBadRequest)
		return
	}

	result, err := m.store.Forget(r.Context(), req)
	if err != nil {
		writeJSONError(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(result)
}

func (m *MemoryAPI) handleStats(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}

	stats, err := m.store.Stats(r.Context())
	if err != nil {
		writeJSONError(w, err.Error(), http.StatusInternalServerError)
		return
	}

	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(stats)
}

func writeJSONError(w http.ResponseWriter, msg string, code int) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(code)
	_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
}