17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,23 @@

All notable changes to Distill are documented here.

## [Unreleased]

### Added

- **Session-based context window management** (`pkg/session`) — Token-budgeted context windows for long-running agent sessions. Entries are deduplicated on push, compressed through hierarchical levels (full text → summary → sentence → keywords), and evicted when the budget is exceeded. Lowest-importance entries are compressed first. ([#38](https://github.com/Siddhant-K-code/distill/pull/38), closes [#31](https://github.com/Siddhant-K-code/distill/issues/31))
- **Session CLI** — `distill session create/push/context/delete` commands. ([#38](https://github.com/Siddhant-K-code/distill/pull/38))
- **Session HTTP API** — `/v1/session/create`, `/push`, `/context`, `/delete`, `/get` endpoints. Opt-in via `--session` flag. ([#38](https://github.com/Siddhant-K-code/distill/pull/38))
- **Session MCP tools** — `create_session`, `push_session`, `session_context`, `delete_session` for Claude Desktop, Cursor, and Amp. Opt-in via `--session` flag. ([#38](https://github.com/Siddhant-K-code/distill/pull/38))

### Stats

- 9 files changed, 1,928 insertions, 6 deletions
- 1 new package: `pkg/session`
- 13 new tests

---

## [v0.3.0] - 2026-02-23

Feature release adding persistent context memory, SSE streaming, OpenTelemetry tracing, and project documentation.
26 changes: 22 additions & 4 deletions FAQ.md
@@ -20,6 +20,18 @@ LLMs are non-deterministic. The same input can produce different compressed outp

---

### What is Context Memory?

Persistent memory that accumulates knowledge across agent sessions. Store context once, recall it later by semantic similarity + recency. Memories are deduplicated on write and compressed over time through hierarchical decay (full text → summary → keywords → evicted). Enable with `--memory` on the `api` or `mcp` commands.

### What are Sessions?

Token-budgeted context windows for long-running agent tasks. Push context incrementally as the agent works - Distill deduplicates entries, compresses aging ones, and evicts when the budget is exceeded. The `preserve_recent` setting keeps the N most recent entries at full fidelity. Enable with `--session` on the `api` or `mcp` commands.

### How is Context Memory different from Sessions?

Memory is cross-session: knowledge persists after a session ends and can be recalled in future sessions. Sessions are within-task: a bounded context window that tracks what the agent has seen during a single task, enforcing a token budget. Use memory for long-term knowledge, sessions for working context.

## Algorithms

### Why agglomerative clustering instead of K-Means?
@@ -108,6 +120,10 @@ Yes. The HTTP API is framework-agnostic. MCP works with any MCP-compatible clien

LangChain's `search_type="mmr"` applies MMR at the vector DB level - a single re-ranking step. Distill runs a multi-stage pipeline: cache lookup, agglomerative clustering (groups similar chunks), representative selection (picks the best from each group), compression (reduces token count), then MMR (diversity re-ranking). The clustering step is the key difference - it understands group structure, not just pairwise similarity.
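
As a rough illustration of the final MMR stage only (not the clustering or compression stages, and not Distill's actual code), the sketch below greedily picks chunks by trading relevance to the query against similarity to chunks already picked. The function names `cosine` and `mmr` and the `lambda` parameter are hypothetical and chosen for this example.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// mmr greedily picks up to k chunk indices. Each step maximizes
// lambda*relevance - (1-lambda)*(max similarity to an already picked chunk).
// lambda = 1 ranks purely by relevance; smaller values favor diversity.
func mmr(query []float64, chunks [][]float64, k int, lambda float64) []int {
	picked := []int{}
	used := make([]bool, len(chunks))
	for len(picked) < k && len(picked) < len(chunks) {
		best, bestScore := -1, math.Inf(-1)
		for i, c := range chunks {
			if used[i] {
				continue
			}
			maxSim := 0.0
			for _, j := range picked {
				if s := cosine(c, chunks[j]); s > maxSim {
					maxSim = s
				}
			}
			score := lambda*cosine(query, c) - (1-lambda)*maxSim
			if score > bestScore {
				best, bestScore = i, score
			}
		}
		used[best] = true
		picked = append(picked, best)
	}
	return picked
}

func main() {
	query := []float64{1, 1}
	chunks := [][]float64{
		{1, 0.9}, // most relevant
		{1, 0.8}, // near-duplicate of chunk 0
		{0, 1},   // less relevant, but different
	}
	// With lambda = 0.5 the near-duplicate loses to the more diverse chunk.
	fmt.Println(mmr(query, chunks, 2, 0.5))
}
```

Because MMR only compares a candidate against chunks that are already picked, it has no notion of groups, which is the gap the clustering stage described above is meant to fill.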

### What MCP tools does Distill expose?

The base MCP server exposes `deduplicate_context` and `analyze_redundancy`. With `--memory`, it adds `store_memory`, `recall_memory`, `forget_memory`, and `memory_stats`. With `--session`, it adds `create_session`, `push_session`, `session_context`, and `delete_session`. Enable both with `distill mcp --memory --session`.

### Can I use Distill with local models (Ollama, vLLM)?

The dedup pipeline itself doesn't call any LLM - it's pure math (cosine distance, clustering). The only external dependency is for embedding generation when you send text without pre-computed embeddings. Multi-provider embedding support (Ollama, Azure, Cohere, HuggingFace) is planned in [#33](https://github.com/Siddhant-K-code/distill/issues/33).
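
To make "pure math" concrete, here is a minimal sketch of threshold-based deduplication over pre-computed embeddings. It is a deliberate simplification: it uses a greedy first-seen-wins pass rather than the agglomerative clustering Distill describes, the names `cosineDistance` and `dedupe` are hypothetical, and reading the `0.15` from the README's `dedup_threshold` example as a cosine distance is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// cosineDistance returns 1 - cosine similarity; 0 means same direction.
func cosineDistance(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1 // treat a zero vector as unrelated to everything
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// dedupe keeps the first embedding of every near-duplicate group and
// returns the indices that survive. Greedy simplification, not clustering.
func dedupe(embeddings [][]float64, threshold float64) []int {
	kept := []int{}
	for i, e := range embeddings {
		duplicate := false
		for _, j := range kept {
			if cosineDistance(e, embeddings[j]) < threshold {
				duplicate = true
				break
			}
		}
		if !duplicate {
			kept = append(kept, i)
		}
	}
	return kept
}

func main() {
	embeddings := [][]float64{
		{1, 0},
		{0.98, 0.05}, // almost the same direction as the first vector
		{0, 1},
	}
	// Assumed threshold, borrowed from the dedup_threshold config example.
	fmt.Println(dedupe(embeddings, 0.15))
}
```

No model is called anywhere in this path; an embedding provider is only needed to produce the input vectors in the first place.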
@@ -180,8 +196,10 @@ Yes, AGPL-3.0. The full pipeline, CLI, API server, MCP server, and all algorithm

### What's on the roadmap?

-Three pillars:
+**Shipped:**
+- **Context Memory** - Persistent deduplicated memory across sessions with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29))
+- **Session Management** - Token-budgeted context windows with compression and eviction ([#31](https://github.com/Siddhant-K-code/distill/issues/31))

-1. **Context Memory** - Persistent deduplicated memory across agent sessions with hierarchical decay ([#29](https://github.com/Siddhant-K-code/distill/issues/29), [#31](https://github.com/Siddhant-K-code/distill/issues/31))
-2. **Code Intelligence** - Dependency graphs, co-change patterns, blast radius analysis ([#30](https://github.com/Siddhant-K-code/distill/issues/30), [#32](https://github.com/Siddhant-K-code/distill/issues/32))
-3. **Platform** - Python SDK, multi-provider embeddings, batch API ([#5](https://github.com/Siddhant-K-code/distill/issues/5), [#33](https://github.com/Siddhant-K-code/distill/issues/33), [#11](https://github.com/Siddhant-K-code/distill/issues/11))
+**Upcoming:**
+1. **Code Intelligence** - Dependency graphs, co-change patterns, blast radius analysis ([#30](https://github.com/Siddhant-K-code/distill/issues/30), [#32](https://github.com/Siddhant-K-code/distill/issues/32))
+2. **Platform** - Python SDK, multi-provider embeddings, batch API ([#5](https://github.com/Siddhant-K-code/distill/issues/5), [#33](https://github.com/Siddhant-K-code/distill/issues/33), [#11](https://github.com/Siddhant-K-code/distill/issues/11))
107 changes: 104 additions & 3 deletions README.md
@@ -183,7 +183,11 @@ curl -X POST http://localhost:8080/v1/retrieve \
Works with Claude, Cursor, Amp, and other MCP-compatible assistants:

```bash
# Dedup only
distill mcp

# With memory and sessions
distill mcp --memory --session
```

Add to Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_config.json`):
@@ -193,7 +197,10 @@ Add to Claude Desktop (`~/Library/Application Support/Claude/claude_desktop_conf
"mcpServers": {
"distill": {
"command": "/path/to/distill",
-"args": ["mcp"]
+"args": ["mcp", "--memory", "--session"],
+"env": {
+"OPENAI_API_KEY": "your-key"
+}
}
}
}
@@ -270,13 +277,85 @@ memory:
  dedup_threshold: 0.15
```

## Session Management

Token-budgeted context windows for long-running agent sessions. Push context incrementally - Distill deduplicates, compresses aging entries, and evicts when the budget is exceeded.

Enable with the `--session` flag on `api` or `mcp` commands.

### CLI

```bash
# Create a session with 128K token budget
distill session create --session-id task-42 --max-tokens 128000

# Push context as the agent works
distill session push --session-id task-42 --role user --content "Fix the JWT validation bug"
distill session push --session-id task-42 --role tool --content "$(cat auth/jwt.go)" --source file_read --importance 0.8

# Read the current context window
distill session context --session-id task-42

# Clean up when done
distill session delete --session-id task-42
```

### API

```bash
# Start API with sessions enabled
distill api --port 8080 --session

# Create session
curl -X POST http://localhost:8080/v1/session/create \
  -H "Content-Type: application/json" \
  -d '{"session_id": "task-42", "max_tokens": 128000}'

# Push entries
curl -X POST http://localhost:8080/v1/session/push \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "task-42",
    "entries": [
      {"role": "tool", "content": "file contents...", "source": "file_read", "importance": 0.8}
    ]
  }'

# Read context window
curl -X POST http://localhost:8080/v1/session/context \
  -H "Content-Type: application/json" \
  -d '{"session_id": "task-42"}'
```
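
The same push call can be made from Go. The sketch below mirrors the request body of the curl example. The type and function names (`Entry`, `pushEntries`, `newEchoServer`) are hypothetical, and since the response format is not shown here, the client simply returns the status code and raw body. The echo server is only a stand-in so the example runs without a real `distill api --session` process.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// Entry mirrors one element of the "entries" array in the curl example.
// With omitempty, an importance of 0 is left out of the request.
type Entry struct {
	Role       string  `json:"role"`
	Content    string  `json:"content"`
	Source     string  `json:"source,omitempty"`
	Importance float64 `json:"importance,omitempty"`
}

// pushRequest mirrors the JSON body of POST /v1/session/push.
type pushRequest struct {
	SessionID string  `json:"session_id"`
	Entries   []Entry `json:"entries"`
}

// pushEntries posts entries to baseURL and returns the HTTP status and raw body.
func pushEntries(baseURL, sessionID string, entries []Entry) (int, string, error) {
	payload, err := json.Marshal(pushRequest{SessionID: sessionID, Entries: entries})
	if err != nil {
		return 0, "", err
	}
	resp, err := http.Post(baseURL+"/v1/session/push", "application/json", bytes.NewReader(payload))
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return resp.StatusCode, "", err
	}
	return resp.StatusCode, string(body), nil
}

// newEchoServer is a stand-in for the real API: it echoes the request body
// back, so the example is runnable without Distill installed.
func newEchoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/session/push", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = io.Copy(w, r.Body)
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := newEchoServer()
	defer srv.Close()

	// Against a real server, pass "http://localhost:8080" instead of srv.URL.
	status, body, err := pushEntries(srv.URL, "task-42", []Entry{
		{Role: "tool", Content: "file contents...", Source: "file_read", Importance: 0.8},
	})
	if err != nil {
		fmt.Println("push failed:", err)
		return
	}
	fmt.Println(status, body)
}
```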

### MCP

Session tools are available when `--session` is enabled:

```bash
distill mcp --session
```

Tools exposed: `create_session`, `push_session`, `session_context`, `delete_session`.

### How Budget Enforcement Works

When a push exceeds the token budget:

1. **Compress** oldest entries (outside the `preserve_recent` window) through levels:
- Full text → Summary (~20%) → Single sentence (~5%) → Keywords (~1%)
2. **Evict** entries that are already at keyword level
3. Lowest-importance entries are compressed/evicted first

The `preserve_recent` setting (default: 10) keeps the most recent entries at full fidelity.
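
The steps above can be sketched in Go as follows. This is one possible reading of the rules, not the `pkg/session` implementation: the types, the `enforce` function, and the exact ordering (compress the lowest-importance compressible entry one level at a time, and evict only when every unprotected entry is already at keyword level) are assumptions. Sizes use the approximate percentages listed above.

```go
package main

import "fmt"

// Level is how far an entry has been compressed.
type Level int

const (
	Full Level = iota
	Summary
	Sentence
	Keywords
)

// percentOfFull approximates each level's size relative to the full text
// (100%, ~20%, ~5%, ~1%).
var percentOfFull = [...]int{100, 20, 5, 1}

// Entry is a hypothetical session entry; entries are ordered oldest first.
type Entry struct {
	ID         string
	FullTokens int // token count of the uncompressed text
	Importance float64
	Level      Level
}

// Tokens estimates the entry's current size, rounding up.
func (e Entry) Tokens() int {
	return (e.FullTokens*percentOfFull[e.Level] + 99) / 100
}

func total(entries []Entry) int {
	sum := 0
	for _, e := range entries {
		sum += e.Tokens()
	}
	return sum
}

// enforce compresses, then evicts, entries outside the preserveRecent window
// until the total fits maxTokens or only protected entries remain.
func enforce(entries []Entry, maxTokens, preserveRecent int) []Entry {
	out := append([]Entry(nil), entries...)
	for total(out) > maxTokens {
		n := len(out) - preserveRecent // entries before index n may be touched
		if n <= 0 {
			break
		}
		compress, evict := -1, -1
		for i := 0; i < n; i++ {
			if out[i].Level < Keywords {
				if compress < 0 || out[i].Importance < out[compress].Importance {
					compress = i
				}
			} else if evict < 0 || out[i].Importance < out[evict].Importance {
				evict = i
			}
		}
		if compress >= 0 {
			out[compress].Level++ // step down one compression level
		} else {
			out = append(out[:evict], out[evict+1:]...) // all at keywords: evict
		}
	}
	return out
}

func main() {
	entries := []Entry{
		{ID: "a", FullTokens: 1000, Importance: 0.2},
		{ID: "b", FullTokens: 1000, Importance: 0.8},
		{ID: "c", FullTokens: 500, Importance: 0.5}, // most recent, protected
	}
	for _, e := range enforce(entries, 1000, 1) {
		fmt.Println(e.ID, e.Level, e.Tokens())
	}
}
```

In this reading, a budget that the protected entries alone exceed cannot be met, so the loop stops rather than touching the `preserve_recent` window.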

## CLI Commands

```bash
distill api # Start standalone API server
distill serve # Start server with vector DB connection
distill mcp # Start MCP server for AI assistants
distill memory # Store, recall, and manage persistent context memories
distill session # Manage token-budgeted context windows for agent sessions
distill analyze # Analyze a file for duplicates
distill sync # Upload vectors to Pinecone with dedup
distill query # Test a query from command line
@@ -294,6 +373,11 @@ distill config # Manage configuration files
| POST | `/v1/memory/recall` | Recall memories by relevance + recency (requires `--memory`) |
| POST | `/v1/memory/forget` | Remove memories by ID, tag, or age (requires `--memory`) |
| GET | `/v1/memory/stats` | Memory store statistics (requires `--memory`) |
| POST | `/v1/session/create` | Create a session with token budget (requires `--session`) |
| POST | `/v1/session/push` | Push entries with dedup + budget enforcement (requires `--session`) |
| POST | `/v1/session/context` | Read current context window (requires `--session`) |
| POST | `/v1/session/delete` | Delete a session (requires `--session`) |
| GET | `/v1/session/get` | Get session metadata (requires `--session`) |
| GET | `/health` | Health check |
| GET | `/metrics` | Prometheus metrics |

@@ -345,6 +429,15 @@ retriever:
auth:
  api_keys:
    - ${DISTILL_API_KEY}

memory:
  db_path: distill-memory.db
  dedup_threshold: 0.15

session:
  db_path: distill-sessions.db
  dedup_threshold: 0.15
  max_tokens: 128000
```

Environment variables can be referenced using `${VAR}` or `${VAR:-default}` syntax.
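
For readers unfamiliar with the `${VAR:-default}` form, the sketch below shows one way such references could be expanded. It is not Distill's config loader: the `expand` function is hypothetical, and treating an empty variable the same as an unset one follows the shell convention for `:-`, which is an assumption about Distill's behavior.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// ref matches ${NAME} and ${NAME:-default}.
var ref = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}`)

// expand replaces every reference in s using lookup. If the variable is
// unset or empty, the default is used (empty string when no default is given).
func expand(s string, lookup func(string) (string, bool)) string {
	return ref.ReplaceAllStringFunc(s, func(match string) string {
		parts := ref.FindStringSubmatch(match)
		name, def := parts[1], parts[2]
		if v, ok := lookup(name); ok && v != "" {
			return v
		}
		return def
	})
}

func main() {
	// os.LookupEnv has the signature expand expects.
	fmt.Println(expand("db_path: ${DISTILL_SESSION_DB:-distill-sessions.db}", os.LookupEnv))
}
```

`DISTILL_SESSION_DB` is a made-up variable name used only to exercise the default branch.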
@@ -537,6 +630,14 @@ Reduces token count while preserving meaning. Three strategies:

Strategies can be chained via `compress.Pipeline`. Configure with target reduction ratio (e.g., 0.3 = keep 30% of original).

### Memory (`pkg/memory`)

Persistent context memory across agent sessions. SQLite-backed with write-time deduplication via cosine similarity. Memories decay over time: full text → summary → keywords → evicted. Recall is ranked by `(1-w)*similarity + w*recency`, where `w` sets how much recency counts relative to similarity. Enable with the `--memory` flag.
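
A small sketch of that ranking formula follows. The blend `(1-w)*similarity + w*recency` is taken from the description above. How `recency` is derived from a memory's age is not specified here, so the exponential half-life decay, the `Memory` type, and the function names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

const day = 24 * time.Hour

// Memory is a hypothetical recall candidate.
type Memory struct {
	Text       string
	Similarity float64       // cosine similarity to the query
	Age        time.Duration // time since the memory was stored
}

// recency maps age to (0, 1] with an assumed exponential half-life decay.
func recency(age, halfLife time.Duration) float64 {
	if age <= 0 {
		return 1
	}
	return math.Pow(0.5, float64(age)/float64(halfLife))
}

// score blends similarity and recency; w = 0 ignores recency entirely.
func score(similarity, rec, w float64) float64 {
	return (1-w)*similarity + w*rec
}

// rank returns a copy of memories sorted by descending score.
func rank(memories []Memory, w float64, halfLife time.Duration) []Memory {
	out := append([]Memory(nil), memories...)
	sort.SliceStable(out, func(i, j int) bool {
		return score(out[i].Similarity, recency(out[i].Age, halfLife), w) >
			score(out[j].Similarity, recency(out[j].Age, halfLife), w)
	})
	return out
}

func main() {
	memories := []Memory{
		{Text: "old but very similar", Similarity: 0.9, Age: 30 * day},
		{Text: "fresh and fairly similar", Similarity: 0.7, Age: day / 24},
	}
	for _, m := range rank(memories, 0.3, 7*day) {
		fmt.Println(m.Text)
	}
}
```

With `w = 0.3` and a seven-day half-life, the fresh memory outranks the older, more similar one; with `w = 0` the order flips.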

### Session (`pkg/session`)

Token-budgeted context windows for long-running tasks. Entries are deduplicated on push, compressed through hierarchical levels when the budget is exceeded, and evicted by importance. The `preserve_recent` setting keeps the N most recent entries at full fidelity. Enable with the `--session` flag.

### Cache (`pkg/cache`)

KV cache for repeated context patterns (system prompts, tool definitions, boilerplate). Sub-millisecond retrieval for cache hits.
@@ -566,7 +667,7 @@ KV cache for repeated context patterns (system prompts, tool definitions, boiler
│ Context Intelligence │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────────┐ │
│ │ Memory Store │ │ Impact Graph │ │ Session Context Windows │ │
-│ │ (shipped) │ │ (#30) │ │ (#31) │ │
+│ │ (shipped) │ │ (#30) │ │ (shipped) │ │
│ └──────────────┘ └──────────────┘ └──────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
@@ -604,7 +705,7 @@ Distill is evolving from a dedup utility into a context intelligence layer. Here
| Feature | Issue | Status | Description |
|---------|-------|--------|-------------|
| **Context Memory Store** | [#29](https://github.com/Siddhant-K-code/distill/issues/29) | Shipped | Persistent, deduplicated memory across sessions. Write-time dedup, hierarchical decay, token-budgeted recall. See [Context Memory](#context-memory). |
-| **Session Management** | [#31](https://github.com/Siddhant-K-code/distill/issues/31) | Planned | Stateful context windows for long-running agents. Push context incrementally, Distill keeps it deduplicated and within budget. |
+| **Session Management** | [#31](https://github.com/Siddhant-K-code/distill/issues/31) | Shipped | Stateful context windows with token budgets, hierarchical compression, and importance-based eviction. See [Session Management](#session-management). |

### Code Intelligence

21 changes: 21 additions & 0 deletions cmd/api.go
@@ -46,6 +46,8 @@ func init() {
	apiCmd.Flags().String("embedding-model", "text-embedding-3-small", "OpenAI embedding model")
	apiCmd.Flags().String("api-keys", "", "Comma-separated list of valid API keys (or use DISTILL_API_KEYS)")
	apiCmd.Flags().Bool("memory", false, "Enable persistent memory store")
	apiCmd.Flags().Bool("session", false, "Enable session management")
	apiCmd.Flags().String("session-db", "distill-sessions.db", "SQLite database path for session store")

	// Bind to viper for config file support
	_ = viper.BindPFlag("server.port", apiCmd.Flags().Lookup("port"))
@@ -199,6 +201,24 @@ func runAPI(cmd *cobra.Command, args []string) error {
		memAPI := &MemoryAPI{store: memStore, embedder: embedder}
		memAPI.RegisterMemoryRoutes(mux, m.Middleware)
	}

	// Setup session store (opt-in)
	enableSession, _ := cmd.Flags().GetBool("session")
	if enableSession {
		sessDBPath, _ := cmd.Flags().GetString("session-db")
		if sessDBPath == "" {
			sessDBPath = "distill-sessions.db"
		}
		sessStore, err := newSessionStore(sessDBPath)
		if err != nil {
			return fmt.Errorf("failed to create session store: %w", err)
		}
		defer func() { _ = sessStore.Close() }()

		sessAPI := &SessionAPI{store: sessStore}
		sessAPI.RegisterSessionRoutes(mux, m.Middleware)
	}

	mux.HandleFunc("/health", server.handleHealth)
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		m.Handler().ServeHTTP(w, r)
@@ -241,6 +261,7 @@ func runAPI(cmd *cobra.Command, args []string) error {
	fmt.Printf(" Embeddings: %v\n", embedder != nil)
	fmt.Printf(" Auth: %v (%d keys)\n", server.hasAuth, len(validKeys))
	fmt.Printf(" Memory: %v\n", enableMemory)
	fmt.Printf(" Sessions: %v\n", enableSession)
	fmt.Println()
	fmt.Println("Endpoints:")
	fmt.Printf(" POST http://%s/v1/dedupe\n", addr)