Merged
168 changes: 168 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,168 @@
# ARCHITECTURE.md

P2 H1 heading includes file extension

# ARCHITECTURE.md as a rendered heading looks odd (the .md extension is typically omitted). Compare with AGENTS.md which uses a plain prose heading. Consider:

Suggested change
# ARCHITECTURE.md
# Architecture

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!


> wagl — local-first agent memory backed by libSQL

## System Overview

```
┌─────────────────────────────────────────────────────┐
│                    Agent / Human                    │
│          (OpenClaw, ChatGPT, Claude, CLI)           │
└──────┬──────────┬──────────┬───────────┬────────────┘
       │          │          │           │
       ▼          ▼          ▼           ▼
   ┌────────┐ ┌────────┐ ┌────────┐ ┌──────────┐
   │  CLI   │ │ Server │ │  MCP   │ │ OpenClaw │
   │ (wagl) │ │ (REST) │ │(stdio) │ │  plugin  │
   └───┬────┘ └───┬────┘ └───┬────┘ └────┬─────┘
       │          │          │           │
       └──────────┴──────┬───┴───────────┘
                ┌──────────────────┐
                │     wagl-db      │
                │  (libSQL layer)  │
                └────────┬─────────┘
                ┌────────┴─────────┐
                │    wagl-core     │
                │  (types, no IO)  │
                └──────────────────┘
                ┌────────┴─────────┐
                │      libSQL      │
                │  (local + Turso  │
                │  embedded sync)  │
                └──────────────────┘
Comment on lines +27 to +36

P1 Incorrect dependency arrow: wagl-core → libSQL

The diagram draws wagl-core as a pass-through between wagl-db and libSQL, implying wagl-core depends on libSQL. But wagl-core is explicitly described as "types, no IO" with no database or network dependencies. Key Design Decision #6 also confirms the correct direction: core (types) → db (storage) → cli/server/mcp.

libSQL should branch off from wagl-db directly (alongside wagl-core), not below wagl-core. A corrected layout:

              ┌──────────────────┐
              │     wagl-db      │
              │  (libSQL layer)  │
              └────────┬─────────┘
               ┌───────┴───────┐
               │               │
    ┌──────────┴───┐    ┌───────┴──────┐
    │  wagl-core   │    │    libSQL    │
    │ (types, noIO)│    │ (local+Turso │
    └──────────────┘    │  embedded)   │
                        └──────────────┘

As written, this will mislead contributors into thinking wagl-core has a storage dependency.

```

## Crate Structure

### `crates/core` — Types (no IO)
Pure data types shared across all crates. No database, network, or filesystem dependencies.

- `MemoryItem` — the fundamental unit: text + tags + scores + metadata
- `ExpertiseItem`, `FocusItem` — curated domain slices
- `IntentItem` — terminology/alias mappings
- `MemoryEdge` — graph edges between items

### `crates/db` — Database Layer
All libSQL/SQLite interaction lives here. Owns the schema and migrations.

- `MemoryDb` — connection wrapper with sync support
- `migrate.rs` — schema versioning (currently v1, v2 in PR)

P2 Draft note left in committed documentation

"v2 in PR" is a development-time annotation that should be resolved before merging to dev. Either the v2 schema is already present (update the note to say "currently v2") or it is not yet merged (remove the forward reference).

Suggested change
- `migrate.rs` — schema versioning (currently v1, v2 in PR)
- `migrate.rs` — schema versioning (currently v1)

- `vector_ext.rs` — sqlite-vec extension loading
- Insert/update/query/recall for all item types
- Turso embedded replica sync (`db.sync()`)

**Schema highlights:**
- `memory_items` — main table with vector embeddings column
- `expertise_items` + `expertise_index` — curated knowledge areas
- `focus_items` + `focus_index` — project/discussion-specific memory
- `memory_edges` — graph relationships between items
- `intent_items` — term/alias mappings

### `crates/cli` — The `wagl` Binary (~4600 lines)
The primary interface. 40+ subcommands organized by phase:

| Category | Commands |
|----------|----------|
| **Core CRUD** | `put`, `get`, `query`, `search`, `everything`, `forget` |
| **Canonical** | `canon get\|set\|list` |
| **Recall** | `recall` (deterministic packs), `scores` (explain scoring) |
| **Quality** | `reconcile`, `audit-quality`, `audit`, `import-missing` |
| **Lifecycle** | `sleep`, `morning`, `decay`, `gc` |
| **Knowledge** | `expertise`, `focus`, `promote` |
| **Graph** | `link`, `neighbors`, `reconstruct` |
| **Capture** | `capture`, `reflex` |
| **Data** | `bundle` (export/import), `curate`, `ingest`, `sync` |
| **Infra** | `init`, `status`, `stats`, `embed`, `trust`, `intent`, `serve` |

### `crates/server` — HTTP REST API
Axum-based server started via `wagl serve`. Exposes core operations over HTTP with JSON request/response.

- `PUT /items` — store memory
- `GET /items/:id` — retrieve by ID
- `POST /recall` — recall packs
- `POST /query` — text search
- `POST /search` — vector search
Comment on lines +84 to +88

P2 Align the REST route list with the router in crates/server

The router exposed by wagl serve in crates/server/src/lib.rs:345-356 only registers GET/POST /items, GET/DELETE /items/{id}, GET/POST /intents, DELETE /intents/{id}, plus /health, /status, and /ws. Documenting PUT /items, /recall, /query, and /search here means anyone building against the new architecture page will hit 404/405 responses immediately because those endpoints do not exist in this commit.

- Webhook handlers for external event capture

### `crates/mcp` — MCP Server (stdio transport)
Model Context Protocol server for direct LLM tool integration.

- 5 tools: `store`, `recall`, `query`, `search`, `forget`
- stdio transport (launched as subprocess by MCP clients)
Comment on lines +94 to +95

P2 Replace the nonexistent MCP search tool in the tool list

The MCP server registered in crates/mcp/src/handler.rs:71-173 exposes memory_store, memory_recall, memory_query, memory_context, and memory_forget; there is no search tool. Publishing store/recall/query/search/forget here will cause MCP clients to ask for a capability the server never registers, while also hiding the actual context-pack tool they can use.

- Separate binary: `wagl-mcp`

## Data Flow: Recall

```
Agent asks: "What do I know about Chris's project?"
wagl recall "Chris's project"
├─── 1. Always include canon items
│       (canon:user.profile, canon:user.preferences)
├─── 2. Vector search (sqlite-vec cosine similarity)
│       Query → embedding → nearest neighbors
├─── 3. Keyword search (LIKE matching)
│       Extract significant terms → text match
├─── 4. Hybrid ranking
│       Vector hits ∩ keyword hits ranked first
│       Score: salience × recency × |d_score| boost
├─── 5. Multi-pass (if configured)
│       Pass 1: high-valence (|d_score| ≥ threshold)
│       Pass 2: standard relevance
│       Pass 3: recency boost
└─── 6. Return ranked pack with provenance
        Each item: text + tags + scores + "why included"
```
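
The hybrid ranking step (step 4) can be sketched roughly as follows. This is an illustration under stated assumptions, not wagl's actual code: the `Candidate` struct, its field names, the `rank` function, and the exact form of the `|d_score|` boost are all hypothetical. The flow above only states that items found by both searches rank first and that salience, recency, and `|d_score|` feed the score. The multi-pass step is not modeled.

```rust
use std::collections::HashSet;

/// Hypothetical recall candidate; the field names are assumptions.
#[derive(Debug, Clone)]
pub struct Candidate {
    pub id: String,
    pub salience: f64,
    pub recency: f64,
    pub d_score: f64,
}

/// Illustrative score: salience × recency × a boost that grows with |d_score|.
/// The boost formula (1 + |d_score| / 10) is an assumption made for this sketch.
fn score(c: &Candidate) -> f64 {
    c.salience * c.recency * (1.0 + c.d_score.abs() / 10.0)
}

/// Ranks candidates found by either search. Items present in both hit sets
/// come first, then higher scores, then ids in ascending order as a tie-break
/// so the same inputs always produce the same output.
pub fn rank(
    candidates: &[Candidate],
    vector_hits: &HashSet<String>,
    keyword_hits: &HashSet<String>,
) -> Vec<String> {
    let mut ranked: Vec<(bool, f64, &str)> = candidates
        .iter()
        .filter(|c| vector_hits.contains(&c.id) || keyword_hits.contains(&c.id))
        .map(|c| {
            let in_both = vector_hits.contains(&c.id) && keyword_hits.contains(&c.id);
            (in_both, score(c), c.id.as_str())
        })
        .collect();

    ranked.sort_by(|a, b| {
        b.0.cmp(&a.0) // items in both hit sets first
            .then(b.1.total_cmp(&a.1)) // then score, descending
            .then(a.2.cmp(b.2)) // then id, ascending
    });

    ranked.into_iter().map(|(_, _, id)| id.to_string()).collect()
}
```

The explicit id tie-break is one way to get the "same query + same DB = same results" property without depending on the order in which rows come back from storage.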

## Scoring System

Three scores work together to surface the right memories:

- **D-Score** (feeling): -10 to +10. Emotional valence. Negative = bad experience, positive = great.
- **I-Score** (intuition): 0 to 2. Confidence/accuracy multiplier.
- **EV** (experience value): `d_score × i_score`. Ranking bonus in recall.

High |EV| items surface faster. Low/zero EV items rank normally by relevance.
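
As a minimal sketch of the arithmetic above: the `Scores` type and its field names are hypothetical, and clamping out-of-range inputs is an assumption of this example rather than documented wagl behavior.

```rust
/// Hypothetical score container; not wagl's actual type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Scores {
    /// D-Score (feeling), expected in -10.0..=10.0.
    pub d_score: f64,
    /// I-Score (intuition), expected in 0.0..=2.0.
    pub i_score: f64,
}

impl Scores {
    /// Builds scores, clamping each value into its documented range.
    /// Clamping (rather than rejecting) is an assumption of this sketch.
    pub fn new(d_score: f64, i_score: f64) -> Self {
        Scores {
            d_score: d_score.clamp(-10.0, 10.0),
            i_score: i_score.clamp(0.0, 2.0),
        }
    }

    /// Experience value: d_score × i_score.
    pub fn ev(&self) -> f64 {
        self.d_score * self.i_score
    }
}
```

With these ranges EV falls between -20 and +20. For example, `Scores::new(-8.0, 1.5).ev()` is `-12.0`, a strongly negative experience held with above-average confidence, which gives it a large `|EV|`.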

## Embedding Strategy

- **Local CLI**: embeddings via configurable OpenAI-compatible endpoint
- **Cloud (zumie.ai)**: Gemini `gemini-embedding-001` (768 dimensions)

P2 Internal service URL in public docs

AGENTS.md explicitly instructs: "Do not commit … internal hostnames, internal paths, or identifying operational details" and to use https://example.com/... instead of real internal service URLs.

zumie.ai appears twice — here and on line 155. If this is an internal/private service, both references should be genericised (e.g., your-cloud-backend). If it is intentionally public-facing, a brief comment in the PR description would clarify that.

The same applies to line 155:

Per-user isolated databases in zumie.ai (multi-tenant)

Context Used: AGENTS.md (source)

- **Storage**: `Float32Array` in sqlite-vec `float32[N]` column
- **Fallback**: graceful degradation when embeddings unavailable (keyword-only search)
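
To illustrate the similarity measure and the fallback, here is a small standalone sketch. In the described design sqlite-vec computes cosine similarity inside the database, so `cosine_similarity`, `SearchMode`, and `choose_mode` below are hypothetical names that only show the idea.

```rust
/// Cosine similarity of two embeddings. Returns `None` when the vectors are
/// empty, differ in length, or either has zero norm.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Which search strategy a recall uses (hypothetical enum).
#[derive(Debug, PartialEq)]
pub enum SearchMode {
    /// Vector search combined with keyword search.
    Hybrid,
    /// Keyword search only, used when no embedding is available.
    KeywordOnly,
}

/// Falls back to keyword-only search when the query has no usable embedding,
/// for example because the embedding endpoint could not be reached.
pub fn choose_mode(query_embedding: Option<&[f32]>) -> SearchMode {
    match query_embedding {
        Some(v) if !v.is_empty() => SearchMode::Hybrid,
        _ => SearchMode::KeywordOnly,
    }
}
```

Modeling a missing embedding as `None` keeps the degraded path explicit at the call site instead of relying on an error being swallowed somewhere deeper.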

## Sync Architecture

```
Local DB (libSQL) ←──embedded replica──→ Turso Cloud
       └── wagl sync (push/pull)
```

- Local-first: works fully offline
- Turso embedded replicas for cloud sync
- Per-user isolated databases in zumie.ai (multi-tenant)

## Key Design Decisions

1. **Local-first, not cloud-first** — CLI works without network. Cloud is additive.
2. **libSQL over SQLite** — Turso sync, vector support via sqlite-vec, same SQLite compatibility.
3. **Tags over schema fields** — Canonical conventions (`canon:user.profile`) use tags to avoid schema rigidity. Schema fields added only when behavior depends on them (scores, emotions, actionable).
4. **Deterministic recall** — Same query + same DB = same results. No randomness in ranking.
5. **Doctrine in the binary** — `wagl skill` embeds the behavioral contract. Agents learn from the tool.
6. **Multi-crate for isolation** — core (types) → db (storage) → cli/server/mcp (surfaces). No circular deps.

---

*Generated 2026-03-21. See `AGENTS.md` for contributor orientation.*