BigInformatics · GoZumie · Mar 22, 2026 · greptile-apps
@@ -0,0 +1,168 @@
+# Architecture
+
+> wagl — local-first agent memory backed by libSQL
+
+## System Overview
+
+```
+┌─────────────────────────────────────────────────────┐
+│                   Agent / Human                      │
+│         (OpenClaw, ChatGPT, Claude, CLI)             │
+└──────┬──────────┬──────────┬───────────┬────────────┘
+       │          │          │           │
+       ▼          ▼          ▼           ▼
+  ┌────────┐ ┌────────┐ ┌────────┐ ┌──────────┐
+  │  CLI   │ │ Server │ │  MCP   │ │ OpenClaw │
+  │ (wagl) │ │ (REST) │ │(stdio) │ │ plugin   │
+  └───┬────┘ └───┬────┘ └───┬────┘ └────┬─────┘
+      │          │          │            │
+      └──────────┴──────┬───┴────────────┘
+                        │
+                        ▼
+              ┌──────────────────┐
+              │     wagl-db      │
+              │  (libSQL layer)  │
+              └────────┬─────────┘
+                       │
+              ┌────────┴─────────┐
+              │   wagl-core      │
+              │ (types, no IO)   │
+              └──────────────────┘
+                       │
+              ┌────────┴─────────┐
+              │     libSQL       │
+              │  (local + Turso  │
+              │   embedded sync) │
+              └──────────────────┘
+```
+
+## Crate Structure
+
+### `crates/core` — Types (no IO)
+Pure data types shared across all crates. No database, network, or filesystem dependencies.
+
+- `MemoryItem` — the fundamental unit: text + tags + scores + metadata
+- `ExpertiseItem`, `FocusItem` — curated domain slices
+- `IntentItem` — terminology/alias mappings
+- `MemoryEdge` — graph edges between items
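
To make "types, no IO" concrete, here is a minimal sketch of what two of these types might look like. Every field name below (`id`, `text`, `tags`, `d_score`, `i_score`, `created_at`, `from_id`, `to_id`, `relation`) and the `is_canon` helper are assumptions for illustration, not the actual definitions in `crates/core`.

```rust
// Hypothetical shapes only: field names are assumptions, not the real wagl-core API.

/// The fundamental unit: text + tags + scores + metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryItem {
    pub id: String,
    pub text: String,
    pub tags: Vec<String>,
    pub d_score: f32,    // documented range: -10 to +10
    pub i_score: f32,    // documented range: 0 to 2
    pub created_at: i64, // assumed: Unix seconds
}

/// A directed graph edge between two items.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryEdge {
    pub from_id: String,
    pub to_id: String,
    pub relation: String,
}

impl MemoryItem {
    /// True if any tag follows the `canon:` convention (e.g. `canon:user.profile`).
    pub fn is_canon(&self) -> bool {
        self.tags.iter().any(|t| t.starts_with("canon:"))
    }
}
```

Because nothing here touches a database, socket, or file, types like these can be shared by the CLI, server, and MCP crates without pulling in libSQL.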
+
+### `crates/db` — Database Layer
+All libSQL/SQLite interaction lives here. Owns the schema and migrations.
+
+- `MemoryDb` — connection wrapper with sync support
+- `migrate.rs` — schema versioning (currently v1)
+- `vector_ext.rs` — sqlite-vec extension loading
+- Insert/update/query/recall for all item types
+- Turso embedded replica sync (`db.sync()`)
+
+**Schema highlights:**
+- `memory_items` — main table with vector embeddings column
+- `expertise_items` + `expertise_index` — curated knowledge areas
+- `focus_items` + `focus_index` — project/discussion-specific memory
+- `memory_edges` — graph relationships between items
+- `intent_items` — term/alias mappings
+
+### `crates/cli` — The `wagl` Binary (~4600 lines)
+The primary interface. 40+ subcommands, grouped by category:
+
+| Category | Commands |
+|----------|----------|
+| **Core CRUD** | `put`, `get`, `query`, `search`, `everything`, `forget` |
+| **Canonical** | `canon get\|set\|list` |
+| **Recall** | `recall` (deterministic packs), `scores` (explain scoring) |
+| **Quality** | `reconcile`, `audit-quality`, `audit`, `import-missing` |
+| **Lifecycle** | `sleep`, `morning`, `decay`, `gc` |
+| **Knowledge** | `expertise`, `focus`, `promote` |
+| **Graph** | `link`, `neighbors`, `reconstruct` |
+| **Capture** | `capture`, `reflex` |
+| **Data** | `bundle` (export/import), `curate`, `ingest`, `sync` |
+| **Infra** | `init`, `status`, `stats`, `embed`, `trust`, `intent`, `serve` |
+
+### `crates/server` — HTTP REST API
+Axum-based server started via `wagl serve`. Exposes core operations over HTTP with JSON request/response.
+
+- `PUT /items` — store memory
+- `GET /items/:id` — retrieve by ID
+- `POST /recall` — recall packs
+- `POST /query` — text search
+- `POST /search` — vector search
+- Webhook handlers for external event capture
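
The route list above can be pictured as a small method-and-path matcher. The sketch below is dependency-free and is not the Axum router the server actually uses. The `Route` enum and `match_route` function are hypothetical names, and webhook routes are left out because their paths are not listed here.

```rust
/// Hypothetical enumeration of the documented endpoints.
#[derive(Debug, PartialEq)]
pub enum Route<'a> {
    PutItem,
    GetItem(&'a str), // carries the `:id` path segment
    Recall,
    Query,
    Search,
}

/// Map an HTTP method and path to one of the documented routes.
/// Returns None for anything not in the list above.
pub fn match_route<'a>(method: &str, path: &'a str) -> Option<Route<'a>> {
    match (method, path) {
        ("PUT", "/items") => Some(Route::PutItem),
        ("POST", "/recall") => Some(Route::Recall),
        ("POST", "/query") => Some(Route::Query),
        ("POST", "/search") => Some(Route::Search),
        // GET /items/:id, where the id must be a single non-empty segment.
        ("GET", p) => p
            .strip_prefix("/items/")
            .filter(|id| !id.is_empty() && !id.contains('/'))
            .map(Route::GetItem),
        _ => None,
    }
}
```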
+
+### `crates/mcp` — MCP Server (stdio transport)
+Model Context Protocol server for direct LLM tool integration.
+
+- 5 tools: `store`, `recall`, `query`, `search`, `forget`
+- stdio transport (launched as subprocess by MCP clients)
+- Separate binary: `wagl-mcp`
+
+## Data Flow: Recall
+
+```
+Agent asks: "What do I know about Chris's project?"
+       │
+       ▼
+  wagl recall "Chris's project"
+       │
+       ├─── 1. Always include canon items
+       │         (canon:user.profile, canon:user.preferences)
+       │
+       ├─── 2. Vector search (sqlite-vec cosine similarity)
+       │         Query → embedding → nearest neighbors
+       │
+       ├─── 3. Keyword search (LIKE matching)
+       │         Extract significant terms → text match
+       │
+       ├─── 4. Hybrid ranking
+       │         Vector hits ∩ keyword hits ranked first
+       │         Score: salience × recency × |d_score| boost
+       │
+       ├─── 5. Multi-pass (if configured)
+       │         Pass 1: high-valence (|d_score| ≥ threshold)
+       │         Pass 2: standard relevance
+       │         Pass 3: recency boost
+       │
+       └─── 6. Return ranked pack with provenance
+                Each item: text + tags + scores + "why included"
+```
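
Step 4 of the flow above (items found by both searches rank first) can be sketched as follows. This is an illustration under stated assumptions, not wagl's ranking code: the `Candidate` type, the precomputed `score` field, and the tie-break on `id` are all hypothetical.

```rust
use std::collections::{BTreeMap, HashSet};

/// A search hit with a precomputed relevance score (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub id: String,
    pub score: f64,
}

/// Merge vector and keyword hits. Ids found by both searches come first;
/// within each group higher scores win, and ties are broken by id so the
/// same inputs always produce the same order.
pub fn hybrid_rank(vector_hits: &[Candidate], keyword_hits: &[Candidate]) -> Vec<String> {
    let vector_ids: HashSet<&str> = vector_hits.iter().map(|c| c.id.as_str()).collect();
    let keyword_ids: HashSet<&str> = keyword_hits.iter().map(|c| c.id.as_str()).collect();

    // Deduplicate by id, keeping the best score seen for each item.
    let mut best: BTreeMap<&str, f64> = BTreeMap::new();
    for c in vector_hits.iter().chain(keyword_hits) {
        let entry = best.entry(c.id.as_str()).or_insert(c.score);
        if c.score > *entry {
            *entry = c.score;
        }
    }

    let mut ranked: Vec<(&str, f64, bool)> = best
        .into_iter()
        .map(|(id, score)| (id, score, vector_ids.contains(id) && keyword_ids.contains(id)))
        .collect();

    ranked.sort_by(|a, b| {
        b.2.cmp(&a.2) // hits from both searches first
            .then(b.1.total_cmp(&a.1)) // then higher score
            .then(a.0.cmp(b.0)) // then id, for a total order
    });

    ranked.into_iter().map(|(id, _, _)| id.to_string()).collect()
}
```

Using `total_cmp` plus an id tie-break gives a total order over candidates, which is one simple way to keep ranking free of randomness.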
+
+## Scoring System
+
+Three scores work together to surface the right memories:
+
+- **D-Score** (feeling): -10 to +10. Emotional valence: negative means a bad experience, positive a good one.
+- **I-Score** (intuition): 0 to 2. Confidence/accuracy multiplier.
+- **EV** (experience value): `d_score × i_score`. Ranking bonus in recall.
+
+Items with high |EV| are boosted in recall. Items with low or zero EV get no boost and rank by relevance alone.
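
The arithmetic can be sketched in a few lines. The EV formula comes from the list above; clamping inputs to the documented ranges, the linear form of the boost, and the `weight` parameter are assumptions made for this example.

```rust
/// Experience value: d_score × i_score.
/// Clamping to the documented ranges is an assumption of this sketch.
pub fn experience_value(d_score: f64, i_score: f64) -> f64 {
    d_score.clamp(-10.0, 10.0) * i_score.clamp(0.0, 2.0)
}

/// Hypothetical ranking bonus: relevance scaled up by |EV|.
/// With EV = 0 the relevance passes through unchanged.
pub fn boosted_relevance(relevance: f64, ev: f64, weight: f64) -> f64 {
    relevance * (1.0 + weight * ev.abs())
}
```

Because the boost depends on |EV|, a strongly negative memory (a bad experience held with high confidence) surfaces as readily as a strongly positive one.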
+
+## Embedding Strategy
+
+- **Local CLI**: embeddings via configurable OpenAI-compatible endpoint
+- **Cloud (zumie.ai)**: Gemini `gemini-embedding-001` (768 dimensions)
+- **Storage**: `Float32Array` in sqlite-vec `float32[N]` column
+- **Fallback**: graceful degradation when embeddings unavailable (keyword-only search)
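
As a rough sketch of the storage and fallback points above, the code below packs an embedding into a float32 byte blob, computes cosine similarity, and picks a search mode depending on whether an embedding is available. The function names, the `SearchMode` enum, and the little-endian byte order are assumptions for illustration. It uses only the standard library rather than the sqlite-vec bindings.

```rust
/// Encode an embedding as a little-endian float32 blob (4 bytes per value).
pub fn to_blob(v: &[f32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_le_bytes()).collect()
}

/// Decode a blob back into floats; None if the length is not a multiple of 4.
pub fn from_blob(b: &[u8]) -> Option<Vec<f32>> {
    if b.len() % 4 != 0 {
        return None;
    }
    Some(
        b.chunks_exact(4)
            .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Cosine similarity; None for empty, mismatched, or zero-norm vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Hypothetical search modes for the graceful-degradation path.
#[derive(Debug, PartialEq)]
pub enum SearchMode {
    Hybrid,
    KeywordOnly,
}

/// Fall back to keyword-only search when no query embedding is available.
pub fn search_mode(query_embedding: Option<&[f32]>) -> SearchMode {
    match query_embedding {
        Some(e) if !e.is_empty() => SearchMode::Hybrid,
        _ => SearchMode::KeywordOnly,
    }
}
```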
+
+## Sync Architecture
+
+```
+Local DB (libSQL) ←──embedded replica──→ Turso Cloud
+       │
+       └── wagl sync (push/pull)
+```
+
+- Local-first: works fully offline
+- Turso embedded replicas for cloud sync
+- Per-user isolated databases in zumie.ai (multi-tenant)
+
+## Key Design Decisions
+
+1. **Local-first, not cloud-first** — CLI works without network. Cloud is additive.
+2. **libSQL over SQLite** — Turso sync, vector support via sqlite-vec, same SQLite compatibility.
+3. **Tags over schema fields** — Canonical conventions (`canon:user.profile`) use tags to avoid schema rigidity. Schema fields added only when behavior depends on them (scores, emotions, actionable).
+4. **Deterministic recall** — Same query + same DB = same results. No randomness in ranking.
+5. **Doctrine in the binary** — `wagl skill` embeds the behavioral contract. Agents learn from the tool.
+6. **Multi-crate for isolation** — core (types) → db (storage) → cli/server/mcp (surfaces). No circular deps.
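
Decision 3 can be illustrated with a small helper that reads canonical keys out of ordinary tags, so no dedicated schema column is needed. The helper names `canon_key` and `canon_keys` are hypothetical, and the sketch assumes only the `canon:` prefix convention shown above.

```rust
/// Extract the key from a canonical tag, e.g. `canon:user.profile` -> `user.profile`.
/// Returns None for non-canonical tags or an empty key.
pub fn canon_key(tag: &str) -> Option<&str> {
    tag.strip_prefix("canon:").filter(|k| !k.is_empty())
}

/// Collect every canonical key carried by an item's tags, in tag order.
pub fn canon_keys(tags: &[String]) -> Vec<&str> {
    tags.iter().filter_map(|t| canon_key(t)).collect()
}
```

Adding a new canonical slot under this convention means writing a new tag, not running a migration.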
+
+---
+
+*Generated 2026-03-21. See `AGENTS.md` for contributor orientation.*