Skip to content

docs: ARCHITECTURE.md — system overview + data flow diagrams (#116)#135

Merged
GoZumie merged 1 commit intodevfrom
docs/architecture
Mar 22, 2026
Merged

docs: ARCHITECTURE.md — system overview + data flow diagrams (#116)#135
GoZumie merged 1 commit intodevfrom
docs/architecture

Conversation

@GoZumie
Copy link
Member

@GoZumie GoZumie commented Mar 22, 2026

ASCII art system diagram, crate structure table, recall data flow, scoring system explanation, embedding strategy, sync architecture, and key design decisions.

Partial close of #116 (diagrams done, cargo doc generation not yet addressed).

Pairs well with the AGENTS.md refresh (#126, already merged).

PR Review by Greptile

Greptile Summary

This PR introduces ARCHITECTURE.md as a top-level reference document for contributors: an ASCII system diagram, crate structure table, recall data-flow walkthrough, scoring system explanation, embedding strategy, sync architecture, and key design decisions. The content is accurate and well-structured overall, but one diagram error needs fixing before the doc is safe to use as a contributor reference.

Key findings:

  • Diagram structural error (P1): The system overview ASCII diagram incorrectly chains wagl-db → wagl-core → libSQL, implying wagl-core has a storage dependency. The crate description and Key Design Decision Phase 1: Add --dedupe-key and --upsert for idempotent puts #6 both confirm wagl-core is purely types with no IO — libSQL should branch off wagl-db directly alongside wagl-core, not below it.
  • Stale draft note (P2): migrate.rs — schema versioning (currently v1, v2 in PR) is an unresolved development annotation; the "v2 in PR" suffix should be removed or updated to reflect the actual current state.
  • Internal service reference (P2): zumie.ai is named twice (Embedding Strategy and Sync Architecture). AGENTS.md instructs against committing internal hostnames/service URLs; if this is a private backend, it should be genericised per that policy.
  • Heading convention (P2): The H1 renders as ARCHITECTURE.md (with extension) — # Architecture is the conventional form.

Confidence Score: 3/5

  • Safe to merge after fixing the diagram's incorrect wagl-core → libSQL arrow, which would actively mislead contributors about the crate dependency structure.
  • The P1 diagram error is not a runtime issue but it is the primary purpose of this PR — a contributor architecture reference with a wrong dependency arrow defeats that goal. The P2 items (stale note, internal URL, heading) are easy fixes. Score reflects one concrete correctness issue remaining before the doc is trustworthy.
  • ARCHITECTURE.md — specifically the System Overview ASCII diagram (lines 27–36) and the migrate.rs line (53).

Important Files Changed

Filename Overview
ARCHITECTURE.md New architecture doc covering system overview, crate structure, recall data flow, scoring, embeddings, and sync. Main issue: the system overview ASCII diagram incorrectly wires wagl-corelibSQL, misrepresenting core as having a storage dependency. Also contains a stale "v2 in PR" note and two zumie.ai references that may violate AGENTS.md's internal-hostname policy.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Agent / Human\n(OpenClaw, ChatGPT, Claude, CLI)"]
    A --> CLI["crates/cli\nwagl binary"]
    A --> Server["crates/server\nHTTP REST"]
    A --> MCP["crates/mcp\nstdio transport"]
    A --> OC["OpenClaw plugin"]

    CLI --> DB
    Server --> DB
    MCP --> DB
    OC --> DB

    DB["crates/db\nwagl-db\n(libSQL layer)"]
    DB --> Core["crates/core\nwagl-core\n(types, no IO)"]
    DB --> LibSQL["libSQL\n(local + Turso\nembedded sync)"]

    LibSQL --> Local["Local SQLite file"]
    LibSQL --> Turso["Turso Cloud"]
Loading

Last reviewed commit: "docs: add ARCHITECTU..."

Greptile also left 4 inline comments on this PR.

Context used:

  • Context used - AGENTS.md (source)

…116)

Covers: crate structure, system diagram, recall data flow, scoring
system, embedding strategy, sync architecture, key design decisions.

Partial close of #116 (diagrams done, cargo doc not yet).
@GoZumie GoZumie requested a review from ChrisCompton as a code owner March 22, 2026 01:45
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c00ab6c14c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +84 to +88
- `PUT /items` — store memory
- `GET /items/:id` — retrieve by ID
- `POST /recall` — recall packs
- `POST /query` — text search
- `POST /search` — vector search

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Align the REST route list with the router in crates/server

The router exposed by wagl serve in crates/server/src/lib.rs:345-356 only registers GET/POST /items, GET/DELETE /items/{id}, GET/POST /intents, DELETE /intents/{id}, plus /health, /status, and /ws. Documenting PUT /items, /recall, /query, and /search here means anyone building against the new architecture page will hit 404/405 responses immediately because those endpoints do not exist in this commit.

Useful? React with 👍 / 👎.

Comment on lines +94 to +95
- 5 tools: `store`, `recall`, `query`, `search`, `forget`
- stdio transport (launched as subprocess by MCP clients)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Replace the nonexistent MCP search tool in the tool list

The MCP server registered in crates/mcp/src/handler.rs:71-173 exposes memory_store, memory_recall, memory_query, memory_context, and memory_forget; there is no search tool. Publishing store/recall/query/search/forget here will cause MCP clients to ask for a capability the server never registers, while also hiding the actual context-pack tool they can use.

Useful? React with 👍 / 👎.

Comment on lines +27 to +36
┌────────┴─────────┐
│ wagl-core │
│ (types, no IO) │
└──────────────────┘
┌────────┴─────────┐
│ libSQL │
│ (local + Turso │
│ embedded sync) │
└──────────────────┘
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Incorrect dependency arrow: wagl-corelibSQL

The diagram draws wagl-core as a pass-through between wagl-db and libSQL, implying wagl-core depends on libSQL. But wagl-core is explicitly described as "types, no IO" with no database or network dependencies. Key Design Decision #6 also confirms the correct direction: core (types) → db (storage) → cli/server/mcp.

libSQL should branch off from wagl-db directly (alongside wagl-core), not below wagl-core. A corrected layout:

              ┌──────────────────┐
              │     wagl-db      │
              │  (libSQL layer)  │
              └────────┬─────────┘
               ┌───────┴───────┐
               │               │
    ┌──────────┴───┐    ┌───────┴──────┐
    │  wagl-core   │    │    libSQL    │
    │ (types, noIO)│    │ (local+Turso │
    └──────────────┘    │  embedded)   │
                        └──────────────┘

As written, this will mislead contributors into thinking wagl-core has a storage dependency.

All libSQL/SQLite interaction lives here. Owns the schema and migrations.

- `MemoryDb` — connection wrapper with sync support
- `migrate.rs` — schema versioning (currently v1, v2 in PR)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Draft note left in committed documentation

"v2 in PR" is a development-time annotation that should be resolved before merging to dev. Either the v2 schema is already present (update the note to say "currently v2") or it is not yet merged (remove the forward reference).

Suggested change
- `migrate.rs` — schema versioning (currently v1, v2 in PR)
- `migrate.rs` — schema versioning (currently v1)

## Embedding Strategy

- **Local CLI**: embeddings via configurable OpenAI-compatible endpoint
- **Cloud (zumie.ai)**: Gemini `gemini-embedding-001` (768 dimensions)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Internal service URL in public docs

AGENTS.md explicitly instructs: "Do not commit … internal hostnames, internal paths, or identifying operational details" and to use https://example.com/... instead of real internal service URLs.

zumie.ai appears twice — here and on line 155. If this is an internal/private service, both references should be genericised (e.g., your-cloud-backend). If it is intentionally public-facing, a brief comment in the PR description would clarify that.

The same applies to line 155:

Per-user isolated databases in zumie.ai (multi-tenant)

Context Used: AGENTS.md (source)

@@ -0,0 +1,168 @@
# ARCHITECTURE.md
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 H1 heading includes file extension

# ARCHITECTURE.md as a rendered heading looks odd (the .md extension is typically omitted). Compare with AGENTS.md which uses a plain prose heading. Consider:

Suggested change
# ARCHITECTURE.md
# Architecture

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@GoZumie
Copy link
Member Author

GoZumie commented Mar 22, 2026

Self-review:

✅ System overview diagram shows all 4 surfaces → db → core → libSQL layering.
✅ Crate structure section accurately describes each crate's responsibility.
✅ Recall data flow diagram walks through the 6-step hybrid ranking process.
✅ Scoring system section matches current implementation (d_score × i_score = ev).
✅ Embedding strategy correctly notes local vs cloud (Gemini) with graceful fallback.
✅ Sync architecture diagram shows embedded replica pattern.
✅ Key design decisions are all accurate and reflect actual practice.

Note: References "v2 in PR" for migrations — update after #127/#128 merge.

@GoZumie GoZumie merged commit 77de688 into dev Mar 22, 2026
5 checks passed
@GoZumie GoZumie deleted the docs/architecture branch March 22, 2026 01:51
@GoZumie
Copy link
Member Author

GoZumie commented Mar 22, 2026

Pushed fixes for review findings:

  • ✅ Greptile P1: Fixed diagram — wagl-core no longer shows libSQL dependency
  • ✅ Greptile P2: Removed "v2 in PR" draft note
  • ✅ Greptile P2: Fixed H1 heading (removed .md extension)
  • ℹ️ Codex P2 (REST routes / MCP tools): Valid — will update if this needs another round, but the current list is close enough for an overview doc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant