feat: add environment + origin_domain provenance fields (#119)#127
feat: add environment + origin_domain provenance fields (#119)#127
Conversation
…es (#119, #120) Schema migration v1→v2: - memory_items: new environment TEXT and origin_domain TEXT columns - New tables: daily_entries (append-only), daily_memories (junction), daily (materialized summary) Core types: - MemoryItem: add environment, origin_domain fields - New DailyEntry, DailySummary structs DB layer: - Centralized memory_item_from_row() helper (eliminates ~200 lines of duplicated row-parsing code across get/query/recall/everything) - MEMORY_ITEM_SELECT constant for consistent column projection - environment/origin_domain threaded through all INSERT and SELECT paths - TranscriptOpts, NoteOpts, DecisionOpts: add provenance fields - New daily methods: insert_daily_entry, get_daily_entries, list_daily_dates, invalidate_daily_entry, link_daily_memory, refresh_daily_summary CLI: - Cmd::Put: --environment and --origin flags (defaults: cli, hostname) - Cmd::Day: full subcommand with --update, --date, --context, --list, --include-invalidated, --invalidate, --reason Server + MCP: - PutItemRequest: accept environment/origin_domain - MCP handler: pass through provenance fields Tests: - Migration tests updated for v2 schema - New migration_v1_to_v2_adds_daily_tables test
crates/mcp/src/handler.rs
Outdated
| environment: None, | ||
| origin_domain: None, |
There was a problem hiding this comment.
MCP items silently lose all provenance
environment and origin_domain are both hardcoded to None in the MCP handler, and MemoryStoreParams has no corresponding fields, so there's no way for an MCP caller to supply them either. The PR's stated motivation is:
"When multiple agents/surfaces write memories, provenance is critical for debugging, filtering, and trust."
The MCP surface is precisely the primary path for agent-driven memory writes, yet every item stored through it will have NULL for both provenance fields. At a minimum, you should tag items with the surface they arrived on so the goal is actually achieved:
| environment: None, | |
| origin_domain: None, | |
| environment: Some("mcp".to_string()), | |
| origin_domain: None, |
Separately, consider adding environment and origin_domain fields to MemoryStoreParams so MCP callers can supply their own provenance, the same way the HTTP server does in PutItemRequest.
| id, | ||
| r#type: "people".to_string(), |
There was a problem hiding this comment.
Inconsistent
environment across CLI-originated items
These people and trust-adjustment items are created directly by the CLI, but environment is None while the Put and CanonPut commands always default it to "cli". The same gap exists in several other places added by this PR:
crates/cli/src/main.rs– People command (environment: None, ~line 3596) and trust-adjustment observation (~line 3650)crates/cli/src/main.rs– Decide/ReDecide paths (environment: None, ~lines 4117 and 4234)crates/cli/src/main.rs–import_missing_fileshelper (environment: None, ~line 4744)crates/cli/src/ingest_cmd.rs– Transcript ingestion setsenvironment: Some("cli")butorigin_domain: Noneinstead ofdefault_origin_domain()
If provenance is the goal, all CLI-originated writes should consistently carry environment: Some("cli".to_string()) and call default_origin_domain() for origin_domain, or at least document why some paths intentionally omit it.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 800ef1db3b
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
crates/mcp/src/handler.rs
Outdated
| environment: None, | ||
| origin_domain: None, |
There was a problem hiding this comment.
Expose provenance fields in MCP memory_store
memory_store still hard-codes environment and origin_domain to None, and MemoryStoreParams has no matching inputs, so every memory written through the MCP surface loses the provenance this change is trying to add. In deployments that ingest memories via MCP, downstream filtering/debugging by source will never work because those rows are indistinguishable from legacy items.
Useful? React with 👍 / 👎.
|
Self-review notes: ✅ |
…omain Addresses Greptile P1: MCP items were silently losing all provenance (environment and origin_domain hardcoded to None).
|
Pushed fix for P1 finding:
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ae7e4264de
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| environment: None, | ||
| origin_domain: None, |
There was a problem hiding this comment.
Tag capture/reflex writes with the new provenance fields
The wagl capture branch still constructs MemoryItem with both provenance fields set to None, and the Reflex wrapper repeats the same pattern a few lines below. In any setup that relies on policy-governed capture for decisions/incidents/deploys, every captured row remains indistinguishable from legacy items, so filtering or debugging by source will still fail on a first-class CLI write path.
Useful? React with 👍 / 👎.
| environment: Some("cli".to_string()), | ||
| origin_domain: None, |
There was a problem hiding this comment.
Preserve host provenance when ingesting transcripts
run_ingest_transcripts now marks imported turns as environment="cli", but it still hard-codes origin_domain to None. When transcript archives are imported from multiple hosts or agents, the new origin_domain column never distinguishes those memories, so the bulk-ingest path still loses half of the provenance this change is trying to add.
Useful? React with 👍 / 👎.
What
Adds
environment(TEXT) andorigin_domain(TEXT) columns to memory_items, threaded through all layers.Schema
--environment(default: 'cli') and--origin(default: hostname)memory_item_from_row()helper — eliminates ~200 lines of duplicated row-parsingWhy
Every memory item should know where it came from. When multiple agents/surfaces write memories, provenance is critical for debugging, filtering, and trust.
Closes #119
PR Review by Greptile
Greptile Summary
This PR threads
environmentandorigin_domainprovenance fields through the full memory-item stack (schema, types, DB layer, CLI, server) and centralises row-parsing into a singlememory_item_from_row()helper, eliminating ~200 lines of duplicated code. It also ships a newDaysubcommand and the threedaily_*tables as a schema v2 migration.lib.rs,migrate.rs): column additions viaapply_pending_altersare correct;MEMORY_ITEM_SELECTconstant andmemory_item_from_row()cleanly unify all query paths; migration v1→v2 runs in a transaction and is well-tested.server/src/lib.rs):PutItemRequestcorrectly exposes and passes through both provenance fields.main.rs,ingest_cmd.rs):PutandCanonPutcommands defaultenvironmentto"cli"andorigin_domainto the system hostname. However, several other CLI-originated item constructors (People, Trust, Decide/ReDecide,import_missing_files) still setenvironment: None, which is inconsistent with the stated goal. The newDaysubcommand'svalidate_day_argsfunction also does not guard against--datebeing silently ignored when combined with--invalidate.mcp/src/handler.rs): both fields are hardcoded toNoneandMemoryStoreParamshas no corresponding inputs, meaning every agent-written memory will have no provenance — directly at odds with the motivation for this PR.Confidence Score: 3/5
Important Files Changed
Sequence Diagram
sequenceDiagram participant CLI as CLI (Put/Canon) participant Ingest as CLI (Ingest) participant MCP as MCP Handler participant Server as HTTP Server participant DB as MemoryDb CLI->>DB: put(MemoryItem)<br/>environment="cli", origin_domain=hostname Ingest->>DB: put(MemoryItem)<br/>environment="cli", origin_domain=None MCP->>DB: put(MemoryItem)<br/>environment=None, origin_domain=None Server->>DB: put(MemoryItem)<br/>environment=req.environment, origin_domain=req.origin_domain note over DB: memory_items<br/>+ environment TEXT<br/>+ origin_domain TEXT DB-->>CLI: MemoryItem (with provenance) DB-->>Ingest: MemoryItem (partial provenance) DB-->>MCP: MemoryItem (no provenance) DB-->>Server: MemoryItem (caller-supplied provenance)Comments Outside Diff (1)
crates/cli/src/main.rs, line 298-310 (link)--datesilently ignored when combined with--invalidatevalidate_day_argsrejects--datepaired with--updateand with--list, but there is no corresponding guard for--date --invalidate. Since the invalidation path looks up the entry by ID only and never readsdate, passing--datewith--invalidatewill succeed without error while the date is silently dropped, which is likely to confuse users.Consider adding the mirror check:
Last reviewed commit: "feat: add environmen..."