feat: add environment + origin_domain provenance fields (#119) #127

Merged
GoZumie merged 2 commits into dev from feat/environment-origin-domain
Mar 23, 2026

@GoZumie GoZumie commented Mar 22, 2026

What

Adds environment (TEXT) and origin_domain (TEXT) columns to memory_items, threaded through all layers.

Schema

  • Migration v1→v2: adds columns with defaults
  • CLI: --environment (default: 'cli') and --origin (default: hostname)
  • Server/MCP: accept in PutItemRequest
  • Centralized memory_item_from_row() helper — eliminates ~200 lines of duplicated row-parsing
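
The centralization in the last bullet can be sketched roughly as follows. This is an illustration only, not the code in this PR: the `RawRow` alias, the reduced column list, and the trimmed-down `MemoryItem` are assumptions chosen so the example runs without a database driver. Only the names `MEMORY_ITEM_SELECT` and `memory_item_from_row` come from the PR.

```rust
/// Reduced, hypothetical version of the item type; the real struct has more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryItem {
    pub id: String,
    pub r#type: String,
    pub text: String,
    pub environment: Option<String>,
    pub origin_domain: Option<String>,
}

/// One projection shared by every query, so column order is defined in one place.
/// (The column list is an assumption made for this sketch.)
pub const MEMORY_ITEM_SELECT: &str =
    "SELECT id, type, text, environment, origin_domain FROM memory_items";

/// Stand-in for a database row: one nullable text value per selected column.
pub type RawRow = Vec<Option<String>>;

/// Single place that maps a row onto `MemoryItem`, in `MEMORY_ITEM_SELECT` order.
pub fn memory_item_from_row(row: &RawRow) -> Result<MemoryItem, String> {
    // Required columns must be present and non-NULL.
    let required = |idx: usize, name: &str| -> Result<String, String> {
        row.get(idx)
            .cloned()
            .flatten()
            .ok_or_else(|| format!("column {name} is missing or NULL"))
    };
    // Provenance columns are nullable, so legacy rows still parse.
    let optional = |idx: usize| row.get(idx).cloned().flatten();

    Ok(MemoryItem {
        id: required(0, "id")?,
        r#type: required(1, "type")?,
        text: required(2, "text")?,
        environment: optional(3),
        origin_domain: optional(4),
    })
}
```

Because every query path shares the same projection and the same parser, adding a column means touching the constant and this one function rather than each call site.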

Why

Every memory item should know where it came from. When multiple agents/surfaces write memories, provenance is critical for debugging, filtering, and trust.

Closes #119

PR Review by Greptile

Greptile Summary

This PR threads environment and origin_domain provenance fields through the full memory-item stack (schema, types, DB layer, CLI, server) and centralises row-parsing into a single memory_item_from_row() helper, eliminating ~200 lines of duplicated code. It also ships a new Day subcommand and the three daily_* tables as a schema v2 migration.

  • DB layer (lib.rs, migrate.rs): column additions via apply_pending_alters are correct; MEMORY_ITEM_SELECT constant and memory_item_from_row() cleanly unify all query paths; migration v1→v2 runs in a transaction and is well-tested.
  • Server (server/src/lib.rs): PutItemRequest correctly exposes and passes through both provenance fields.
  • CLI (main.rs, ingest_cmd.rs): Put and CanonPut commands default environment to "cli" and origin_domain to the system hostname. However, several other CLI-originated item constructors (People, Trust, Decide/ReDecide, import_missing_files) still set environment: None, which is inconsistent with the stated goal. The new Day subcommand's validate_day_args function also does not guard against --date being silently ignored when combined with --invalidate.
  • MCP handler (mcp/src/handler.rs): both fields are hardcoded to None and MemoryStoreParams has no corresponding inputs, meaning every agent-written memory will have no provenance — directly at odds with the motivation for this PR.

Confidence Score: 3/5

  • Safe to merge for the DB/server/type changes, but the MCP handler gap and inconsistent CLI environment tagging mean the primary provenance goal is only partially achieved.
  • The DB refactor and migration are solid and well-tested. The server-side provenance pass-through is correct. The P1 issue — MCP items silently storing NULL for both provenance fields — directly contradicts the PR's stated motivation and the MCP surface is a first-class agent write path. The inconsistent environment=None in several CLI paths (People, Trust, Decide) and the silent --date/--invalidate UX gap are meaningful correctness/consistency gaps on top of that. One targeted fix to the MCP handler (setting environment="mcp" and ideally adding fields to MemoryStoreParams) plus a pass over the None-environment CLI paths would bring this to 4+.
  • crates/mcp/src/handler.rs (provenance silently lost), crates/cli/src/main.rs (inconsistent environment tagging and Day arg validation gap)

Important Files Changed

| Filename | Overview |
|---|---|
| crates/db/src/lib.rs | Core change: adds MEMORY_ITEM_SELECT constant, centralises row-parsing in memory_item_from_row(), adds environment/origin_domain to all INSERT statements, and introduces daily_entries/daily_memories/daily tables with full CRUD. Column index mapping in memory_item_from_row() is correct against MEMORY_ITEM_SELECT. |
| crates/db/src/migrate.rs | Adds migrate_v1_to_v2 under a BEGIN IMMEDIATE transaction, creating the three daily narrative tables and bumping schema_version to 2. Tests cover the new migration path. The environment/origin_domain columns are handled via apply_pending_alters in init(), not this migration. |
| crates/core/src/types.rs | Adds optional environment and origin_domain fields to MemoryItem with #[serde(default)], and introduces DailyEntry and DailySummary structs. Non-breaking change. |
| crates/cli/src/main.rs | Adds --environment/--origin CLI flags to Put/CanonPut (defaulting to "cli"/hostname), adds the Day subcommand for daily narrative, and threads the new fields into existing commands. Several CLI-originated item constructors (People, Trust, Decide, ReDecide, import_missing_files) still set environment: None, inconsistent with Put which defaults to "cli". validate_day_args also misses the --date + --invalidate guard. |
| crates/mcp/src/handler.rs | Compiles the new MemoryItem fields by setting both to None; MemoryStoreParams has no corresponding fields so MCP callers can never supply provenance. This means every agent-written memory has environment=NULL, directly undermining the PR's stated provenance goal. |
| crates/server/src/lib.rs | Correctly adds environment and origin_domain to PutItemRequest and threads them through to the MemoryItem constructor. |
| crates/cli/src/ingest_cmd.rs | Sets environment to "cli" for transcript items but hardcodes origin_domain: None instead of calling default_origin_domain(), inconsistent with the Put command path. |
| crates/cli/tests/temporal_recall_smoke.rs | Test fixtures updated to include the new fields as None; no functional changes to the test logic. |

Sequence Diagram

sequenceDiagram
    participant CLI as CLI (Put/Canon)
    participant Ingest as CLI (Ingest)
    participant MCP as MCP Handler
    participant Server as HTTP Server
    participant DB as MemoryDb

    CLI->>DB: put(MemoryItem)<br/>environment="cli", origin_domain=hostname
    Ingest->>DB: put(MemoryItem)<br/>environment="cli", origin_domain=None
    MCP->>DB: put(MemoryItem)<br/>environment=None, origin_domain=None
    Server->>DB: put(MemoryItem)<br/>environment=req.environment, origin_domain=req.origin_domain

    note over DB: memory_items<br/>+ environment TEXT<br/>+ origin_domain TEXT

    DB-->>CLI: MemoryItem (with provenance)
    DB-->>Ingest: MemoryItem (partial provenance)
    DB-->>MCP: MemoryItem (no provenance)
    DB-->>Server: MemoryItem (caller-supplied provenance)

Comments Outside Diff (1)

  1. crates/cli/src/main.rs, line 298-310

    P2 --date silently ignored when combined with --invalidate

    validate_day_args rejects --date paired with --update and with --list, but there is no corresponding guard for --date --invalidate. Since the invalidation path looks up the entry by ID only and never reads date, passing --date with --invalidate will succeed without error while the date is silently dropped, which is likely to confuse users.

    Consider adding the mirror check:
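
One possible shape for such a guard, as a sketch only: `DayArgs`, its field types, and the error text are assumptions, not the CLI's actual definitions, and the existing `--update` and `--list` checks are left out.

```rust
/// Hypothetical subset of the `Day` subcommand's arguments.
#[derive(Debug, Default)]
pub struct DayArgs {
    /// Value of `--date`, if given.
    pub date: Option<String>,
    /// Entry ID passed to `--invalidate`, if given.
    pub invalidate: Option<String>,
}

/// Rejects `--date` together with `--invalidate`, since invalidation looks
/// the entry up by ID and would otherwise drop the date silently.
pub fn validate_day_args(args: &DayArgs) -> Result<(), String> {
    if args.date.is_some() && args.invalidate.is_some() {
        return Err(
            "--date cannot be combined with --invalidate (entries are invalidated by ID)"
                .to_string(),
        );
    }
    Ok(())
}
```

Failing fast here turns a silently ignored flag into an explicit error message.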

Last reviewed commit: "feat: add environmen..."

Greptile also left 2 inline comments on this PR.

…es (#119, #120)

Schema migration v1→v2:
- memory_items: new environment TEXT and origin_domain TEXT columns
- New tables: daily_entries (append-only), daily_memories (junction),
  daily (materialized summary)

Core types:
- MemoryItem: add environment, origin_domain fields
- New DailyEntry, DailySummary structs

DB layer:
- Centralized memory_item_from_row() helper (eliminates ~200 lines of
  duplicated row-parsing code across get/query/recall/everything)
- MEMORY_ITEM_SELECT constant for consistent column projection
- environment/origin_domain threaded through all INSERT and SELECT paths
- TranscriptOpts, NoteOpts, DecisionOpts: add provenance fields
- New daily methods: insert_daily_entry, get_daily_entries,
  list_daily_dates, invalidate_daily_entry, link_daily_memory,
  refresh_daily_summary

CLI:
- Cmd::Put: --environment and --origin flags (defaults: cli, hostname)
- Cmd::Day: full subcommand with --update, --date, --context,
  --list, --include-invalidated, --invalidate, --reason

Server + MCP:
- PutItemRequest: accept environment/origin_domain
- MCP handler: pass through provenance fields

Tests:
- Migration tests updated for v2 schema
- New migration_v1_to_v2_adds_daily_tables test
@GoZumie GoZumie requested a review from ChrisCompton as a code owner March 22, 2026 01:28
Comment on lines +93 to +94
environment: None,
origin_domain: None,

P1 MCP items silently lose all provenance

environment and origin_domain are both hardcoded to None in the MCP handler, and MemoryStoreParams has no corresponding fields, so there's no way for an MCP caller to supply them either. The PR's stated motivation is:

"When multiple agents/surfaces write memories, provenance is critical for debugging, filtering, and trust."

The MCP surface is precisely the primary path for agent-driven memory writes, yet every item stored through it will have NULL for both provenance fields. At a minimum, you should tag items with the surface they arrived on so the goal is actually achieved:

Suggested change
- environment: None,
- origin_domain: None,
+ environment: Some("mcp".to_string()),
+ origin_domain: None,

Separately, consider adding environment and origin_domain fields to MemoryStoreParams so MCP callers can supply their own provenance, the same way the HTTP server does in PutItemRequest.
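
A minimal sketch of what that could look like, assuming a reduced `MemoryStoreParams` and a hypothetical helper named `resolve_mcp_provenance` (neither is taken from the handler's actual code): a caller-supplied value wins, and `environment` falls back to the surface name "mcp".

```rust
/// Hypothetical reduced parameter struct. Only the two provenance field
/// names come from the review; `text` is a placeholder.
#[derive(Debug, Default)]
pub struct MemoryStoreParams {
    pub text: String,
    pub environment: Option<String>,
    pub origin_domain: Option<String>,
}

/// Resolves the provenance pair for an MCP write.
/// Blank values are treated as absent.
pub fn resolve_mcp_provenance(
    params: &MemoryStoreParams,
) -> (Option<String>, Option<String>) {
    let environment = params
        .environment
        .as_deref()
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .unwrap_or("mcp") // default to the surface the write arrived on
        .to_owned();
    let origin_domain = params
        .origin_domain
        .as_deref()
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_owned);
    (Some(environment), origin_domain)
}
```

With this shape no MCP write is stored with a NULL `environment`, while `origin_domain` stays optional for callers that cannot name a host.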

Comment on lines 3596 to 3597
id,
r#type: "people".to_string(),

P2 Inconsistent environment across CLI-originated items

These people and trust-adjustment items are created directly by the CLI, but environment is None while the Put and CanonPut commands always default it to "cli". The same gap exists in several other places added by this PR:

  • crates/cli/src/main.rs – People command (environment: None, ~line 3596) and trust-adjustment observation (~line 3650)
  • crates/cli/src/main.rs – Decide/ReDecide paths (environment: None, ~lines 4117 and 4234)
  • crates/cli/src/main.rs – import_missing_files helper (environment: None, ~line 4744)
  • crates/cli/src/ingest_cmd.rs – Transcript ingestion sets environment: Some("cli") but origin_domain: None instead of default_origin_domain()

If provenance is the goal, all CLI-originated writes should consistently carry environment: Some("cli".to_string()) and call default_origin_domain() for origin_domain, or at least document why some paths intentionally omit it.
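
One way to keep the CLI paths consistent is to route every constructor through a single provenance helper. The sketch below is hypothetical: `Provenance`, `new_cli_item`, and the reduced `MemoryItem` are illustrative names, and the `host` argument stands in for whatever `default_origin_domain()` returns.

```rust
/// Hypothetical pair of provenance values attached to every write.
#[derive(Debug, Clone, PartialEq)]
pub struct Provenance {
    pub environment: Option<String>,
    pub origin_domain: Option<String>,
}

impl Provenance {
    /// Provenance for any write that originates from the CLI.
    pub fn cli(host: Option<String>) -> Self {
        Self {
            environment: Some("cli".to_string()),
            origin_domain: host,
        }
    }
}

/// Reduced, hypothetical item type for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct MemoryItem {
    pub id: String,
    pub r#type: String,
    pub environment: Option<String>,
    pub origin_domain: Option<String>,
}

/// Builds a CLI-originated item (people, trust, decide, ...) so that no
/// call site has to remember to fill in the provenance fields itself.
pub fn new_cli_item(id: &str, kind: &str, host: Option<String>) -> MemoryItem {
    let provenance = Provenance::cli(host);
    MemoryItem {
        id: id.to_string(),
        r#type: kind.to_string(),
        environment: provenance.environment,
        origin_domain: provenance.origin_domain,
    }
}
```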

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 800ef1db3b

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +93 to +94
environment: None,
origin_domain: None,

P2 Expose provenance fields in MCP memory_store

memory_store still hard-codes environment and origin_domain to None, and MemoryStoreParams has no matching inputs, so every memory written through the MCP surface loses the provenance this change is trying to add. In deployments that ingest memories via MCP, downstream filtering/debugging by source will never work because those rows are indistinguishable from legacy items.
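
To illustrate the filtering problem with a small, hypothetical sketch (the reduced `MemoryItem` and both helper functions are assumptions, not code from this repo): filtering by source only works for rows that carry a tag, and untagged new writes fall into the same bucket as legacy rows.

```rust
/// Reduced, hypothetical item type for illustration.
#[derive(Debug, Clone)]
pub struct MemoryItem {
    pub id: String,
    pub environment: Option<String>,
}

/// Returns the ids of items written from the given environment.
pub fn ids_from_environment<'a>(
    items: &'a [MemoryItem],
    environment: &str,
) -> Vec<&'a str> {
    items
        .iter()
        .filter(|item| item.environment.as_deref() == Some(environment))
        .map(|item| item.id.as_str())
        .collect()
}

/// Returns the ids of items with no environment. Legacy rows and new
/// untagged writes cannot be told apart in this set.
pub fn untagged_ids(items: &[MemoryItem]) -> Vec<&str> {
    items
        .iter()
        .filter(|item| item.environment.is_none())
        .map(|item| item.id.as_str())
        .collect()
}
```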

GoZumie commented Mar 22, 2026

Self-review notes:

✅ default_origin_domain() correctly tries HOSTNAME then COMPUTERNAME, trims, and filters empty strings.
✅ Both Cmd::Put and CanonCmd::Set accept --environment and --origin flags.
✅ memory_item_from_row() centralization eliminates ~200 lines of duplicated parsing; this is the highest-value change.
✅ Schema migration v1→v2 adds columns with sensible defaults.
✅ Server/MCP properly thread environment/origin_domain through PutItemRequest.
✅ Ingest command sets environment="cli" for transcripts.
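
For reference, the behaviour described for default_origin_domain() can be sketched as below. This is not the PR's implementation: the `origin_domain_from` helper is hypothetical and exists only so the logic can be exercised without touching the process environment, and falling through to COMPUTERNAME when HOSTNAME is blank is an assumption about what "filters empty strings" means.

```rust
use std::env;

/// Core lookup logic, parameterized over the variable lookup.
/// Tries HOSTNAME, then COMPUTERNAME; trims; skips empty values.
pub fn origin_domain_from<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    ["HOSTNAME", "COMPUTERNAME"]
        .iter()
        .filter_map(|&key| lookup(key))
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
}

/// Reads the two variables from the real process environment.
pub fn default_origin_domain() -> Option<String> {
    origin_domain_from(|key| env::var(key).ok())
}
```

Note that HOSTNAME is often a shell variable that is not exported to child processes, so a lookup like this may return None on some systems.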

⚠️ Note: PR #128 (daily tables) was cherry-picked from the same combined commit. Merge #127 first, then #128 can rebase to show only the daily-table delta.

…omain

Addresses Greptile P1: MCP items were silently losing all provenance
(environment and origin_domain hardcoded to None).

GoZumie commented Mar 22, 2026

Pushed fix for P1 finding:

  • Greptile P1 (MCP items lose provenance): environment now defaults to "mcp", origin_domain accepted as parameter
  • ℹ️ Greptile P2 (inconsistent environment across CLI paths): Valid observation — trust/people commands don't pass provenance. Low priority since those are admin commands, not memory storage.
  • ℹ️ Codex P2: Same as Greptile P1 — fixed.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ae7e4264de

Comment on lines +4120 to +4121
environment: None,
origin_domain: None,

P2 Tag capture/reflex writes with the new provenance fields

The wagl capture branch still constructs MemoryItem with both provenance fields set to None, and the Reflex wrapper repeats the same pattern a few lines below. In any setup that relies on policy-governed capture for decisions/incidents/deploys, every captured row remains indistinguishable from legacy items, so filtering or debugging by source will still fail on a first-class CLI write path.

Comment on lines +276 to +277
environment: Some("cli".to_string()),
origin_domain: None,

P2 Preserve host provenance when ingesting transcripts

run_ingest_transcripts now marks imported turns as environment="cli", but it still hard-codes origin_domain to None. When transcript archives are imported from multiple hosts or agents, the new origin_domain column never distinguishes those memories, so the bulk-ingest path still loses half of the provenance this change is trying to add.

@GoZumie GoZumie merged commit b4b5a38 into dev Mar 23, 2026
4 checks passed
@GoZumie GoZumie deleted the feat/environment-origin-domain branch March 23, 2026 10:29