This document defines the main runtime flow of the ANDB v1 prototype. Its goal is to keep all contributors attached to one shared end-to-end path instead of building isolated pieces that cannot integrate.
The flow described here is both:
- the architectural target for v1
- the integration contract that current code should evolve toward
The core ANDB loop is:
event input -> event ingest -> canonical object materialization -> retrieval projection -> query planning -> multi-path retrieval -> graph expansion -> evidence assembly -> proof trace -> structured response
This loop is the most important contract in the repository.
If the flow is not defined early, the repository will drift in predictable ways:
- event payloads will stop matching object materialization needs
- retrieval will optimize for chunks rather than objects
- graph expansion will not know its seed contract
- response packaging will become inconsistent across modules
- experiments will benchmark the wrong interface
For ANDB, the main flow is not documentation after the fact. It is a design artifact.
Receive raw event input and convert it into a validated event envelope that becomes the source of downstream state change.
- HTTP route: `/v1/ingest/events`
- Gateway implementation: `src/internal/access/gateway.go`
- Runtime entry: `src/internal/worker/runtime.go`
The current runtime ingests `schemas.Event`, defined in `src/internal/schemas/canonical.go`.
Typical event types include:
- `user_message`
- `assistant_message`
- `tool_call_issued`
- `tool_result_returned`
- `plan_updated`
- `critique_generated`
- request reaches the access layer
- request is decoded into an `Event`
- event is appended to the WAL
- append result produces an LSN / logical sequence
- downstream consumers are notified through the in-memory bus
Today the runtime appends to the WAL and immediately feeds the data plane. Full event validation and dedicated materialization workers are still shallow, but the write-first-into-WAL rule is already part of the design.
- persisted event record in the in-memory WAL
- ingest acknowledgment
- trigger point for later materialization/indexing flow
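The append-then-notify path can be sketched in Go. This is a minimal illustration under stated assumptions, not the actual runtime: `Event`, `WAL`, and the channel-based bus below are simplified stand-ins for `schemas.Event`, the in-memory WAL, and the in-memory bus.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is an illustrative stand-in for schemas.Event; the field
// names here are not the canonical schema.
type Event struct {
	ID   string
	Type string
	Body string
}

// WAL is an in-memory write-ahead log. Append returns a
// monotonically increasing logical sequence number (LSN).
type WAL struct {
	mu      sync.Mutex
	entries []Event
}

func (w *WAL) Append(ev Event) int {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.entries = append(w.entries, ev)
	return len(w.entries) // LSN = 1-based position in the log
}

func main() {
	wal := &WAL{}
	bus := make(chan int, 8) // in-memory bus carrying LSNs to consumers

	// Write-first-into-WAL: only after the append succeeds do we
	// notify downstream consumers through the bus.
	lsn := wal.Append(Event{ID: "ev_1", Type: "user_message", Body: "hi"})
	bus <- lsn

	fmt.Println("appended at LSN", <-bus)
}
```

The key design point preserved here is ordering: the LSN exists before any consumer is notified, so downstream materialization always has a durable anchor to reference.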
Transform events into canonical objects and version-aware updates.
Events are the source of truth for state change, but query execution should operate over object-centric forms rather than raw event streams alone.
- load event envelope
- determine which object types are affected
- construct canonical objects
- create or update `ObjectVersion`
- generate typed edges where needed
- persist canonical objects and relation records
- `user_message` / `assistant_message` → `Memory` (episodic) + `ObjectVersion` + `belongs_to_session` / `owned_by_agent` edges
- `tool_result_returned` → `Memory` (factual) + `ObjectVersion` + causal edges
- `plan_updated` → `Memory` (procedural) + `ObjectVersion`
- `critique_generated` → `Memory` (reflective) + `ObjectVersion`
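The event-to-memory mapping can be expressed as a small lookup. The function name `memoryTypeFor` and the treatment of unlisted event types as non-materializing are illustrative assumptions, not the actual materialization code.

```go
package main

import "fmt"

// memoryTypeFor mirrors the documented event-to-memory mapping.
// Unlisted event types are treated as non-materializing here,
// which is a simplification for illustration.
func memoryTypeFor(eventType string) (string, bool) {
	switch eventType {
	case "user_message", "assistant_message":
		return "episodic", true
	case "tool_result_returned":
		return "factual", true
	case "plan_updated":
		return "procedural", true
	case "critique_generated":
		return "reflective", true
	}
	return "", false
}

func main() {
	for _, et := range []string{"user_message", "plan_updated", "heartbeat"} {
		mt, ok := memoryTypeFor(et)
		fmt.Println(et, "->", mt, ok)
	}
}
```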
`materialization.Service.MaterializeEvent(ev)` returns a `MaterializationResult` containing:
- `Record` — the `IngestRecord` for the retrieval plane
- `Memory` — a canonical `schemas.Memory` object
- `Version` — a `schemas.ObjectVersion` record
- `Edges` — typed edges inferred from the event (`belongs_to_session`, `owned_by_agent`, `derived_from`)
`Runtime.SubmitIngest` writes all three canonical records to their stores before feeding the retrieval plane. `PreComputeService.Compute` then builds an `EvidenceFragment` and stores it in `EvidenceCache`.
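A sketch of the four-part result shape, with illustrative field types standing in for the real `schemas` records:

```go
package main

import "fmt"

// Illustrative shapes only; the authoritative records live in
// src/internal/schemas.
type Memory struct{ ID, Type, Content string }
type ObjectVersion struct {
	ObjectID string
	Version  int
}
type Edge struct{ SrcObjectID, DstObjectID, Type string }
type IngestRecord struct{ ObjectID, Content string }

// MaterializationResult mirrors the four-part contract described
// above: one retrieval record plus three canonical records.
type MaterializationResult struct {
	Record  IngestRecord
	Memory  Memory
	Version ObjectVersion
	Edges   []Edge
}

func main() {
	res := MaterializationResult{
		Record:  IngestRecord{ObjectID: "mem_ev_1", Content: "hi"},
		Memory:  Memory{ID: "mem_ev_1", Type: "episodic", Content: "hi"},
		Version: ObjectVersion{ObjectID: "mem_ev_1", Version: 1},
		Edges: []Edge{
			{SrcObjectID: "mem_ev_1", DstObjectID: "sess_1", Type: "belongs_to_session"},
		},
	}
	fmt.Println(res.Record.ObjectID, len(res.Edges))
}
```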
Current anchor:
- `Memory` persisted to `ObjectStore`
- `ObjectVersion` persisted to `SnapshotVersionStore`
- typed `Edge` records persisted to `GraphEdgeStore`
- `EvidenceFragment` stored in `EvidenceCache`
- `IngestRecord` fed to `TieredDataPlane`
Prepare retrievable forms from canonical objects.
Canonical objects represent semantic truth. Retrieval needs dense, sparse, and filterable projections derived from those objects.
- choose retrievable objects
- derive dense representation
- derive sparse/lexical representation
- extract filter attributes
- store retrieval entries in the data plane
`MaterializationResult.Record` (an `IngestRecord`) is fed to `TieredDataPlane.Ingest()`, which writes to both the hot segment index (for immediate retrieval) and the warm plane. The object ID follows the pattern `mem_<event_id>` and carries filter attributes:
`tenant_id`, `workspace_id`, `agent_id`, `session_id`, `event_type`, `visibility`
In v1 retrieval is lexical (term-overlap scoring). Dense/vector retrieval is a planned extension.
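A minimal sketch of term-overlap scoring as just described, assuming whitespace tokenization and case folding; `termOverlapScore` is a hypothetical helper, not the data plane's actual scorer, which may tokenize and weight differently.

```go
package main

import (
	"fmt"
	"strings"
)

// termOverlapScore counts how many distinct query terms appear in
// the document, after lowercasing and whitespace tokenization.
func termOverlapScore(query, doc string) int {
	docTerms := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(doc)) {
		docTerms[t] = true
	}
	score := 0
	seen := map[string]bool{}
	for _, t := range strings.Fields(strings.ToLower(query)) {
		if docTerms[t] && !seen[t] {
			score++
			seen[t] = true
		}
	}
	return score
}

func main() {
	fmt.Println(termOverlapScore("plan update status", "the plan was updated"))
}
```

Note that this scores exact surface terms only ("update" does not match "updated"), which is one reason dense/vector retrieval is listed as a planned extension.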
Current anchor:
- `src/internal/materialization/service.go`
- `src/internal/dataplane/tiered_adapter.go`
- `src/internal/dataplane/segment_adapter.go`
- retrieval-ready object IDs
- searchable content representation
- metadata for filtering and namespace partitioning
Accept a structured query request and retrieve candidate evidence seeds.
- HTTP route: `/v1/query`
- Request type: `schemas.QueryRequest`
- Response type: `schemas.QueryResponse`
The v1 contract is intended to carry:
- query text
- agent/session context
- scope restrictions
- temporal filters
- object and memory-type filters
- relation expansion constraints
- response mode
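The intended contract can be pictured as a struct. Every field name below is illustrative, chosen to match the list above; the authoritative shape is `schemas.QueryRequest`.

```go
package main

import (
	"fmt"
	"time"
)

// QueryRequest is an illustrative sketch of the v1 query contract.
// Field names are assumptions for this example, not the real schema.
type QueryRequest struct {
	Query       string    // query text
	AgentID     string    // agent context
	SessionID   string    // session context
	Scopes      []string  // scope restrictions
	After       time.Time // temporal filter: lower bound
	Before      time.Time // temporal filter: upper bound
	MemoryTypes []string  // object and memory-type filters
	MaxHops     int       // relation expansion constraint
	EdgeTypes   []string  // relation expansion constraint
	Mode        string    // response mode
}

func main() {
	req := QueryRequest{Query: "what did the tool return", AgentID: "agent_1", MaxHops: 1}
	fmt.Println(req.Query, req.MaxHops)
}
```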
- request reaches the query API
- request is decoded into `QueryRequest`
- runtime calls the embedded data plane
- data plane performs search over segments
- candidate object IDs are returned to response assembly
The current implementation is still lighter than the target contract:
- dense/sparse separation is not explicit yet
- filter application is represented in response notes more than in deep execution
- graph expansion is not yet active
But the contract shape already reserves space for those stages.
- seed object IDs
- scanned segment information
- retrieval path/proof notes for response packaging
Transform retrieved seed objects into a local evidence subgraph through typed relations.
This is where ANDB diverges from ordinary chunk retrieval. Instead of returning only ranked fragments, the system should assemble related objects and edges that explain why the answer is supported.
- accept seed objects from retrieval
- load incoming and outgoing edges
- apply hop, edge-type, scope, and confidence constraints
- assemble a local evidence graph
In v1, expansion is constrained to 1-hop over the `GraphEdgeStore`.
`Assembler.expandEdges(objectIDs)` calls `GraphEdgeStore.BulkEdges(objectIDs)` to load all edges where `SrcObjectID` or `DstObjectID` is one of the retrieved object IDs. The result is returned in `QueryResponse.Edges`, and the expansion count is appended to the proof trace as `graph_expansion:edges=N`.
Edges are populated at ingest time by `materialization.Service.MaterializeEvent` (`belongs_to_session`, `owned_by_agent`, `derived_from`).
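The 1-hop expansion can be sketched as a filter over stored edges. Here `bulkEdges` is a simplified stand-in for `GraphEdgeStore.BulkEdges`, operating over a plain slice instead of the real store.

```go
package main

import "fmt"

type Edge struct{ SrcObjectID, DstObjectID, Type string }

// bulkEdges returns every stored edge whose source or destination
// is one of the seed object IDs: a single hop in either direction.
func bulkEdges(store []Edge, seeds []string) []Edge {
	seedSet := map[string]bool{}
	for _, id := range seeds {
		seedSet[id] = true
	}
	var out []Edge
	for _, e := range store {
		if seedSet[e.SrcObjectID] || seedSet[e.DstObjectID] {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	store := []Edge{
		{"mem_ev_1", "sess_1", "belongs_to_session"},
		{"mem_ev_2", "mem_ev_1", "derived_from"},
		{"mem_ev_3", "sess_2", "belongs_to_session"},
	}
	edges := bulkEdges(store, []string{"mem_ev_1"})
	fmt.Printf("graph_expansion:edges=%d\n", len(edges)) // proof-trace style count
}
```

Because both edge directions are matched, a seed picks up the session it belongs to and any later objects derived from it, which is exactly the "why is this supported" context the evidence graph is meant to carry.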
Current anchor:
- `src/internal/evidence/assembler.go`
- `src/internal/storage/memory.go` — `memoryGraphEdgeStore.BulkEdges`
Build the final structured response returned to the caller.
The target v1 response includes:
- `objects`
- `edges`
- `provenance`
- `versions`
- `applied_filters`
- `proof_trace`
`Assembler.Build()` assembles a `QueryResponse` with:
- `objects` — retrieved object IDs
- `edges` — 1-hop `schemas.Edge` records from `GraphEdgeStore.BulkEdges`
- `provenance` — `["event_projection", "retrieval_projection", "fragment_cache", "graph_expansion"]`
- `versions` — reserved (shallow in v1)
- `applied_filters` — policy filters applied by `PolicyEngine.ApplyQueryFilters`
- `proof_trace` — tier label + shard trace + pre-computed fragment steps + scanned shards
Pre-computed `EvidenceFragment` records (built at ingest by `PreComputeService`) are merged into the proof trace via `EvidenceCache.GetMany(objectIDs)`, amortizing chain-derivation cost over the ingest path.
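A sketch of how the proof-trace pieces listed above could be concatenated; `buildProofTrace` and its step-string formats are illustrative assumptions, not the assembler's exact output.

```go
package main

import "fmt"

// buildProofTrace concatenates the proof-trace components in the
// documented order: tier label, shard trace, pre-computed fragment
// steps, then the scanned-shard count. Step formats are illustrative.
func buildProofTrace(tier string, shards []string, fragmentSteps []string) []string {
	trace := []string{"tier:" + tier}
	for _, s := range shards {
		trace = append(trace, "shard:"+s)
	}
	trace = append(trace, fragmentSteps...)
	trace = append(trace, fmt.Sprintf("scanned_shards=%d", len(shards)))
	return trace
}

func main() {
	trace := buildProofTrace("hot", []string{"seg_0"}, []string{"fragment:mem_ev_1"})
	for _, step := range trace {
		fmt.Println(step)
	}
}
```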
Evaluate whether ANDB improves evidence-oriented retrieval over a simpler baseline.
- generate mock events
- ingest them through the public API
- run representative queries
- compare against a top-k-only baseline
- collect retrieval and response metrics
- `scripts/seed_mock_data.py`
- `scripts/run_demo.py`
- `scripts/benchmark.py`
- benchmark docs under `docs/experiments`
The access layer owns:
- route registration
- request decoding
- public contract exposure
The worker runtime owns:
- WAL append semantics
- worker subscription path
- ingest/query orchestration
The materialization service owns:
- event-to-object transformation
- edge generation
- version handling
The data plane owns:
- retrieval projections
- search execution
- candidate return
The evidence assembler owns:
- relation expansion
- evidence graph assembly
- proof trace packaging
The experiment scripts own:
- seed scripts
- benchmark loops
- baseline comparison
The following contracts should remain stable unless deliberately reviewed:
- event envelope shape
- canonical object schema
- query request shape
- query response categories
- candidate seed contract between retrieval and graph stages
- edge typing conventions needed for evidence assembly
The following can still vary internally:
- exact storage backend
- embedding backend
- sparse retrieval implementation
- graph storage representation
- in-process versus separated worker execution
All of these can vary as long as the shared contracts stay coherent.
All implementation work should connect back to this path:
ingest -> materialize -> project -> retrieve -> expand -> assemble -> explain -> return
That is the operational skeleton of ANDB v1.