Summary
Reformulates context compression as structure-then-select: decomposes text into Elementary Discourse Unit (EDU) relation trees, then selects query-relevant subtrees. Eliminates hallucination by anchoring EDUs strictly to source text indices.
Source: arXiv 2512.14244 — From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition
Published 2025-12-16, revised 2026-01-05.
Key Results
- State-of-the-art structural prediction accuracy on StructBench (248 manually annotated documents)
- Outperforms frontier LLMs on structured compression while reducing cost
- Zero hallucination: EDU nodes anchored to source byte offsets
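The zero-hallucination property above follows from representation, not model behavior: if an EDU node stores only offsets into the source, rendering it is a slice copy and can never emit text absent from the source. A minimal sketch (the `EduNode` struct and field names are illustrative assumptions, not the paper's or Zeph's types):

```rust
// Hypothetical sketch: an EDU node anchored to source byte offsets.
// Rendering is pure extraction, so no generated (hallucinated) text
// can appear in the output.
struct EduNode {
    start: usize,        // byte offset of the EDU's first byte
    end: usize,          // byte offset one past the EDU's last byte
    children: Vec<EduNode>, // nested EDUs forming the relation tree
}

impl EduNode {
    /// Recover the EDU's text verbatim from the source string.
    fn render<'a>(&self, source: &'a str) -> &'a str {
        &source[self.start..self.end]
    }
}
```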
Applicability to Zeph
Current Zeph compaction: `summarize_tool_outputs()` → token-based chunking → LLM summarization.
Problem: chunking splits at arbitrary token boundaries, destroying discourse structure. Summaries lose the causal/temporal relations between events.
Enhancement: apply EDU decomposition as a preprocessing step in `SemanticMemory::compress()` before the compaction LLM call:
- Parse tool output / conversation chunk into EDU tree (LingoEDU)
- Score each EDU subtree by relevance to current task intent
- Select top-K EDU subtrees (respect token budget) → pass to compaction LLM
- Compaction LLM sees structured, non-redundant content → better summaries
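The selection step (steps 2–3 above) can be sketched as a greedy budget fill: rank subtrees by relevance to the task intent, then keep the best ones that fit. Everything here is an assumption for illustration, not an existing Zeph API; in particular `Subtree`, the relevance source, and the 4-bytes-per-token estimate are stand-ins:

```rust
// Hypothetical sketch: score-and-select EDU subtrees under a token budget.
struct Subtree {
    text: String,
    relevance: f32, // e.g. cosine similarity to the current task intent
}

fn estimate_tokens(text: &str) -> usize {
    // Crude stand-in for a real tokenizer: ~4 bytes per token.
    (text.len() + 3) / 4
}

fn select_top_k(mut subtrees: Vec<Subtree>, token_budget: usize) -> Vec<Subtree> {
    // Highest relevance first.
    subtrees.sort_by(|a, b| b.relevance.total_cmp(&a.relevance));
    let mut used = 0;
    let mut kept = Vec::new();
    for st in subtrees {
        let cost = estimate_tokens(&st.text);
        // Greedy fill: skip subtrees that would overflow the budget.
        if used + cost <= token_budget {
            used += cost;
            kept.push(st);
        }
    }
    kept
}
```

The surviving subtrees are concatenated and handed to the compaction LLM, which then summarizes already-relevant, non-redundant content.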
Synergy: Complements #1851 (SWE-Pruner goal-guided pruning) and #1607 (structured anchored summarization). EDU tree provides the structure that #1607 assumes.
Implementation Sketch
- `EduDecomposer` trait in `zeph-memory::compression`
- Initial implementation: regex-based clause splitting (lightweight, no dependency)
- Advanced: integrate the LingoEDU parser (Python subprocess or Rust port)
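A possible shape for the trait and the initial lightweight splitter, sketched below. Only `EduDecomposer` is named in this issue; `ClauseSplitter` and its punctuation heuristic are assumptions. To stay dependency-free the sketch splits on clause-ending punctuation instead of using the `regex` crate, which a real implementation would likely prefer:

```rust
/// Splits raw text into Elementary Discourse Units.
/// (Trait name from this proposal; signature is a sketch.)
trait EduDecomposer {
    fn decompose(&self, text: &str) -> Vec<String>;
}

/// Initial lightweight implementation: split at clause boundaries
/// marked by sentence-ending punctuation or semicolons.
struct ClauseSplitter;

impl EduDecomposer for ClauseSplitter {
    fn decompose(&self, text: &str) -> Vec<String> {
        let mut units = Vec::new();
        let mut current = String::new();
        for ch in text.chars() {
            current.push(ch);
            if matches!(ch, '.' | '!' | '?' | ';') {
                let unit = current.trim();
                if !unit.is_empty() {
                    units.push(unit.to_string());
                }
                current.clear();
            }
        }
        // Keep any trailing clause without closing punctuation.
        let tail = current.trim();
        if !tail.is_empty() {
            units.push(tail.to_string());
        }
        units
    }
}
```

Swapping in the LingoEDU parser later only requires another `EduDecomposer` impl; the selection and compaction stages stay unchanged.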
- Config: `[memory.compression] edu_decomposition = false` (opt-in, experimental)