research(context): EDU-based structured context compression (LingoEDU) #1863

@bug-ops

Summary

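Reformulates context compression as structure-then-select: the text is first decomposed into a relation tree of Elementary Discourse Units (EDUs), and the query-relevant subtrees are then selected. The approach aims to eliminate hallucination by anchoring every EDU strictly to source text indices, so the compressed output can only contain spans that exist in the input.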

Source: arXiv 2512.14244 — From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition
Published 2025-12-16, revised 2026-01-05.

Key Results

  • State-of-the-art structural prediction accuracy on StructBench (248 manually annotated documents)
  • Outperforms frontier LLMs on structured compression while reducing cost
  • Zero hallucination: EDU nodes anchored to source byte offsets

Applicability to Zeph

Current Zeph compaction: summarize_tool_outputs() → token-based chunking → LLM summarization.

Problem: Chunking splits at arbitrary token boundaries, which destroys discourse structure. As a result, summaries lose the causal and temporal relations between events.

Enhancement: Apply EDU decomposition as a preprocessing step in SemanticMemory::compress() before the compaction LLM call:

  1. Parse tool output / conversation chunk into EDU tree (LingoEDU)
  2. Score each EDU subtree by relevance to current task intent
  3. Select top-K EDU subtrees (respect token budget) → pass to compaction LLM
  4. Compaction LLM sees structured, non-redundant content → better summaries
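
Steps 2 and 3 above could be sketched roughly as follows. This is a minimal illustration, not Zeph's or LingoEDU's actual code: `ScoredSubtree`, `relevance`, `score_spans`, and `select_top_k` are hypothetical names, the relevance score is a toy word-overlap measure standing in for a real intent scorer, and tokens are approximated by whitespace-separated words.

```rust
/// Hypothetical flattened view of an EDU subtree: a byte span into the
/// source text plus a relevance score and an approximate token cost.
#[derive(Debug, Clone, PartialEq)]
struct ScoredSubtree {
    start: usize,
    end: usize,
    score: f32,
    tokens: usize,
}

/// Toy relevance (assumption): fraction of query words that occur in the
/// span, case-insensitive. A real scorer would likely use embeddings.
fn relevance(text: &str, query: &str) -> f32 {
    let lower = text.to_lowercase();
    let words: Vec<String> = query
        .split_whitespace()
        .map(|w| w.to_lowercase())
        .collect();
    if words.is_empty() {
        return 0.0;
    }
    let hits = words.iter().filter(|w| lower.contains(w.as_str())).count();
    hits as f32 / words.len() as f32
}

/// Step 2: score each span against the current task intent.
/// Spans must be valid byte ranges on char boundaries of `source`.
fn score_spans(source: &str, spans: &[(usize, usize)], query: &str) -> Vec<ScoredSubtree> {
    spans
        .iter()
        .map(|&(start, end)| {
            let text = &source[start..end];
            ScoredSubtree {
                start,
                end,
                score: relevance(text, query),
                // Whitespace word count as a stand-in for a real tokenizer.
                tokens: text.split_whitespace().count(),
            }
        })
        .collect()
}

/// Step 3: greedy top-K selection under a token budget. Candidates are
/// visited from highest to lowest score; any candidate that does not fit
/// the remaining budget is skipped. The result is returned in source order
/// so the compaction LLM sees events in their original sequence.
fn select_top_k(
    mut candidates: Vec<ScoredSubtree>,
    k: usize,
    token_budget: usize,
) -> Vec<ScoredSubtree> {
    candidates.sort_by(|a, b| b.score.total_cmp(&a.score).then(a.start.cmp(&b.start)));
    let mut used = 0;
    let mut picked = Vec::new();
    for c in candidates {
        if picked.len() == k {
            break;
        }
        if used + c.tokens <= token_budget {
            used += c.tokens;
            picked.push(c);
        }
    }
    picked.sort_by_key(|c| c.start);
    picked
}
```

Returning the selection in source order rather than score order is a design choice made here to preserve the temporal relations the issue is concerned about; greedy selection is not optimal for the budget and could be swapped for a knapsack-style selection if that matters.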

Synergy: Complements #1851 (SWE-Pruner goal-guided pruning) and #1607 (structured anchored summarization). EDU tree provides the structure that #1607 assumes.

Implementation Sketch

  • EduDecomposer trait in zeph-memory::compression
  • Initial implementation: regex-based clause splitting (lightweight, no dependency)
  • Advanced: integrate LingoEDU parser (Python subprocess or Rust port)
  • Config: [memory.compression] edu_decomposition = false (opt-in, experimental)
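
A rough sketch of what the trait and the lightweight first implementation might look like. Only the trait name `EduDecomposer` comes from this issue; `Edu`, `ClauseSplitter`, `push_trimmed`, and the method signature are assumptions made for illustration. To stay dependency-free, the sketch uses a plain punctuation scan from the standard library in place of the regex mentioned above, and it produces a flat list of EDUs rather than a relation tree.

```rust
use std::ops::Range;

/// A flat EDU that stores only a byte range into the source. Its text is
/// always sliced from the input, so it cannot diverge from the source.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Edu {
    pub span: Range<usize>,
}

impl Edu {
    /// Resolve the EDU back to the exact source text it covers.
    pub fn text<'a>(&self, source: &'a str) -> &'a str {
        &source[self.span.clone()]
    }
}

/// Assumed shape of the trait proposed in this issue.
pub trait EduDecomposer {
    fn decompose(&self, source: &str) -> Vec<Edu>;
}

/// Naive clause splitter: cuts after `.`, `!`, `?`, `;` and at newlines.
/// It will mis-split abbreviations and decimals ("e.g.", "3.14"); it is a
/// placeholder until a real parser is integrated.
pub struct ClauseSplitter;

impl EduDecomposer for ClauseSplitter {
    fn decompose(&self, source: &str) -> Vec<Edu> {
        let mut edus = Vec::new();
        let mut start = 0;
        for (i, ch) in source.char_indices() {
            if matches!(ch, '.' | '!' | '?' | ';' | '\n') {
                let end = i + ch.len_utf8();
                push_trimmed(source, start..end, &mut edus);
                start = end;
            }
        }
        // Trailing text without a closing delimiter.
        push_trimmed(source, start..source.len(), &mut edus);
        edus
    }
}

/// Push `range` as an EDU after trimming surrounding whitespace. Segments
/// with no alphanumeric character (blank lines, stray punctuation) are dropped.
fn push_trimmed(source: &str, range: Range<usize>, out: &mut Vec<Edu>) {
    let raw = &source[range.clone()];
    let trimmed = raw.trim();
    if !trimmed.chars().any(char::is_alphanumeric) {
        return;
    }
    let lead = raw.len() - raw.trim_start().len();
    let begin = range.start + lead;
    out.push(Edu { span: begin..begin + trimmed.len() });
}
```

Keeping byte ranges instead of owned strings means a later LingoEDU-backed implementation could sit behind the same trait without changing callers, assuming it can also report offsets into the original input.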
