Description
Summary
Bayesian per-agent reputation scoring for robust multi-agent coordination, derived from the RAPS architecture. Directly extends Zeph's Thompson Sampling AgentRouter with a principled malice/degradation detection layer.
Source: arXiv 2602.08009 — "Towards Adaptive, Scalable, and Robust Coordination of LLM Agents: A Dynamic Ad-Hoc Networking Perspective" (Li et al., 2026)
Technique (reputation component only)
Each agent/model maintains a Beta-distribution reputation score per peer based on observed behavior (task success, output quality, response time vs. expectation). Agents with degraded reputation are automatically down-weighted in routing without central coordination. Reputation updates are Bayesian: success → α += 1, failure → β += 1 on a Beta(α, β) prior.
This is structurally identical to the Beta distribution that Thompson Sampling already uses in zeph-llm's router; in effect, it adds a quality/reliability dimension alongside the latency EMA.
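The update rule above can be sketched in a few lines of Rust. This is a minimal illustration, not Zeph's or RAPS's actual code: the `Reputation` type, its method names, and the uniform Beta(1, 1) starting prior are all assumptions made for the example.

```rust
/// Hypothetical per-peer reputation: a Beta(alpha, beta) belief over the
/// probability that the peer's next outcome is a success.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Reputation {
    alpha: f64,
    beta: f64,
}

impl Reputation {
    /// Start from a uniform Beta(1, 1) prior (an assumption; the source
    /// does not fix a particular prior).
    fn new() -> Self {
        Self { alpha: 1.0, beta: 1.0 }
    }

    /// Conjugate Bayesian update: success -> alpha += 1, failure -> beta += 1.
    fn record(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    /// Posterior mean alpha / (alpha + beta): the expected success rate.
    fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }
}

fn main() {
    let mut rep = Reputation::new();
    for outcome in [true, true, false, true] {
        rep.record(outcome);
    }
    // Three successes and one failure on Beta(1, 1) give Beta(4, 2).
    println!("{:?}, mean = {:.3}", rep, rep.mean());
}
```

Because the Beta distribution is conjugate to Bernoulli outcomes, each observation costs only one addition, which is what makes per-peer tracking cheap enough to run without central coordination.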
Applicability to Zeph
MEDIUM-HIGH. Zeph already uses Thompson Sampling (Beta distribution) in AgentRouter for model selection exploration. The Bayesian reputation layer is an extension of the same math applied to output quality:
- Extend `ModelStats` in the router with a `reputation: Beta` field tracking quality outcomes (not just latency)
- Feed tool execution errors, LLM parse failures, and plan failures back as reputation signals
- Route away from degraded models/agents proportionally to reputation score decay
- Integrates with #1841 (Agent Stability Index): ASI provides the coherence signal; reputation tracks the cumulative outcome history
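One way to picture the signal feedback and proportional down-weighting listed above is the sketch below. The `Signal` enum, the `observe` and `routing_weights` names, and the choice to weight by posterior mean are hypothetical; the real `ModelStats` in zeph-llm has other fields and may be wired differently.

```rust
/// Hypothetical outcome signals; the variant names are illustrative only.
#[derive(Debug, Clone, Copy)]
enum Signal {
    Success,
    ToolError,
    ParseFailure,
    PlanFailure,
}

/// Hypothetical stand-in for the router's per-model statistics.
#[derive(Debug, Clone, Copy)]
struct ModelStats {
    quality_alpha: f64,
    quality_beta: f64,
}

impl ModelStats {
    /// Uniform Beta(1, 1) prior (an assumption for this sketch).
    fn new() -> Self {
        Self { quality_alpha: 1.0, quality_beta: 1.0 }
    }

    /// Every failure kind counts as one failed trial; success as one success.
    fn observe(&mut self, signal: Signal) {
        match signal {
            Signal::Success => self.quality_alpha += 1.0,
            Signal::ToolError | Signal::ParseFailure | Signal::PlanFailure => {
                self.quality_beta += 1.0
            }
        }
    }

    /// Posterior mean quality.
    fn mean(&self) -> f64 {
        self.quality_alpha / (self.quality_alpha + self.quality_beta)
    }
}

/// Routing weights proportional to posterior mean quality, normalized to sum to 1.
fn routing_weights(stats: &[ModelStats]) -> Vec<f64> {
    let total: f64 = stats.iter().map(ModelStats::mean).sum();
    stats.iter().map(|s| s.mean() / total).collect()
}

fn main() {
    let mut healthy = ModelStats::new();
    let mut degraded = ModelStats::new();
    for _ in 0..8 {
        healthy.observe(Signal::Success);
    }
    for s in [Signal::ToolError, Signal::ParseFailure, Signal::PlanFailure] {
        degraded.observe(s);
    }
    // The degraded model keeps a nonzero share, so it can still recover.
    println!("{:?}", routing_weights(&[healthy, degraded]));
}
```

Treating all failure kinds as equal-weight trials is the simplest choice; a real integration might weight plan failures more heavily than a single parse failure.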
Implementation sketch
- Extend `ModelStats` with `quality_alpha: f64`, `quality_beta: f64` (Beta parameters)
- On tool execution / plan step: record success/failure → update quality params
- Routing score: combine EMA latency weight with reputation sample `Beta(α,β).sample()`
- New config key: `[routing.reputation] enabled = false, decay_factor = 0.95`
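The steps above could fit together roughly as follows. Several details here are assumptions the issue does not specify: `decay_factor` is taken to shrink accumulated evidence toward a Beta(1, 1) prior before each update, the latency weight formula and the `latency_ema_ms` field are invented for the example, and the two terms are combined by multiplication. To stay free of external crates, the sketch carries its own small generator (SplitMix64, Box-Muller, Marsaglia-Tsang gamma sampling); a real implementation would more likely reuse whatever sampler the router already has.

```rust
/// Minimal SplitMix64 generator so the sketch needs no external crates.
struct Rng(u64);

impl Rng {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform draw strictly inside (0, 1).
    fn uniform(&mut self) -> f64 {
        ((self.next_u64() >> 12) as f64 + 0.5) / (1u64 << 52) as f64
    }

    /// Standard normal via the Box-Muller transform.
    fn normal(&mut self) -> f64 {
        let (u1, u2) = (self.uniform(), self.uniform());
        (-2.0 * u1.ln()).sqrt() * (2.0 * std::f64::consts::PI * u2).cos()
    }

    /// Gamma(shape, 1) via Marsaglia-Tsang; shapes below 1 use the boost trick.
    fn gamma(&mut self, shape: f64) -> f64 {
        if shape < 1.0 {
            let u = self.uniform();
            return self.gamma(shape + 1.0) * u.powf(1.0 / shape);
        }
        let d = shape - 1.0 / 3.0;
        let c = 1.0 / (9.0 * d).sqrt();
        loop {
            let x = self.normal();
            let v = (1.0 + c * x).powi(3);
            if v > 0.0 && self.uniform().ln() < 0.5 * x * x + d - d * v + d * v.ln() {
                return d * v;
            }
        }
    }

    /// Beta(a, b) as X / (X + Y) with X ~ Gamma(a) and Y ~ Gamma(b).
    fn beta(&mut self, a: f64, b: f64) -> f64 {
        let (x, y) = (self.gamma(a), self.gamma(b));
        x / (x + y)
    }
}

/// Hypothetical stand-in for the router's per-model statistics.
#[derive(Debug, Clone, Copy)]
struct ModelStats {
    latency_ema_ms: f64, // assumed field name for the existing latency EMA
    quality_alpha: f64,
    quality_beta: f64,
}

impl ModelStats {
    /// Shrink past evidence toward the Beta(1, 1) prior, then add the new outcome.
    fn record_quality(&mut self, success: bool, decay_factor: f64) {
        self.quality_alpha = 1.0 + decay_factor * (self.quality_alpha - 1.0);
        self.quality_beta = 1.0 + decay_factor * (self.quality_beta - 1.0);
        if success {
            self.quality_alpha += 1.0;
        } else {
            self.quality_beta += 1.0;
        }
    }

    /// Assumed combination: latency weight times a Thompson sample of quality.
    fn routing_score(&self, rng: &mut Rng) -> f64 {
        let latency_weight = 1.0 / (1.0 + self.latency_ema_ms / 1000.0);
        latency_weight * rng.beta(self.quality_alpha, self.quality_beta)
    }
}

fn main() {
    let decay_factor = 0.95; // mirrors the proposed config default
    let mut rng = Rng(42);
    let mut healthy = ModelStats { latency_ema_ms: 200.0, quality_alpha: 1.0, quality_beta: 1.0 };
    let mut degraded = healthy;
    for _ in 0..20 {
        healthy.record_quality(true, decay_factor);
        degraded.record_quality(false, decay_factor);
    }
    let wins = (0..1000)
        .filter(|_| healthy.routing_score(&mut rng) > degraded.routing_score(&mut rng))
        .count();
    println!("healthy model selected in {wins} of 1000 draws");
}
```

Decaying toward the prior rather than toward zero keeps both parameters at or above 1, so the distribution stays well defined and a model that has recovered is not penalized forever by old failures.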