
RFC: Trust-weighted peer review to prevent sybil critique attacks #10

@realpercivallabs

Description


Problem

The current peer review system scores papers 1-10 with equal weight per reviewer. This creates a straightforward attack vector: a cluster of low-cost sybil agents can approve low-quality or adversarial research by outvoting legitimate reviewers. Since agent identity is a bare Ed25519 keypair with PoW as the only sybil mitigation, spinning up reviewer nodes is cheap relative to the damage bad research propagation can cause.

The Pulse protocol elegantly verifies that a node can compute — but it doesn't verify that an agent's research contributions are honest or useful. An agent can pass every Pulse challenge while consistently publishing garbage experiments or rubber-stamping bad papers.

This matters more as the network scales. At 237 agents, social dynamics partially self-correct. At 10,000+, naive voting becomes a liability.

Proposed Solution: Behavioral Reputation Layer

Add a trust score derived from an agent's historical contribution quality, not just uptime or tokens served. This score would:

  1. Weight peer reviews — A review from an agent with a 0.95 trust score carries more weight than one from a 0.30 agent. Breakthrough threshold (currently flat 8+) becomes a weighted score.

  2. Gate swarm admission — Autoswarms could optionally require a minimum trust score to participate, preventing low-reputation agents from polluting experiment pools.

  3. Prioritize gossip propagation — Mutations from high-trust agents get priority propagation, reducing time-to-adoption for proven contributors.

  4. Enable cross-domain reputation transfer — An agent excellent in ML research gets partial trust credit in adjacent domains (e.g., skills), with decay for unrelated domains.
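The review-weighting mechanism in point 1 can be sketched as a trust-weighted mean. This is an illustrative sketch, not a spec: the function name is hypothetical, and only the 1-10 score range and the 0-1 trust range come from the proposal.

```python
def weighted_review_score(reviews):
    """Trust-weighted mean of peer review scores.

    reviews: list of (score, trust) tuples, where score is the 1-10
    paper rating and trust is the reviewer's 0-1 trust score.
    """
    total_trust = sum(trust for _, trust in reviews)
    if total_trust == 0:
        return 0.0
    return sum(score * trust for score, trust in reviews) / total_trust

# A sybil cluster of low-trust reviewers can no longer outvote one
# established reviewer: three 0.05-trust approvals vs. one 0.95-trust
# rejection lands well below the flat-average result of 8.0.
reviews = [(10, 0.05), (10, 0.05), (10, 0.05), (2, 0.95)]
print(round(weighted_review_score(reviews), 2))  # 3.09
```

The flat 8+ breakthrough threshold would then be applied to this weighted score rather than to a raw average.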

Trust Score Inputs

Drawing from behavioral signals already available in the network:

| Signal | Weight | Rationale |
|---|---|---|
| Experiment adoption rate | High | If peers adopt your mutations, your work is useful |
| Paper scores received | Medium | Community assessment of research quality |
| Leaderboard improvement delta | High | Did your contributions actually move metrics? |
| Pulse verification consistency | Low | Baseline — necessary but not sufficient |
| Uptime (existing Presence Points) | Low | Shows commitment but doesn't indicate quality |
| Review accuracy (did reviews predict experiment success?) | High | Reviewers who score accurately are more trustworthy |
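A minimal way to combine these signals is a weighted sum over normalized inputs. The numeric weights below are placeholder assumptions chosen only to reflect the High/Medium/Low tiers in the table; the signal key names are hypothetical as well.

```python
# Placeholder weights (sum to 1.0) mirroring the High/Medium/Low tiers
# above; the exact values are an assumption, not part of the proposal.
SIGNAL_WEIGHTS = {
    "adoption_rate": 0.25,       # High
    "paper_scores": 0.15,        # Medium
    "leaderboard_delta": 0.25,   # High
    "pulse_consistency": 0.05,   # Low
    "uptime": 0.05,              # Low
    "review_accuracy": 0.25,     # High
}

def trust_score(signals):
    """Combine normalized (0-1) behavioral signals into one 0-1 score.

    Unknown signal keys are ignored; missing signals contribute zero,
    so a brand-new agent starts near the bottom of the trust range.
    """
    return sum(
        weight * min(max(signals.get(key, 0.0), 0.0), 1.0)
        for key, weight in SIGNAL_WEIGHTS.items()
    )
```

A sybil agent that only passes Pulse checks and stays online tops out at 0.10 under these weights, since the quality-linked signals dominate.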

Implementation Sketch

The trust layer doesn't need to live inside the Hyperspace node. It can operate as an external oracle that agents query:

Agent completes experiment
  → Result propagates via GossipSub
  → Trust oracle observes outcome (adoption rate, leaderboard delta)
  → Trust score updated
  → Score published as a signed attestation
  → Peers query trust scores when weighting reviews or admitting to swarms

This keeps the core P2P protocol unchanged while adding a reputation dimension.
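The oracle's update-and-attest steps above could look roughly like this. Everything here is a sketch under stated assumptions: the exponential-moving-average update rule, the attestation fields, and the `sign` callable (standing in for the oracle's Ed25519 key, not shown) are all illustrative choices, not parts of the proposal.

```python
import hashlib
import json
import time

def update_trust(prev_score, observed_outcome, alpha=0.1):
    """Exponential moving average over observed outcomes.

    observed_outcome: a 0-1 quality signal for the latest experiment
    (e.g. derived from adoption rate and leaderboard delta). A small
    alpha keeps scores smooth, so one lucky result can't spike trust.
    """
    return (1 - alpha) * prev_score + alpha * observed_outcome

def make_attestation(agent_id, score, sign):
    """Build a signed trust attestation peers can verify and cache.

    `sign` is an assumed callable: bytes -> signature bytes, wrapping
    whatever key the oracle uses (Ed25519 in the architecture above).
    """
    body = {
        "agent": agent_id,
        "trust": round(score, 4),
        "issued_at": int(time.time()),
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = sign(hashlib.sha256(payload).digest()).hex()
    return body
```

Peers would fetch and verify these attestations when weighting reviews or checking swarm admission thresholds, without any change to the gossip layer itself.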

Why This Matters Now

The ClawHub incident this week (314 malicious skills from a single author, all exfiltrating agent memory files) demonstrates what happens in agent ecosystems without behavioral trust scoring. The author passed every automated check — the only defense was manual detection after the fact.

Hyperspace's experiment-sharing model has the same structural vulnerability: agents share executable mutations via gossip. Today that works because the network is small and participants are genuine. At scale, a reputation layer becomes load-bearing infrastructure, not a nice-to-have.

About Us

We're Percival Labs — we build trust infrastructure for AI agent networks. Our Vouch SDK implements behavioral reputation scoring on Nostr (Ed25519 keys — same curve as your libp2p peer IDs). The system uses stake-weighted trust attestations published as NIP-85 events, with Lightning micropayments for economic settlement.

We're not proposing Hyperspace adopt our stack wholesale — we're pointing out a structural gap and offering one possible solution architecture. Happy to discuss integration approaches, contribute code, or just trade notes on trust scoring in distributed agent networks.
