
RFC: Trust-weighted peer review to prevent sybil critique attacks #10

@realpercivallabs

Description


Problem

The current peer review system scores papers 1-10 with equal weight per reviewer. This creates a straightforward attack vector: a cluster of low-cost sybil agents can approve low-quality or adversarial research by outvoting legitimate reviewers. Since agent identity is a bare Ed25519 keypair with PoW as the only sybil mitigation, spinning up reviewer nodes is cheap relative to the damage bad research propagation can cause.

The Pulse protocol elegantly verifies that a node can compute — but it doesn't verify that an agent's research contributions are honest or useful. An agent can pass every Pulse challenge while consistently publishing garbage experiments or rubber-stamping bad papers.

This matters more as the network scales. At 237 agents, social dynamics partially self-correct. At 10,000+, naive voting becomes a liability.

Proposed Solution: Behavioral Reputation Layer

Add a trust score derived from an agent's historical contribution quality, not just uptime or tokens served. This score would:

  1. Weight peer reviews — A review from an agent with a 0.95 trust score carries more weight than one from a 0.30 agent. Breakthrough threshold (currently flat 8+) becomes a weighted score.

  2. Gate swarm admission — Autoswarms could optionally require a minimum trust score to participate, preventing low-reputation agents from polluting experiment pools.

  3. Prioritize gossip propagation — Mutations from high-trust agents get priority propagation, reducing time-to-adoption for proven contributors.

  4. Enable cross-domain reputation transfer — An agent excellent in ML research gets partial trust credit in adjacent domains (e.g., skills), with decay for unrelated domains.
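The review-weighting mechanism in point 1 can be sketched as a trust-weighted mean. This is an illustrative sketch, not a spec: the function name is hypothetical, and only the 1-10 score range and the 0-1 trust range come from the proposal.

```python
def weighted_review_score(reviews):
    """Trust-weighted mean of peer review scores.

    reviews: list of (score, trust) tuples, where score is the 1-10
    paper rating and trust is the reviewer's 0-1 trust score.
    """
    total_trust = sum(trust for _, trust in reviews)
    if total_trust == 0:
        return 0.0
    return sum(score * trust for score, trust in reviews) / total_trust

# A sybil cluster of low-trust reviewers can no longer outvote one
# established reviewer: three 0.05-trust approvals vs. one 0.95-trust
# rejection lands well below the flat-average result of 8.0.
reviews = [(10, 0.05), (10, 0.05), (10, 0.05), (2, 0.95)]
print(round(weighted_review_score(reviews), 2))  # 3.09
```

The flat 8+ breakthrough threshold would then be applied to this weighted score rather than to a raw average.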

Trust Score Inputs

Drawing from behavioral signals already available in the network:

| Signal | Weight | Rationale |
|---|---|---|
| Experiment adoption rate | High | If peers adopt your mutations, your work is useful |
| Paper scores received | Medium | Community assessment of research quality |
| Leaderboard improvement delta | High | Did your contributions actually move metrics? |
| Pulse verification consistency | Low | Baseline — necessary but not sufficient |
| Uptime (existing Presence Points) | Low | Shows commitment but doesn't indicate quality |
| Review accuracy (did reviews predict experiment success?) | High | Reviewers who score accurately are more trustworthy |
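A minimal way to combine these signals is a weighted sum over normalized inputs. The numeric weights below are placeholder assumptions chosen only to reflect the High/Medium/Low tiers in the table; the signal key names are hypothetical as well.

```python
# Placeholder weights (sum to 1.0) mirroring the High/Medium/Low tiers
# above; the exact values are an assumption, not part of the proposal.
SIGNAL_WEIGHTS = {
    "adoption_rate": 0.25,       # High
    "paper_scores": 0.15,        # Medium
    "leaderboard_delta": 0.25,   # High
    "pulse_consistency": 0.05,   # Low
    "uptime": 0.05,              # Low
    "review_accuracy": 0.25,     # High
}

def trust_score(signals):
    """Combine normalized (0-1) behavioral signals into one 0-1 score.

    Unknown signal keys are ignored; missing signals contribute zero,
    so a brand-new agent starts near the bottom of the trust range.
    """
    return sum(
        weight * min(max(signals.get(key, 0.0), 0.0), 1.0)
        for key, weight in SIGNAL_WEIGHTS.items()
    )
```

A sybil agent that only passes Pulse checks and stays online tops out at 0.10 under these weights, since the quality-linked signals dominate.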

Implementation Sketch

The trust layer doesn't need to live inside the Hyperspace node. It can operate as an external oracle that agents query:

Agent completes experiment
  → Result propagates via GossipSub
  → Trust oracle observes outcome (adoption rate, leaderboard delta)
  → Trust score updated
  → Score published as a signed attestation
  → Peers query trust scores when weighting reviews or admitting to swarms

This keeps the core P2P protocol unchanged while adding a reputation dimension.
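The oracle's update-and-attest steps above could look roughly like this. Everything here is a sketch under stated assumptions: the exponential-moving-average update rule, the attestation fields, and the `sign` callable (standing in for the oracle's Ed25519 key, not shown) are all illustrative choices, not parts of the proposal.

```python
import hashlib
import json
import time

def update_trust(prev_score, observed_outcome, alpha=0.1):
    """Exponential moving average over observed outcomes.

    observed_outcome: a 0-1 quality signal for the latest experiment
    (e.g. derived from adoption rate and leaderboard delta). A small
    alpha keeps scores smooth, so one lucky result can't spike trust.
    """
    return (1 - alpha) * prev_score + alpha * observed_outcome

def make_attestation(agent_id, score, sign):
    """Build a signed trust attestation peers can verify and cache.

    `sign` is an assumed callable: bytes -> signature bytes, wrapping
    whatever key the oracle uses (Ed25519 in the architecture above).
    """
    body = {
        "agent": agent_id,
        "trust": round(score, 4),
        "issued_at": int(time.time()),
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["sig"] = sign(hashlib.sha256(payload).digest()).hex()
    return body
```

Peers would fetch and verify these attestations when weighting reviews or checking swarm admission thresholds, without any change to the gossip layer itself.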

Why This Matters Now

The ClawHub incident this week (314 malicious skills from a single author, all exfiltrating agent memory files) demonstrates what happens in agent ecosystems without behavioral trust scoring. The author passed every automated check — the only defense was manual detection after the fact.

Hyperspace's experiment-sharing model has the same structural vulnerability: agents share executable mutations via gossip. Today that works because the network is small and participants are genuine. At scale, a reputation layer becomes load-bearing infrastructure, not a nice-to-have.

About Us

We're Percival Labs — we build trust infrastructure for AI agent networks. Our Vouch SDK implements behavioral reputation scoring on Nostr (Ed25519 keys — same curve as your libp2p peer IDs). The system uses stake-weighted trust attestations published as NIP-85 events, with Lightning micropayments for economic settlement.

We're not proposing Hyperspace adopt our stack wholesale — we're pointing out a structural gap and offering one possible solution architecture. Happy to discuss integration approaches, contribute code, or just trade notes on trust scoring in distributed agent networks.
