Add opt-in Thompson Sampling for relay scoring #53

Open
alltheseas wants to merge 5 commits into coracle-social:master from alltheseas:feat/thompson-sampling

@alltheseas alltheseas commented Mar 5, 2026

Summary

  • Add sampleBeta(alpha, beta, rng?) to @welshman/lib — Beta-distributed sampling for Thompson bandit relay selection
  • Add getRelayPrior option to RouterOptions — when provided, scoreRelay uses Beta sampling instead of Math.random(), biasing toward relays with better delivery history
  • Extend RelayStats with optional alpha/beta/last_delivery_update fields (backwards-compatible with existing IndexedDB data)
  • Add getRelayPrior(url) and recordRelayDelivery(url, delivered, expected) to @welshman/app relay stats
  • Wire getRelayPrior into routerContext alongside getRelayQuality

Motivation

Welshman's Router scores relays with quality * (1 + log(weight)) * Math.random(). The Math.random() factor is stateless — it never learns which relays actually deliver events.

Benchmarks in nostrability/outbox show that replacing Math.random() with sampleBeta(alpha, beta) (Thompson Sampling) improves 1-year event recall by 9pp (~30% → ~39%, 6-profile mean, 10-run validated) after 3–5 learning sessions. The structure of the scoring formula is unchanged; only the random factor is replaced. See the benchmark results and methodology for details.
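
As a rough illustration of "only the random factor is replaced", the sketch below treats the random factor as a pluggable function. The name scoreRelaySketch, its signature, and the use of the natural logarithm are assumptions made for this example, not Welshman's actual code.

```typescript
// Hypothetical sketch: names and signatures are illustrative, not Welshman's API.
type RandomFactor = () => number // expected to return a value in [0, 1]

// quality * (1 + log(weight)) * randomFactor, following the formula in the PR text.
// Math.log is the natural logarithm; the base used by the Router is an assumption.
function scoreRelaySketch(quality: number, weight: number, randomFactor: RandomFactor): number {
  return quality * (1 + Math.log(weight)) * randomFactor()
}

// Current behavior: a stateless uniform factor.
const uniformScore = scoreRelaySketch(0.9, 3, Math.random)

// Thompson-style behavior: the factor comes from a learned distribution.
// A fixed stand-in value is used here in place of a real Beta draw.
const learnedScore = scoreRelaySketch(0.9, 3, () => 0.8)
```

With this shape, a relay whose learned factor tends to be high wins more often, while quality and weight keep the same role they have today.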

Design decisions

  • Opt-in: getRelayPrior is optional. When it is not provided, or a relay has no prior yet, scoreRelay behaves exactly as it does today with Math.random().
  • Per-relay, not per-pubkey-per-relay: scoreRelay doesn't know pubkey context, so priors are global ("is this relay reliable?"). This is simpler and captures the dominant signal.
  • Push API for delivery feedback: recordRelayDelivery is intentionally a push API — callers (e.g. Coracle) must invoke it after observing delivery outcomes (e.g. after EOSE). The Router shouldn't be opinionated about when "delivery" is measured.
  • Time-based decay: exponential decay (0.95/hour) on stored priors prevents ossification without requiring a "session" concept. Decay is applied on both read and write to avoid stale priors snapping back after idle periods. (The decay rate is a design choice, not a benchmarked parameter — the benchmarks used discrete sessions without decay.)
  • Backwards-compatible: alpha, beta, last_delivery_update are optional fields. Existing serialized RelayStats deserialize fine without them.
  • Corrupted data self-heals: NaN/Infinity/negative stored values are sanitized to 1 (uniform) so relays re-enter Thompson learning after upgrade or data corruption.
  • No latency discount: kept out to reduce scope; can follow up.
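
The decay and self-healing decisions above can be sketched as a single read-side helper. This is a minimal sketch under stated assumptions: the PR text does not give the exact decay formula, so the code assumes the excess over the uniform prior Beta(1, 1) shrinks by 0.95 per idle hour, that timestamps are in milliseconds, and that 0 is treated as invalid along with NaN, Infinity, and negative values. The names StoredPriorSketch, sanitizeParam, and decayPriorSketch are hypothetical.

```typescript
// Hypothetical stored shape. Field names follow the PR text; millisecond units are assumed.
type StoredPriorSketch = {alpha?: number; beta?: number; last_delivery_update?: number}

const DECAY_PER_HOUR = 0.95
const HOUR_MS = 3_600_000

// NaN, Infinity, negative, or missing values reset to 1 (the uniform prior).
// Treating 0 as invalid as well is an assumption of this sketch.
const sanitizeParam = (x: number | undefined): number =>
  typeof x === "number" && Number.isFinite(x) && x > 0 ? x : 1

// Assumed decay form: the excess over Beta(1, 1) shrinks by 0.95 for each idle hour.
function decayPriorSketch(stored: StoredPriorSketch, now: number): {alpha: number; beta: number} {
  const last = stored.last_delivery_update
  // A missing or invalid timestamp is treated as "just updated" (no decay).
  const elapsedMs = typeof last === "number" && Number.isFinite(last) ? Math.max(0, now - last) : 0
  const factor = Math.pow(DECAY_PER_HOUR, elapsedMs / HOUR_MS)
  return {
    alpha: 1 + (sanitizeParam(stored.alpha) - 1) * factor,
    beta: 1 + (sanitizeParam(stored.beta) - 1) * factor,
  }
}
```

Under these assumptions a relay that was last observed long ago drifts back toward the uniform prior, so it is explored again instead of staying permanently favored or penalized.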

Test plan

  • pnpm vitest run packages/lib/__tests__/Beta.test.ts — 14 tests covering statistical properties, edge cases, deterministic seeding, uniform fast path, decay-on-write invariant, and router defensive fallback
  • pnpm build — no type errors in lib, router, or app packages
  • Manual review: when no priors exist, scoreRelay produces identical behavior to current Math.random()
  • Review that getRelayPrior returns undefined for relays with no delivery history (no Beta overhead)
  • Review that time-based decay prevents prior ossification (alpha+beta bounded)

🤖 Generated with Claude Code

alltheseas and others added 5 commits March 5, 2026 09:37
Port sampleBeta(alpha, beta, rng?) from nostrability benchmarks.
Uses Jöhnk's algorithm for small params, Marsaglia-Tsang gamma
sampling for larger values. Fast path: sampleBeta(1, 1) returns
rng() directly (zero overhead on cold start / uniform prior).

Includes comprehensive tests for statistical properties, edge
cases, deterministic seeding, and the uniform fast path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
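
For readers unfamiliar with Beta sampling, here is a self-contained sketch of the idea. It is not the ported implementation: instead of Jöhnk's algorithm for small parameters, it uses the gamma-ratio construction with Marsaglia-Tsang sampling everywhere, plus the standard "boost" trick for shapes below 1. The function names, the RangeError on invalid input, and the 0.5 fallback on underflow are assumptions of this example.

```typescript
type Rng = () => number // uniform in [0, 1)

// Standard normal via Box-Muller.
function sampleNormalSketch(rng: Rng): number {
  const u = 1 - rng() // shift to (0, 1] so Math.log(u) is finite
  const v = rng()
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}

// Gamma(shape, 1) via Marsaglia-Tsang rejection sampling.
function sampleGammaSketch(shape: number, rng: Rng): number {
  if (shape < 1) {
    // Boost for small shapes: Gamma(a) = Gamma(a + 1) * U^(1/a)
    return sampleGammaSketch(shape + 1, rng) * Math.pow(1 - rng(), 1 / shape)
  }
  const d = shape - 1 / 3
  const c = 1 / Math.sqrt(9 * d)
  for (;;) {
    const x = sampleNormalSketch(rng)
    const v = Math.pow(1 + c * x, 3)
    if (v <= 0) continue
    const u = 1 - rng()
    if (Math.log(u) < 0.5 * x * x + d - d * v + d * Math.log(v)) return d * v
  }
}

// Beta(alpha, beta) as X / (X + Y) with X ~ Gamma(alpha), Y ~ Gamma(beta).
function sampleBetaSketch(alpha: number, beta: number, rng: Rng = Math.random): number {
  if (!(Number.isFinite(alpha) && Number.isFinite(beta) && alpha > 0 && beta > 0)) {
    throw new RangeError("alpha and beta must be finite and positive")
  }
  // Fast path: Beta(1, 1) is the uniform distribution.
  if (alpha === 1 && beta === 1) return rng()
  const x = sampleGammaSketch(alpha, rng)
  const y = sampleGammaSketch(beta, rng)
  const total = x + y
  // Both draws can underflow to 0 for tiny parameters; 0.5 is an arbitrary fallback.
  return total > 0 ? x / total : 0.5
}
```

A relay with a prior such as Beta(8, 2) produces draws that cluster around 0.8, while Beta(1, 1) keeps the current uniform behavior.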
Add getRelayPrior to RouterOptions. When provided, scoreRelay uses
sampleBeta(alpha, beta) instead of Math.random(), biasing toward
relays with better delivery history. Falls back to uniform random
when no priors exist (identical to current behavior).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend RelayStats with optional alpha/beta/last_delivery_update
fields (backwards-compatible with existing IndexedDB data).

Add getRelayPrior(url) with exponential time-decay (0.95/hour) to
prevent prior ossification. Add recordRelayDelivery(url, delivered,
expected) for callers to report relay delivery outcomes.

Wire getRelayPrior into routerContext alongside getRelayQuality.

Note: recordRelayDelivery is a push API — callers (e.g. Coracle)
must invoke it after observing delivery outcomes (e.g. after EOSE).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Apply time-decay to stored alpha/beta before adding new observations
in recordRelayDelivery. Previously, decay was only applied on read
(getRelayPrior), so after long idle periods a single new observation
would snap stale priors back to full confidence.

Add input validation:
- recordRelayDelivery rejects non-finite and negative delivered
- getRelayPrior validates alpha/beta are finite and positive
- Router scoreRelay catches sampleBeta exceptions, falls back to
  Math.random()

Add integration tests covering decay semantics, the decay-then-update
invariant, and the router's defensive fallback on invalid priors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
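
The decay-then-update invariant described in this commit can be illustrated with a pure function. This is a sketch, not the code in @welshman/app: the decay form (shrinking toward Beta(1, 1)), the update rule (alpha gains delivered, beta gains expected minus delivered), millisecond timestamps, and returning the prior unchanged on invalid input are all assumptions, and the names are hypothetical.

```typescript
// Hypothetical in-memory shape; the real RelayStats fields are optional and persisted.
type DeliveryPriorSketch = {alpha: number; beta: number; last_delivery_update: number}

const HOURLY_DECAY = 0.95
const MS_PER_HOUR = 3_600_000

// Assumed decay form: shrink the excess over the uniform prior by 0.95 per idle hour.
function applyDecaySketch(p: DeliveryPriorSketch, now: number): DeliveryPriorSketch {
  const hours = Math.max(0, now - p.last_delivery_update) / MS_PER_HOUR
  const factor = Math.pow(HOURLY_DECAY, hours)
  return {
    alpha: 1 + (p.alpha - 1) * factor,
    beta: 1 + (p.beta - 1) * factor,
    last_delivery_update: now,
  }
}

// Decay first, then add the new observation, so stale evidence cannot snap back.
function recordDeliverySketch(
  p: DeliveryPriorSketch,
  delivered: number,
  expected: number,
  now: number,
): DeliveryPriorSketch {
  // Reject invalid input by returning the prior unchanged (the rejection style is assumed).
  if (!Number.isFinite(delivered) || !Number.isFinite(expected)) return p
  if (delivered < 0 || expected < delivered) return p
  const d = applyDecaySketch(p, now)
  return {
    alpha: d.alpha + delivered, // successes
    beta: d.beta + (expected - delivered), // misses
    last_delivery_update: now,
  }
}
```

For example, a prior of alpha = 101 left idle for 1000 hours and then given one successful delivery ends up near alpha = 2 under these assumptions. Adding the observation to the undecayed value would instead give 102 with a fresh timestamp, which is the snap-back the commit describes.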
decayPrior now sanitizes stored alpha/beta: NaN, Infinity, negative,
or undefined values reset to 1 (uniform). This lets relays with
corrupted legacy data self-heal on the next delivery observation
instead of staying permanently stuck.

Move getRelayPrior call inside the router's try/catch so that a
throwing provider implementation falls back to Math.random() instead
of aborting relay selection.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
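
A minimal sketch of the defensive fallback described in the last two commits, with both the provider call and the sampler call inside one try/catch. The function randomFactorSketch and its parameter list are hypothetical; the sampler is passed in so the example stays self-contained.

```typescript
type PriorProviderSketch = (url: string) => {alpha: number; beta: number} | undefined
type BetaSamplerSketch = (alpha: number, beta: number) => number

// Returns the random factor for one relay, falling back to the uniform rng on any failure.
function randomFactorSketch(
  url: string,
  getRelayPrior: PriorProviderSketch | undefined,
  sampleBeta: BetaSamplerSketch,
  rng: () => number = Math.random,
): number {
  // No provider configured: keep the current uniform behavior.
  if (!getRelayPrior) return rng()
  try {
    // The provider call sits inside the try so a throwing provider cannot abort selection.
    const prior = getRelayPrior(url)
    if (!prior) return rng() // no delivery history for this relay
    return sampleBeta(prior.alpha, prior.beta)
  } catch {
    // Invalid priors or a failing provider: fall back to the uniform factor.
    return rng()
  }
}
```

The point of the design is that a misbehaving prior provider only costs the learned bias for that call and never blocks relay selection.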
@alltheseas
Author

Numbers update from nostrability/outbox benchmarks

The motivation section cites:

replacing Math.random() with sampleBeta(alpha, beta) (Thompson Sampling) improves 1-year event recall from 24% → 89% after 2–3 sessions

The 89% figure was inflated by a phase2 cache bug in the benchmark framework (lossy serialization stored the union of event IDs across all relays, inflating S2+ verification). This was fixed in nostrability/outbox#34.

Corrected numbers (from 793 benchmark runs, 10-run variance study, --no-phase2-cache):

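| Window | Stochastic baseline | Welshman+Thompson (5 sessions) | Absolute | Relative |
|--------|---------------------|--------------------------------|----------|----------|
| 7d     | 79-90%              | 84-92%                         | +4-7pp   | +5-8%    |
| 1yr    | 30%                 | 39% ± 2.7 SE                   | +9pp     | +30%     |
| 3yr    | 19%                 | 26%                            | +7pp     | +37%     |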
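
The Absolute and Relative columns measure different things: percentage points are a simple difference in recall, while the relative figure divides that difference by the baseline. A small helper (hypothetical name recallGainSketch) shows how the 1yr and 3yr rows are related.

```typescript
// Absolute gain in percentage points and relative gain in percent of the baseline.
function recallGainSketch(baselinePct: number, treatedPct: number): {absolutePp: number; relativePct: number} {
  const absolutePp = treatedPct - baselinePct
  const relativePct = (absolutePp * 100) / baselinePct
  return {absolutePp, relativePct}
}

const oneYear = recallGainSketch(30, 39) // 9pp absolute, 30% relative
const threeYear = recallGainSketch(19, 26) // 7pp absolute, about 36.8% relative
```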

Suggested replacement for the motivation section:

Benchmarks in nostrability/outbox (793 runs across 6 profiles, 3 time windows) show that replacing Math.random() with sampleBeta(alpha, beta) finds 30% more events at 1 year (30% → 39% recall, +9pp, 10-run validated) and 37% more at 3 years (19% → 26%). At 7 days the baseline is already strong (79-90%), so gains are modest (+5-8%). Per-profile gains range from 0pp to +15pp depending on relay graph diversity. See the corrected benchmark data.

The PR code itself is correct — this is just a docs fix for the motivation section.
