An autonomous trading agent for Polymarket. Claude analyzes markets without seeing the price,
calibrates its own confidence over time, and sizes every position with the Kelly criterion.
Thesis · Quick Start · How It Works · Strategies · AI Pipeline · Edge & Sizing · Risk Management · Configuration · Deployment
Caution
Paper mode is the default — virtual capital, live market data, no risk. Live mode is a work in progress and not yet finalized. Start with paper.
Live TUI dashboard — portfolio stats, strategy activity, pipeline feed, recent trades, and streaming logs
Can an LLM find genuine edge in prediction markets, or is it just expensive noise?
This bot is a working experiment. It gives Claude autonomous decision-making power over real markets — then constrains that autonomy with calibration tracking, risk gates, and position sizing math. Every prediction is logged, scored, and used to correct future estimates. The architecture assumes the model is wrong by default and builds accountability into every layer.
10 strategies · Anti-anchoring research · Bayesian update · Self-calibrating · Dual LLM providers · Thesis-based position management
| Requirement | Purpose | Required |
|---|---|---|
| Python 3.12+ | Runtime | Yes |
| Node.js 20+ | Claude CLI host | Yes |
| Claude CLI | AI analysis + validation | Yes |
| `ANTHROPIC_API_KEY` | Claude API access | Yes |
| `GOOGLE_API_KEY` | Gemini research provider | Recommended |
| Polymarket credentials | Live trading (CLOB signing) | Live only |
Tip
scripts/setup.sh installs Python deps and generates .env, but does not install Node.js or Claude CLI. Without the CLI, the bot will start successfully but fail on the first AI analysis tick — --dry-run will not catch this.
```bash
git clone <repo-url> && cd polymarket-bot
./scripts/setup.sh                        # creates venv, installs Python deps, generates .env
npm install -g @anthropic-ai/claude-code  # Claude CLI -- NOT installed by setup.sh
export ANTHROPIC_API_KEY="sk-ant-..."     # or add to .env (sourced by run.sh)
python main.py --dry-run                  # verify setup (loads components, fetches markets, exits)
python main.py                            # paper trading with TUI dashboard ($1,000 virtual capital)
python main.py --logs                     # paper trading with streaming logs (no TUI)
```
Warning
`scripts/run.sh` defaults to live mode — always pass `--paper` explicitly. Running `python main.py` directly defaults to the mode in `config.yaml` (paper), which is the safer default.
CLI Reference
| Command | Purpose |
|---|---|
| `python main.py` | TUI dashboard mode (default) |
| `python main.py --logs` | Streaming colorized logs, no TUI |
| `python main.py --dry-run` | Load all components, fetch markets, exit |
| `python main.py --collect-data` | Snapshot-only mode for building backtest data |
| `python main.py --mode paper\|live` | Override `config.yaml` mode |
| `python main.py --config PATH` | Use alternate config file |
| `./scripts/run.sh --paper` | Run via wrapper (activates venv, sources `.env`) |
| `./scripts/ctl.sh up` | Docker: build and start in paper mode |
| `./scripts/ctl.sh dashboard` | Docker: attach to live TUI (detach: Ctrl+P, Ctrl+Q) |
The bot runs a single asyncio event loop. An asyncio.Queue receives events from three concurrent producers: a REST price poller (every 60s), a WebSocket feed (real-time), and a resolution poller (every 5 min). The EventHandler drains the queue and runs two paths concurrently on each tick.
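The loop shape described above can be sketched as follows — a minimal illustration with made-up producer names and shortened intervals, not the bot's actual classes:

```python
# Minimal sketch: several concurrent producers feed one asyncio.Queue,
# and a single handler drains it (the bot's EventHandler pattern).
import asyncio

async def rest_poller(queue: asyncio.Queue, ticks: int) -> None:
    # Stand-in for the 60s REST price poller; interval shortened for the sketch.
    for _ in range(ticks):
        await queue.put(("MARKET_UPDATE", {"price": 0.42}))
        await asyncio.sleep(0)

async def timer(queue: asyncio.Queue, ticks: int) -> None:
    # Stand-in for the periodic timer producer.
    for _ in range(ticks):
        await queue.put(("TIMER_TICK", None))
        await asyncio.sleep(0)

async def handler(queue: asyncio.Queue, total: int) -> list:
    # Drains the queue; the fast path would run here on every event.
    seen = []
    for _ in range(total):
        event_type, _payload = await queue.get()
        seen.append(event_type)
        queue.task_done()
    return seen

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(handler(queue, 4))
    await asyncio.gather(rest_poller(queue, 2), timer(queue, 2))
    return await consumer

events = asyncio.run(main())
```

A third producer (the resolution poller) would be one more coroutine in the same `gather`.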
```mermaid
flowchart TD
A([Event from Queue]) --> B{Event type}
B -- MARKET_UPDATE --> C[Update market state]
B -- TIMER_TICK --> D[Increment tick<br>periodic checkpoint]
B -- MARKET_RESOLVED --> RES[Settle positions at $1/$0<br>backfill Brier scores<br>update knowledge base]
C & D --> FP
subgraph FP["Fast Path — synchronous, every event"]
FP1[Update portfolio prices<br>best_bid valuation] --> FP2[Exit manager<br>scans all positions]
FP2 --> FP3{Exit triggered?}
FP3 -- yes --> FP4[Execute SELL immediately<br>bypass risk gate + validator]
end
FP3 -- no --> G{TIMER_TICK<br>warmup done<br>prev task done?}
FP4 --> G
G -- yes --> SP
G -- no --> WAIT([Wait for next event])
subgraph SP["Slow Path — background asyncio.Task"]
SP1[Run all strategies in parallel] --> SP2[Signal funnel<br>6-stage filter]
SP2 --> SP3[Risk manager<br>11 checks per signal]
SP3 --> SP4{Approved?}
SP4 -- rejected --> SP5([Log rejection])
SP4 -- approved --> SP6{AI analyst<br>source?}
SP6 -- "yes → skip validator" --> SP7["Execute in parallel<br>asyncio.gather"]
SP6 -- no --> SP8{AI validator<br>enabled?}
SP8 -- yes --> SP9[Claude approval gate]
SP9 -- approved --> SP7
SP9 -- rejected --> SP10([AI REJECTED])
SP8 -- "no → execute directly" --> SP7
SP7 --> SP11[Paper or Live executor<br>portfolio update + DB log]
end
G -- "monitor interval<br>elapsed" --> MON
subgraph MON["Position Monitor — background task, every 30 min"]
MON1[Re-evaluate held positions via LLM<br>adaptive frequency by deadline] --> MON2{Verdict}
MON2 -- "thesis valid" --> MON3([Hold])
MON2 -- "extend hold" --> MON4([+24h, max 1 extension])
MON2 -- "thesis invalid" --> MON5[Execute exit immediately]
end
```
| Path | Behavior |
|---|---|
| Fast path | Synchronous, every tick. Updates portfolio prices using best_bid (not midpoint — that's what you'd actually get on exit). Checks exit conditions: take-profit, edge decay, time expiry, approaching resolution. Exits execute immediately, bypassing risk gate and validator. |
| Slow path | Background asyncio.Task, TIMER_TICK only. Runs all strategies in parallel → 6-stage funnel → 11-check risk gate. AI analyst signals skip the Claude validator (already AI-sourced); other signals pass through the approval gate. |
| Position monitor | Separate background task, every 30 min. Re-evaluates held positions via LLM with adaptive frequency by deadline proximity. Can extend hold time (max 1 extension) or trigger immediate exit on thesis invalidation. Staleness failsafe: if monitor hasn't run in 2 cycles, edge decay re-enables. |
Adaptive re-eval frequency
| Time to Deadline | Re-eval Frequency |
|---|---|
| < 24 hours | Every 10-min cycle |
| 1–3 days | Every 3rd cycle (30 min) |
| 3–7 days | Every 5th cycle (50 min) |
| 7+ days | Every 7th cycle (70 min) |
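The cadence table above reduces to a small lookup on time-to-deadline. A hedged sketch (the function name is illustrative; thresholds and the 10-minute cycle length come from the table):

```python
def reeval_every_n_cycles(hours_to_deadline: float) -> int:
    """How many 10-minute monitor cycles to wait between position re-evals."""
    if hours_to_deadline < 24:
        return 1   # every cycle (10 min)
    if hours_to_deadline < 72:
        return 3   # every 3rd cycle (30 min)
    if hours_to_deadline < 168:
        return 5   # every 5th cycle (50 min)
    return 7       # every 7th cycle (70 min)
```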
```mermaid
flowchart LR
subgraph Ingestion["Data Ingestion"]
MM["Market Monitor<br>1,500 markets<br>REST + WebSocket"]
end
subgraph Strategies["Strategy Layer"]
AI["6 AI Analysts<br>politics / crypto / sports<br>econ / tech / general"]
WX["Weather<br>GFS 30-member ensemble"]
CX["Complexity<br>3 structural signals"]
end
subgraph Pipeline["Signal Pipeline"]
SF["Signal Funnel<br>6-stage filter"]
RM["Risk Manager<br>11-check gate"]
AV["AI Validator<br>fail-closed"]
end
subgraph Exec["Execution"]
P["Paper Simulator"]
L["Live CLOB<br>(W.I.P)"]
end
subgraph State["State"]
PORT["Portfolio<br>fee-inclusive<br>cost basis"]
DB["SQLite"]
KB["Knowledge Base<br>per-category"]
end
LLM["Claude / Gemini"] --> AI
LLM --> AV
MM --> AI & WX & CX
AI & WX & CX --> SF --> RM --> AV
AV --> P & L --> PORT --> DB
AV --> KB
```
Tip
Resilience — Every long-running coroutine runs inside a supervisor with up to 50 restarts and exponential backoff (capped at 5 min). Portfolio checkpoints to SQLite every 10 ticks, surviving crashes and SIGKILL.
Config hot-reload — ConfigManager polls config.yaml every 30 seconds. Risk limits, strategy parameters, and exit profiles update inline — no restart needed. Research provider changes require a restart.
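The supervisor pattern described above might look like the following sketch — restart a crashing coroutine with exponential backoff capped at a maximum delay and restart count. Names and exact delays are assumptions for illustration, not the bot's actual code:

```python
import asyncio

async def supervise(coro_factory, max_restarts: int = 50,
                    base_delay: float = 1.0, cap: float = 300.0) -> int:
    """Re-run coro_factory until it exits cleanly or restarts are exhausted."""
    restarts = 0
    while restarts <= max_restarts:
        try:
            await coro_factory()
            return restarts          # coroutine exited cleanly
        except Exception:
            restarts += 1
            # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
            delay = min(base_delay * 2 ** (restarts - 1), cap)
            await asyncio.sleep(delay)
    return restarts

# Demo: a task that fails twice, then succeeds.
attempts = 0
async def flaky() -> None:
    global attempts
    attempts += 1
    if attempts < 3:
        raise RuntimeError("boom")

restarts = asyncio.run(supervise(flaky, base_delay=0.01))
```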
Each analyst inherits from `ai_analyst_base.py`, which handles LLM calls, caching, Platt calibration, and Kelly sizing. Up to five LLM calls per market:
- Research — 3-thread evidence gathering (base rate, current events, structural factors) via web search. Market price is deliberately withheld to prevent anchoring bias.
- Independent estimation — three parallel LLM calls, each with its own context window. No analyst can see another's output. Median probability becomes the estimate; spread drives confidence.
- Reconciliation (conditional) — when ensemble spread exceeds 0.15, a supervisor call identifies the source of disagreement and produces a reconciled estimate. Recovers markets that would otherwise be dropped.
- Bayesian update — the model sees its blind estimate alongside the market price and reasons about whether the deviation is justified. Replaces the old mechanical blend formula.
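The independent-estimation step above reduces three blind estimates to a median, with the spread between them driving confidence. A minimal sketch — the spread-to-confidence mapping is an illustrative assumption; only the median rule and the 0.15 reconciliation threshold come from the text:

```python
from statistics import median

def aggregate_estimates(probs: list) -> tuple:
    """Reduce independent blind estimates to (estimate, spread, confidence)."""
    est = median(probs)
    spread = max(probs) - min(probs)
    confidence = max(0.0, 1.0 - spread)  # wide disagreement -> low confidence (assumed mapping)
    return est, spread, confidence

est, spread, conf = aggregate_estimates([0.58, 0.62, 0.70])
# spread 0.12 is under the 0.15 threshold, so no supervisor reconciliation call
```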
| Category | Tags | Perspectives | Min Edge | Kelly | Max Hold |
|---|---|---|---|---|---|
| Politics | elections, geopolitics | Historical · Current evidence · Structural | 6% | 0.15 | 48h |
| Crypto | BTC, ETH, SOL, tokens | Technical · Momentum · Sentiment | 4% | 0.10 | 24h |
| Sports | NBA, NFL, MLB, F1, MMA | Statistical · Matchup · Market | 6% | 0.15 | 24h |
| Economics | Fed, inflation, GDP, tariffs | Consensus · Data-driven · Surprise risk | 5% | 0.15 | 48h |
| Tech | AI, launches, semiconductors | Timeline · Technical · Strategic | 5% | 0.15 | 48h |
| General | everything unclaimed | Base-rate · Evidence · Contrarian | 5% | 0.20 | 120h |
Advanced AI features
| Feature | Description |
|---|---|
| Crypto enrichment | Live Binance technicals (RSI-14, SMA-20/50, EMA-12/26, VWAP-24h, funding rates) and Fear & Greed Index injected into the prompt — real data, not hallucination |
| Batch research | Markets sharing the same underlying asset (e.g., "Bitcoin $90k / $95k / $100k") are grouped into a single research call — one shared research call plus per-market estimation, instead of separate research per market |
| Cross-category dedup | Global claim registry (1-hour TTL) prevents multiple categories from analyzing the same market |
| Sibling co-evaluation | Timeframe siblings (same event, different deadlines) pulled into the same tick; funnel keeps the best-scoring variant |
| Cross-market coherence | Already-estimated probabilities from related markets are injected as priors, enforcing consistency across outcome variants |
| Category exclusion | Politics excludes oil/crypto/commodities; Economics excludes crypto. Each market routes to exactly one specialist |
| 3-thread research | Base rate, current events, and structural factors searched in parallel within a single Gemini call — finds evidence single-query misses (e.g., electoral system rules, pollster bias) |
| Supervisor reconciliation | When ensemble spread > 0.15, a supervisor call identifies the source of disagreement and reconciles. Recovers ~30% of markets that would otherwise be dropped |
| Bayesian update | Blind estimate confronted with market price — model reasons about whether its deviation is justified. Replaces mechanical blend formula |
| Timeout backoff | Markets causing LLM timeouts get geometric cooldowns: 15 min → 30 min → 1h → 2h → 4h max |
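The timeout backoff in the last row is a simple geometric progression, which might be sketched as (function name illustrative; the 15-minute base and 4-hour cap come from the table):

```python
def timeout_cooldown_minutes(consecutive_timeouts: int) -> int:
    """15 min -> 30 min -> 1 h -> 2 h -> capped at 4 h."""
    return min(15 * 2 ** (consecutive_timeouts - 1), 240)
```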
| Strategy | Type | Description |
|---|---|---|
| Weather | GFS Ensemble | No LLM calls. Open-Meteo 30-member ensemble forecasts. Parse question → geocode → fetch forecast → count members per bucket → trade mispriced outcomes. Supports temp, precip, snowfall, wind. |
| Complexity | Structural Scanner | Zero LLM calls. Three signals: complement spread (informed flow), volume spike (detection-only), resolution proximity (midpoint ~50%, expiry <48h). |
| Contrarian | Mechanical | Disabled by default. Bets against extreme consensus: >95% → NO, <5% → YES. Requires 48h+ to expiry. |
| Sentiment | News-driven | Disabled by default. NewsAPI headlines + LLM impact assessment. 5-min cooldown per token. |
```mermaid
flowchart TD
A["Market candidates<br>volume-sorted, liquid first"] --> B["Phase A: Research<br>Gemini + Google Search grounding<br>or Claude CLI + web search"]
B --> |"price WITHHELD<br>prevents anchoring"| C["Phase B: Independent Estimation<br>3 parallel LLM calls<br>separate context windows"]
C --> REC{"Spread > 0.15?"}
REC -- yes --> RECON["Supervisor Reconciliation<br>identify disagreement source<br>produce reconciled estimate"]
RECON --> D["Platt Scaling<br>correct RLHF underconfidence<br>alpha configurable per strategy"]
REC -- no --> D
D --> BAY["Bayesian Update<br>confront blind estimate with market price<br>model reasons about the gap"]
BAY --> E["Edge Calculation<br>subtract fees + spread + slippage<br>category-aware (see below)"]
E --> F{Edge Validation}
F --> |"net edge > 30%<br>in liquid market"| G["Reject: implausible<br>model error"]
F --> |"edge < min_edge"| H["Insufficient edge<br>not deduped, retried"]
F --> |"stale price<br>0.48-0.52, low vol"| I["Skip: no reliable<br>price signal"]
F --> |"valid edge"| K["Kelly Sizing<br>uncertainty discount<br>+ inventory adjustment"]
K --> J["SIGNAL<br>deduped for 1 hour"]
```
Dual providers — Gemini 2.5 Flash Lite for 3-thread research (native Google Search grounding), Claude for independent parallel estimation (3 calls, separate context windows). Research uses grounding only — no thinking — because combining the two triggers a known Gemini API bug. Automatic Claude CLI fallback on Gemini failure.
Online learning — When a position resolves, Claude extracts one actionable lesson and appends it to data/knowledge/{category}.md. These lessons are injected into future prompts (capped at 100 lines per category).
EdgeStatus dedup — SIGNAL, IMPLAUSIBLE_EDGE, and ZERO_SIZE permanently dedup a market. INSUFFICIENT_EDGE and LOW_CONFIDENCE are intentionally retried — they depend on price movement.
```mermaid
flowchart TD
A["AI estimates probability<br>raw_prob → Platt scaling → blended_prob"] --> B["Log to predictions table<br>raw, calibrated, blended, market_price, edge"]
B --> C["Trade executes<br>edge_status = SIGNAL"]
D["Resolution Poller<br>every 30 min"] --> E{Market resolved?}
E -- no --> D
E -- yes --> F["Backfill outcome<br>1.0 = YES won, 0.0 = NO won"]
F --> G["Compute Brier component<br>(calibrated_prob - outcome)²"]
G --> H{"30+ resolved<br>for this strategy?"}
H -- no --> I["Use default alpha<br>(1.3)"]
H -- yes --> J["Grid-search alpha<br>0.8 to 2.5, step 0.05<br>minimize Brier score"]
J --> K["Update strategy's<br>platt_alpha in memory"]
K --> A
```
RLHF training makes LLMs systematically under-confident — probabilities cluster toward 50%. Platt scaling amplifies log-odds: `calibrated = sigmoid(alpha * log(p / (1 - p)))`. Default alpha is 1.3, auto-tuned once 30+ predictions resolve per strategy. The dashboard shows the aggregate Brier score with quality labels (excellent < 0.10 · good < 0.20 · fair < 0.30 · poor).
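Both pieces — the scaling formula and the Brier-minimizing grid search over alpha (0.8 to 2.5, step 0.05) — fit in a few lines. A sketch with illustrative function names:

```python
import math

def platt_scale(p: float, alpha: float) -> float:
    """Amplify log-odds by alpha: sigmoid(alpha * logit(p))."""
    logit = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-alpha * logit))

def tune_alpha(resolved: list) -> float:
    """resolved: (raw_prob, outcome) pairs. Grid-search alpha to minimize Brier."""
    best_alpha, best_brier = 1.3, float("inf")
    alpha = 0.8
    while alpha <= 2.5 + 1e-9:
        brier = sum((platt_scale(p, alpha) - y) ** 2
                    for p, y in resolved) / len(resolved)
        if brier < best_brier:
            best_alpha, best_brier = alpha, brier
        alpha += 0.05
    return round(best_alpha, 2)

# alpha > 1 pushes probabilities away from 0.5:
# platt_scale(0.60, 1.3) ≈ 0.629
```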
Three mathematical stages transform a calibrated probability into a sized trade signal. Each stage is designed to correct for a specific class of error: blend weighting corrects for LLM anchoring risk, edge calculation corrects for transaction costs, and Kelly sizing corrects for estimation uncertainty.
After Platt scaling, the model confronts its blind estimate with the market price and reasons about the gap. This replaces the old mechanical blend formula (weighted average of LLM and market). The model explicitly evaluates whether its deviation from the market is justified by specific information, and adjusts accordingly.
```mermaid
flowchart TD
A["calibrated_prob<br>Platt-scaled median of ensemble"] --> C["Bayesian Update Call"]
B["market_price<br>midpoint from order book"] --> C
C --> D{"Is deviation<br>justified?"}
D -- "YES: specific info<br>market hasn't priced in" --> E["Keep estimate or<br>move slightly toward market"]
D -- "NO: market with this<br>volume likely knows more" --> F["Move substantially<br>toward market price"]
E --> G["adjusted_prob<br>with reasoning"]
F --> G
G --> H["Safety: direction preserved<br>update can't flip the edge"]
```
The Bayesian update produces the most interpretable reasoning in the whole pipeline. Example: blind estimate 0.63, market 0.34. Old formula: 0.58 (mechanical). Bayesian: 0.44 with reasoning: "over-weighted structural factors, market at this volume prices polling correctly."
Net edge deducts all real-world costs from the raw probability edge — category-aware Polymarket fees, half-spread, and slippage — before comparing to the strategy's minimum edge threshold.
```mermaid
flowchart TD
A["blended_prob"] --> B["raw_edge = blended_prob - market_price"]
B --> C{Category}
C -- "crypto" --> D["entry_fee = p * 0.25 * (p*(1-p))^2<br>exit_fee = ep * 0.25 * (ep*(1-ep))^2"]
C -- "sports" --> E["entry_fee = p * 0.0175 * p*(1-p)<br>exit_fee = ep * 0.0175 * ep*(1-ep)"]
C -- "politics / economics<br>tech / general" --> F["entry_fee = 0<br>exit_fee = 0"]
D & E & F --> G["spread_cost = market_spread / 2"]
G --> H["net_edge = |raw_edge|<br>- spread_cost<br>- entry_fee - exit_fee<br>- slippage_pct"]
H --> I{net_edge < min_edge?}
I -- yes --> J["INSUFFICIENT_EDGE<br>not deduped — retried"]
I -- no --> K{net_edge > 30%?}
K -- yes --> L["IMPLAUSIBLE_EDGE<br>deduped — likely model error"]
K -- no --> M["Kelly sizing →"]
```
Note
Politics, economics, tech, and general markets have zero fees on Polymarket (2026). Only crypto and sports incur fees, with crypto fees peaking at ~1.56% at the midpoint and sports at ~0.44%. The same fee formulas are applied in both edge calculation and paper trade simulation for consistency.
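The net-edge computation in the diagram translates directly to code. A hedged sketch — the fee formulas are taken from the diagram; the function signature and the exit-price estimate default are illustrative assumptions:

```python
def net_edge(blended_prob: float, market_price: float, category: str,
             market_spread: float, slippage_pct: float,
             exit_price_est: float = None) -> float:
    """Raw probability edge minus fees, half-spread, and slippage."""
    raw_edge = abs(blended_prob - market_price)
    p = market_price
    ep = exit_price_est if exit_price_est is not None else market_price
    if category == "crypto":
        entry_fee = p * 0.25 * (p * (1 - p)) ** 2
        exit_fee = ep * 0.25 * (ep * (1 - ep)) ** 2
    elif category == "sports":
        entry_fee = p * 0.0175 * p * (1 - p)
        exit_fee = ep * 0.0175 * ep * (1 - ep)
    else:  # politics / economics / tech / general: zero fees (2026)
        entry_fee = exit_fee = 0.0
    return raw_edge - market_spread / 2 - entry_fee - exit_fee - slippage_pct

# Crypto fee at the midpoint: 0.5 * 0.25 * 0.0625 = 0.0078125 per side (~1.56% round-trip)
edge = net_edge(0.60, 0.50, "crypto", market_spread=0.02, slippage_pct=0.005)
```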
Kelly criterion sizes each bet proportional to edge, then applies three sequential discounts: an uncertainty penalty from ensemble confidence, a per-strategy fractional Kelly cap, and an inventory adjustment that reduces size as topic exposure grows.
```mermaid
flowchart TD
A["true_prob<br>calibrated, trade-side adjusted"] --> B["b = (1 / price) - 1<br>implied payout odds"]
B --> C{confidence<br>provided?}
C -- yes --> D["uncertainty = clamp((1 - conf) * 0.15, 0.02, 0.08)<br>p = true_prob - uncertainty"]
C -- no --> E["p = true_prob"]
D & E --> F["q = 1 - p<br>kelly = (b*p - q) / b"]
F --> G{kelly <= 0?}
G -- yes --> H["Return $0 — no edge"]
G -- no --> I["fraction = kelly * kelly_fraction<br>0.10 - 0.20 per strategy"]
I --> J["Topic inventory check<br>count positions in same cluster<br>(BTC, Trump, Fed, oil, ...)"]
J --> K["inventory_scale = max(0, 1 - count / max)<br>linear decay to zero"]
K --> L["bet = fraction * inventory_scale * portfolio_value"]
L --> M["Final = min(bet, max_bet_usd)"]
```
| Discount | Source | Effect |
|---|---|---|
| Uncertainty penalty | Ensemble spread → confidence | Low agreement → subtract 2–8% from probability before Kelly |
| Fractional Kelly | Per-strategy config (0.10–0.20) | Caps bet at 10–20% of full Kelly — reduces variance |
| Inventory adjustment | Open positions in same topic | Linear decay: 3 BTC positions out of max 3 → Kelly multiplied by 0 |
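The full sizing pipeline from the diagram and table above can be sketched in one function. The clamp bounds, fractional-Kelly range, and inventory decay follow the diagram; the function signature and default values (e.g. `max_bet_usd`) are illustrative assumptions:

```python
def kelly_bet(true_prob: float, price: float, portfolio_value: float,
              confidence: float = None, kelly_fraction: float = 0.15,
              topic_positions: int = 0, max_topic_positions: int = 3,
              max_bet_usd: float = 50.0) -> float:
    """Kelly stake with uncertainty, fractional-Kelly, and inventory discounts."""
    b = (1 / price) - 1                       # implied payout odds
    p = true_prob
    if confidence is not None:
        # Ensemble disagreement penalty, clamped to [0.02, 0.08].
        uncertainty = min(max((1 - confidence) * 0.15, 0.02), 0.08)
        p -= uncertainty
    kelly = (b * p - (1 - p)) / b
    if kelly <= 0:
        return 0.0                            # no edge after the discount
    fraction = kelly * kelly_fraction         # fractional Kelly cap
    inventory_scale = max(0.0, 1 - topic_positions / max_topic_positions)
    return min(fraction * inventory_scale * portfolio_value, max_bet_usd)

# p=0.60, price=0.50, conf=0.9 -> discounted p=0.58, kelly=0.16,
# fraction=0.024, bet = $24 on a $1,000 portfolio
bet = kelly_bet(0.60, 0.50, 1000.0, confidence=0.9)
```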
Every signal passes a sequential gate — any single check can reject:
| # | Check | Prevents |
|---|---|---|
| 0 | Halt status | Trading during drawdown circuit breaker |
| 1a | Duplicate guard | Doubling into already-held positions |
| 1b | Max open positions | Exceeding position cap (25) |
| 2 | Liquidity + spread | Zero-volume, crossed, or wide-spread markets |
| 3 | Price sanity | Prices outside 0.01–0.99 |
| 4 | Position size cap | Single position > 2% of portfolio |
| 5 | Strategy budget | Strategy exceeding max_capital_usd ceiling |
| 6 | Daily drawdown | Daily PnL < -10% → halt all trading |
| 7 | Event correlation | Max 3 positions per event |
| 8 | Topic correlation | Max 2 positions per topic (BTC, ETH, Trump, Fed, oil, etc.) |
| 9 | Balance check | Insufficient capital (with $5 buffer) |
| 10 | Deployment pacing | Max 5% deployed per rolling hour |
Important
Position count and loss cooldown are re-checked at execution time (not just risk-check time), closing race conditions where parallel asyncio.gather executions could collectively exceed limits.
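A sequential veto gate like the one in the table can be expressed as a list of checks where the first rejection short-circuits. A sketch — check names mirror the table, but the data shapes and lambdas are illustrative assumptions:

```python
# Each check inspects a signal dict and returns a rejection reason, or None to pass.
def run_gate(signal: dict, checks: list) -> tuple:
    for check in checks:
        reason = check(signal)
        if reason is not None:
            return False, reason      # any single check can reject
    return True, "approved"

checks = [
    lambda s: "halted" if s.get("halted") else None,                       # 0: halt status
    lambda s: "duplicate" if s["token_id"] in s["held"] else None,         # 1a: duplicate guard
    lambda s: "max positions" if len(s["held"]) >= 25 else None,           # 1b: position cap
    lambda s: "price sanity" if not 0.01 <= s["price"] <= 0.99 else None,  # 3: price sanity
]

ok, reason = run_gate(
    {"halted": False, "token_id": "t1", "held": {"t2"}, "price": 0.42},
    checks,
)
```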
```mermaid
flowchart TD
A([BUY order fills]) --> B[Create Position<br>fee-inclusive cost basis]
B --> C[Compute exit params<br>TP price / max hold / edge decay]
C --> D([Position held<br>updated every tick])
D --> E{Exit Manager<br>checks every tick}
E -- "price >= take_profit" --> F[TAKE_PROFIT]
E -- "price drop >30%<br>OR time >60%" --> G[EDGE_DECAYED]
E -- "max hold exceeded" --> H[TIME_EXPIRY]
E -- "near resolution" --> I[APPROACHING_EXPIRY]
E -- "no exit trigger" --> D
D --> J{Position Monitor<br>every 30 min}
J -- "thesis still valid" --> D
J -- "extend hold time" --> K["+24h<br>max 1 extension<br>capped by market deadline"]
K --> D
J -- "thesis invalidated" --> L[THESIS_EXIT]
F & G & H & I & L --> M[Execute SELL<br>bypass risk gate]
M --> N[Calculate realized PnL]
N --> O[Log trade + update DB]
O --> P{Loss?}
P -- yes --> Q[6-hour cooldown<br>on this market]
P -- no --> R[Update knowledge file<br>via Claude]
```
Binary prediction markets pay $1 or $0 at resolution — interim dips are usually noise, not thesis invalidation. No hard stop-loss or trailing stop. Instead:
| Exit Type | Trigger | Rationale |
|---|---|---|
| Take-profit | 7–14% gain (strategy-dependent) | Prediction markets rarely offer more |
| Edge decayed | Price drops >30% OR >60% of hold elapsed | OR-logic soft stop-loss |
| Time expiry | Max hold exceeded (24h–120h) | Limits capital lock-up |
| Approaching expiry | Near resolution deadline (2h buffer) | Avoid liquidity drought |
| Thesis invalidation | Monitor LLM determines thesis invalid | Fundamental outlook change |
| Monitor staleness | 2 consecutive missed re-evals | Failsafe: re-enables edge decay |
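The tick-level triggers in the table above might be sketched as follows. The 30% / 60% / 2-hour numbers come from the table; the `edge_decay_enabled` flag reflects that edge decay can be toggled by the staleness failsafe; names and ordering are illustrative assumptions:

```python
def check_exit(price: float, entry_price: float, take_profit_price: float,
               hold_elapsed_h: float, max_hold_h: float,
               hours_to_resolution: float,
               edge_decay_enabled: bool = True) -> str:
    """Return the first triggered exit reason, or None to keep holding."""
    if price >= take_profit_price:
        return "TAKE_PROFIT"
    if hold_elapsed_h >= max_hold_h:
        return "TIME_EXPIRY"                  # limits capital lock-up
    drop = (entry_price - price) / entry_price
    if edge_decay_enabled and (drop > 0.30 or hold_elapsed_h > 0.60 * max_hold_h):
        return "EDGE_DECAYED"                 # OR-logic soft stop-loss
    if hours_to_resolution <= 2:
        return "APPROACHING_EXPIRY"           # 2h liquidity buffer
    return None                               # hold
```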
Additional exit safeguards
- Loss cooldown — Same market blocked for 6 hours after a loss. Checked at both strategy selection and execution stages.
- Complement liquidity — NO token buys require complement `best_bid` >= $0.05. Illiquid complements are permanently deduped.
- Minimum hold — Exit checks suppressed for 60s after entry, preventing same-tick buy/sell oscillation.
All settings live in config.yaml. Changes take effect within 30 seconds via hot-reload — no restart needed (except research provider).
| Setting | Description | Default |
|---|---|---|
| `mode` | `"paper"` or `"live"` | `"paper"` |
| `risk.max_position_pct` | Max single position as % of portfolio | 2% |
| `risk.daily_drawdown_limit_pct` | Daily drawdown halt threshold | 10% |
| `risk.max_open_positions` | Concurrent position cap | 25 |
| `risk.max_deployment_per_hour_pct` | Hourly deployment pacing | 5% |
| `strategies.<name>.min_edge` | Minimum edge to signal | 4–6% |
| `strategies.<name>.kelly_fraction` | Kelly sizing fraction | 0.10–0.20 |
| `strategies.<name>.platt_alpha` | Platt scaling alpha (>1 = away from 50%) | 1.3 |
| `research.research_provider` | `"gemini"` or `"claude"` for Phase A | `"gemini"` |
| `research.estimation_provider` | `"claude"` or `"gemini"` for Phase B | `"claude"` |
| `strategies.<name>.reconciliation_spread_threshold` | Ensemble spread triggering supervisor reconciliation | 0.15 |
| `ai_validation.enabled` | Claude approval gate for non-AI signals | `true` |
| `position_monitor.max_extensions` | Max hold-time extensions per position | 1 |
| `paper_trading.initial_balance_usd` | Starting virtual capital | $1,000 |
Environment Variables
| Variable | Required | Purpose |
|---|---|---|
| `ANTHROPIC_API_KEY` | Yes | Claude CLI (all AI strategies + validator). Must be in the shell environment. |
| `GOOGLE_API_KEY` | If using Gemini | Gemini research provider. Raises `RuntimeError` if missing. |
| `GEMINI_API_KEY` | No | Alternative to `GOOGLE_API_KEY` (checked second) |
| `POLYMARKET_PRIVATE_KEY` | Live only | EOA private key for CLOB order signing |
| `POLYMARKET_FUNDER_ADDRESS` | Live only | Polymarket proxy wallet address |
| `DATABASE_URL` | No | Override default SQLite path (`sqlite:///bot_data.db`) |
Note
OpenAI and NewsAPI keys are read from `config.yaml`, not environment variables.
Data Sources
| Source | Used By | Key | Notes |
|---|---|---|---|
| Polymarket Gamma API | Market discovery, resolution | — | Rate-limited 5 req/s |
| Polymarket CLOB API | Order books, execution | Live | Circuit breaker: 5 failures → 60s cooldown |
| Polymarket WebSocket | Real-time bid/ask | — | Auto-reconnect with backoff |
| Claude CLI | Estimation, validation, learning | Yes | Subprocess; max 3 concurrent |
| Gemini 2.5 Flash Lite | Grounded research | Yes | Auto-fallback to Claude |
| Open-Meteo | GFS weather (30 members) | — | Free; 10K req/day |
| Binance API | Crypto technicals, funding | — | 60s cache |
| CoinGecko | Crypto spot prices | — | 60s cache |
| Alternative.me | Fear & Greed Index | — | Crypto analyst |
| NewsAPI | Breaking news | Yes | Sentiment only (disabled) |
```bash
./scripts/setup.sh                        # one-time: venv + Python deps + .env
npm install -g @anthropic-ai/claude-code  # one-time: Claude CLI (not in setup.sh)
export ANTHROPIC_API_KEY="sk-ant-..."     # or add to .env
./scripts/run.sh --paper                  # paper trading (ALWAYS pass --paper explicitly)
./scripts/run.sh --paper --logs           # streaming logs instead of TUI
```

Warning
`scripts/run.sh` defaults to live mode (line 8: `MODE="live"`). If credentials are set in `.env`, running without `--paper` will trade real money. Use `python main.py` directly for the safest default (paper via `config.yaml`).
```bash
cp .env.example .env         # fill in API keys
./scripts/ctl.sh up          # build and start (paper mode by default)
./scripts/ctl.sh dashboard   # attach to live TUI
./scripts/ctl.sh logs bot    # tail log file
./scripts/ctl.sh down        # stop everything
```

Two containers: `bot` (trading engine + TUI) and `collector` (background snapshots for backtesting). Data persists in a Docker volume at `/app/data`. The image includes Node.js 20 and Claude CLI automatically.
All Docker commands
| Command | Purpose |
|---|---|
| `./scripts/ctl.sh up` | Build and start bot + collector (paper mode) |
| `./scripts/ctl.sh up --live` | Start in live mode (credential check first) |
| `./scripts/ctl.sh down` | Stop all services |
| `./scripts/ctl.sh status` | Show container status |
| `./scripts/ctl.sh dashboard` | Attach to TUI — detach: Ctrl+P, Ctrl+Q |
| `./scripts/ctl.sh logs bot\|collector` | Tail container logs |
| `./scripts/ctl.sh backtest --start DATE --end DATE` | Run backtest in container |
| `./scripts/ctl.sh build` | Rebuild images |
| `./scripts/ctl.sh restart` | Restart all services |
```bash
python main.py --collect-data                                        # collect snapshots
python -m backtest.runner --start 2025-01-01 --end 2025-03-01        # replay through pipeline
python -m backtest.walk_forward --start 2025-01-01 --end 2025-03-01  # rolling optimization
```

Metrics: win rate · PnL · profit factor · expectancy · max drawdown · Sharpe · Sortino.
Full file tree
polymarket-bot/
├── main.py # Entry point, supervisor restart loops, daily reset
├── config.yaml # All settings (hot-reloadable, 30s poll)
├── core/
│ ├── event_handler.py # Tick loop, fast/slow path split, position monitor
│ ├── market_monitor.py # REST + WebSocket + cross-book price sync
│ ├── portfolio.py # Positions, PnL, fee-inclusive cost basis (best_bid valuation)
│ ├── exit_manager.py # Take-profit, edge decay, expiry, thesis exits
│ ├── signal_funnel.py # 6-stage filter: confidence → cap → dedup → rank → global cap
│ ├── ai_validator.py # Claude approval gate (fail-closed) + knowledge learning
│ ├── calibration_tracker.py # Prediction logging, Brier scores, Platt alpha auto-tuning
│ ├── config_manager.py # Hot-reload config watcher (30s mtime poll)
│ ├── models.py # Signal, Order, Position, MarketState dataclasses
│ └── resilience.py # CircuitBreaker, RateLimiter, retry_with_backoff
├── strategies/
│ ├── base.py # BaseStrategy + StrategyEngine (3-phase parallel dispatch)
│ ├── ai_analyst_base.py # Shared AI logic: cache, Kelly, Platt, claims, dedup, thesis
│ ├── ai_analyst.py # 6 category subclasses with specialized prompts
│ ├── gemini_research.py # Gemini provider with Google Search grounding
│ ├── market_grouper.py # Batch research grouping by underlying asset (regex)
│ ├── weather.py # GFS ensemble forecasting (no LLM)
│ ├── open_meteo.py # Geocoding + ensemble API client
│ ├── complexity.py # Market structure scanner (3 structural signals, no LLM)
│ ├── contrarian.py # Extreme consensus reversal strategy
│ ├── sentiment.py # News-driven LLM impact assessment
│ ├── crypto_data.py # Binance technicals + Fear & Greed
│ ├── crypto_prices.py # CoinGecko price fetcher
│ ├── signal_funnel.py # Per-category signal caps before risk manager
│ └── calibration.py # Platt scaling function
├── risk/
│ └── manager.py # 11-check sequential risk gate
├── execution/
│ ├── executor.py # Routes to paper or live
│ ├── paper.py # Paper simulator (Polymarket fee formula + slippage)
│ └── live.py # CLOB API (EIP-712 signed orders on Polygon) — W.I.P
├── dashboard/
│ ├── cli.py # Textual TUI (positions, pipeline feed, calibration)
│ └── metrics.py # Pipeline stage tracking (17 stages, TTL eviction)
├── learning/
│ └── analyze.py # Daily analysis + auto-parameter adjustment
├── backtest/
│ ├── runner.py # Full pipeline replay engine
│ ├── walk_forward.py # Walk-forward parameter optimization
│ ├── data_collector.py # Background market snapshots
│ └── metrics.py # Sharpe, Sortino, max drawdown, profit factor
├── db/
│ └── schema.py # SQLAlchemy models (6 tables) + auto-migration on startup
├── data/
│ ├── knowledge/ # Per-category knowledge files (evolving at runtime)
│ └── research/ # LLM forecasting research notes
├── scripts/
│ ├── setup.sh # Install deps, configure environment
│ ├── run.sh # Start bot (--paper or --live)
│ ├── ctl.sh # Docker control (up, down, dashboard, logs)
│ ├── market_analyzer.py # Full-universe market analysis (standalone)
│ └── live_trade.py # Interactive live trade script
├── tests/
│ └── test_ai_validator.py # Validator unit tests
├── Dockerfile # Multi-stage build, Node.js 20 + Claude CLI, non-root user
├── docker-compose.yml # bot + collector services, named volume
└── requirements.txt
Database Schema (6 tables)
| Table | Purpose | Key Fields |
|---|---|---|
| `trades` | Audit log of every fill | token_id, strategy, side, price, shares, pnl, outcome |
| `predictions` | Every AI estimate | condition_id, raw_prob, calibrated_prob, edge, brier_component |
| `daily_snapshots` | End-of-day summary | total_value, daily_pnl, strategy_breakdown (JSON) |
| `position_snapshots` | Crash recovery | token_id, shares, avg_entry_price, extensions_used |
| `portfolio_state_snapshots` | Capital state | available_capital, daily_start_value, realized_pnl |
| `errors` | Error log | timestamp, error_message, strategy, context |
Portfolio state uses SQLite savepoints for atomic writes — crashes mid-checkpoint roll back cleanly.
Common issues
| Problem | Cause | Fix |
|---|---|---|
| `FileNotFoundError` on first AI analysis | Claude CLI not installed | `npm install -g @anthropic-ai/claude-code` |
| `--dry-run` passes but bot fails on first tick | `--dry-run` doesn't spawn Claude CLI | Install CLI and set `ANTHROPIC_API_KEY` |
| `RuntimeError: No Gemini API key found` | Config uses Gemini but no key | Set `GOOGLE_API_KEY` or switch to `claude` |
| TUI not rendering | Missing `rich` or `textual` | `pip install rich textual` |
| Docker: knowledge files missing | `data/` excluded by `.dockerignore` | Mount `data/knowledge/` as volume |
| `./scripts/run.sh` trades real money | Defaults to live mode | Always pass `--paper` |
| WebSocket keeps disconnecting | Network issues / API downtime | Auto-reconnect; REST poller as fallback |
| Bot runs but never trades | All strategies disabled or no markets match tag filter | Check `strategies.<name>.enabled` and `polymarket.relevant_tags` in `config.yaml` |
| AI validator blocks all trades | 3 consecutive Claude CLI errors → fail-closed | Check `ANTHROPIC_API_KEY`, Claude CLI installation, subprocess availability |
| Config changes have no effect | Changed `research_provider` (requires restart) | Most settings hot-reload; provider changes need a restart |
| `google-genai` not installed | `research_provider: "gemini"` but package missing | `pip install google-genai` |
| Limitation | Details |
|---|---|
| No hard stop-loss | Intentional for binary markets ($1/$0), but positions can draw down before edge decay triggers |
| Calibration cold start | Auto-tune requires 30+ resolved predictions per strategy; uses default alpha (1.3) until then |
| Gemini thinking + grounding | Cannot be combined — research uses grounding only, estimation uses thinking only |
| Docker knowledge gap | .dockerignore excludes data/; fresh deployments start without accumulated knowledge |
| Live mode is a work in progress | `execution/live.py` exists but is not battle-tested — paper mode is the only fully validated execution path |
| Single-node only | No distributed mode, Kubernetes, or cloud templates |
This is experimental software. Paper mode uses virtual capital — no financial risk. Live mode executes real trades on Polygon and can lose real money. Not financial advice.
MIT License
