A research-grade Monte Carlo CFR implementation for computing near-optimal (GTO) strategies in Heads-Up No-Limit Texas Hold'em.
This solver uses Monte Carlo Counterfactual Regret Minimization (MCCFR) with advanced optimizations (CFR+, Linear CFR) to compute equilibrium strategies for heads-up no-limit hold'em. It features sophisticated card and action abstractions, parallel training, and empirical exploitability evaluation.
- Advanced CFR Variants: CFR+ provides ~100x faster convergence than vanilla CFR, with Linear CFR adding an additional 2-3x speedup
- Suit Isomorphism Card Abstraction: Reduces state space by 12-19x while preserving strategic relevance (flush draws, suit coordination)
- Node-Template Action Model: Context-aware preflop/postflop sizing with SPR-gated jam logic
- Realtime Subgame Resolver: Runtime local re-solving with configurable depth, rollout leaves, and conservative blueprint blending
- Parallel Training: Multi-core support with lock-free shared memory for efficient scaling
- Comprehensive Evaluation: Rollout-based exploitability estimation with confidence intervals
- Production-Ready Checkpointing: Efficient Zarr-based storage with resume capability
- Interactive CLI & Web UI: Train, evaluate, and visualize strategies through intuitive interfaces
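To make the CFR+ and Linear CFR variants listed above concrete, here is a minimal illustrative sketch of one regret-matching update at a single information set. This is not the solver's actual implementation; the names (`regrets`, `strategy_sum`, `cfr_plus_update`) are hypothetical.

```python
import numpy as np

def regret_matching_plus(regrets: np.ndarray) -> np.ndarray:
    """Current strategy from accumulated regrets via regret matching."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))  # uniform fallback

def cfr_plus_update(regrets, strategy_sum, action_values, t):
    """One CFR+ step at an information set.

    CFR+: clamp accumulated regrets at zero after each update.
    Linear CFR: weight iteration t's strategy contribution by t.
    """
    strategy = regret_matching_plus(regrets)
    node_value = float(np.dot(strategy, action_values))
    regrets += action_values - node_value   # instantaneous regret
    np.maximum(regrets, 0.0, out=regrets)   # CFR+ clamping
    strategy_sum += t * strategy            # Linear CFR averaging weight
    return regrets, strategy_sum
```

The clamping step is what distinguishes CFR+ from vanilla CFR: negative regret never accumulates, so mistakes are forgotten faster and convergence accelerates.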
```bash
# Clone the repository
git clone https://github.com/matteo-psnt/poker-solver.git
cd poker-solver

# Install dependencies with uv
uv sync --group dev

# Launch the interactive CLI
uv run poker-solver
```

From the CLI, you can:
- Train a new solver with predefined configurations
- Resume training from checkpoints
- Evaluate trained strategies (exploitability estimation)
- View preflop GTO charts
- Precompute custom card abstractions
- Launch the CLI:

  ```bash
  uv run poker-solver
  ```

- Select "Train Solver"
- Choose a configuration:
  - `quick_test`: Fast convergence test (~2 minutes, ~500 mbb/g)
  - `production`: Balanced quality (~2-3 hours, ~10-20 mbb/g)
- Training runs with live progress updates and automatic checkpointing
The codebase follows a layered package layout with a single allowed dependency direction:
- `src/interfaces` -> `src/pipeline` -> `src/engine` -> `src/core`
- `src/shared` is layer-neutral and can be imported by all layers
- Reverse imports across layers are forbidden
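Since the repo runs `lint-imports` (the import-linter CLI) to enforce this, the dependency direction can be encoded as a layers contract along these lines. This is a sketch; the project's actual contract file and option values may differ.

```ini
# .importlinter (sketch -- the repo's real contract file may differ)
[importlinter]
root_packages =
    src

[importlinter:contract:layers]
name = Layered architecture
type = layers
containers =
    src
layers =
    interfaces
    pipeline
    engine
    core
```

With a `layers` contract, any module listed higher may import modules listed lower, and import-linter fails the check if an import goes the other way.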
The solver uses combo-level abstraction that preserves suit relationships to the board:
- A♠K♠ on T♠9♠8♣ (flush draw) → Different bucket than
- A♠K♠ on T♥9♥8♣ (no flush draw)
This is a significant improvement over naive 169-class abstractions that ignore suit coordination.
Process:
- Canonicalize boards by suit order (22,100 → 1,755 unique flops)
- Cluster boards by texture (connectivity, pairing, suits)
- Bucket hands within clusters by equity distributions
- Result: 12-19x state space reduction with minimal strategic loss
See Card Abstraction README for details.
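Step 1 above (canonicalizing boards by suit order) can be sketched as follows. This is an illustrative simplification, not the project's implementation (`canonicalize_board` is a hypothetical name), and it ignores the extra tie-breaking needed when two board cards share a rank.

```python
RANKS = "23456789TJQKA"

def canonicalize_board(cards: list[str]) -> tuple[str, ...]:
    """Map a board's suits to canonical labels ('a', 'b', ...) in order
    of first appearance, after sorting cards high-to-low by rank.

    Boards that differ only by a suit permutation collapse to the same
    canonical form: Ts9s8c and Th9h8c both become ('Ta', '9a', '8b').
    """
    cards = sorted(cards, key=lambda c: RANKS.index(c[0]), reverse=True)
    suit_map: dict[str, str] = {}
    canon = []
    for rank, suit in cards:
        if suit not in suit_map:
            suit_map[suit] = "abcd"[len(suit_map)]
        canon.append(rank + suit_map[suit])
    return tuple(canon)
```

Collapsing suit permutations this way is what brings the 22,100 raw flops down to 1,755 canonical ones.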
Training behavior is controlled by YAML configs in `config/training/`:

```yaml
# config/training/production.yaml
game:
  starting_stack: 200  # BB units
  small_blind: 1
  big_blind: 2

action_model:
  preflop_templates:
    sb_first_in: ["fold", "call", 2.5, 3.5, 5.0]
    bb_vs_open: ["fold", "call", "3.5x_open", "4.5x_open"]
    sb_vs_3bet: ["fold", "call", "2.3x_last", "jam"]
  postflop_templates:
    first_aggressive: [0.33, 0.66, 1.25]
    facing_bet: ["min_raise", "pot_raise", "jam"]
    after_one_raise: ["pot_raise", "jam"]
    after_two_raises: ["jam"]
  jam_spr_threshold: 2.0
  off_tree_mapping: "probabilistic"

resolver:
  enabled: true
  time_budget_ms: 300
  max_depth: 2
  max_raises_per_street: 5
  leaf_rollouts: 8
  leaf_use_average_strategy: true
  policy_blend_alpha: 0.35
  min_strategy_prob: 1.0e-6

solver:
  cfr_plus: true                 # ~100x faster convergence
  iteration_weighting: "linear"  # "none" | "linear" | "dcfr"
  sampling_method: "external"    # or "outcome"

training:
  num_iterations: 1000000        # 1M iterations
  checkpoint_frequency: 100000
```

`MCCFRSolver.act()` can use a realtime HU resolver instead of sampling directly from the blueprint.
- Builds a depth-limited local lookahead tree from the current state
- Estimates leaf values via blueprint rollouts
- Computes a local strategy and blends it with the blueprint policy (`policy_blend_alpha`)
- Applies a minimum strategy floor (`min_strategy_prob`) before normalization
- Uses off-tree translation (`off_tree_mapping`) to improve robustness
Training still learns the blueprint policy; realtime resolving is used at decision time.
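The blend-and-floor steps above can be sketched in a few lines. This is an illustrative interpretation, not the solver's code; in particular, the convention that `alpha` weights the local strategy (rather than the blueprint) is an assumption.

```python
import numpy as np

def blend_with_blueprint(local: np.ndarray,
                         blueprint: np.ndarray,
                         alpha: float = 0.35,
                         floor: float = 1e-6) -> np.ndarray:
    """Blend a resolver's local strategy with the blueprint policy,
    then apply a minimum-probability floor and renormalize.

    alpha plays the role of policy_blend_alpha and floor the role of
    min_strategy_prob; the exact weighting convention is an assumption.
    """
    blended = alpha * local + (1.0 - alpha) * blueprint
    blended = np.maximum(blended, floor)  # minimum strategy floor
    return blended / blended.sum()        # renormalize to a distribution
```

The floor keeps every in-tree action at nonzero probability, which avoids degenerate zero-probability lines when the local solve is run under a tight time budget.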
Card abstraction configs live in `config/abstraction/`:

```yaml
# config/abstraction/default.yaml
board_clusters:
  flop: 50
  turn: 100
  river: 200

buckets:
  flop: 50
  turn: 100
  river: 200

equity_samples: 1000
```

See Configuration Guide for details on adding custom configs.
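To illustrate what the `buckets` counts control, here is a deliberately simplified sketch of equity bucketing. The real pipeline clusters full equity distributions within board clusters; this stand-in (`assign_equity_buckets` is a hypothetical name) buckets hands by a scalar mean equity only.

```python
import numpy as np

def assign_equity_buckets(equities: np.ndarray, num_buckets: int) -> np.ndarray:
    """Assign each hand an abstraction bucket by equity quantile.

    Quantile edges split the observed equities into num_buckets
    roughly equal-sized groups; np.digitize maps each hand to its bucket.
    """
    edges = np.quantile(equities, np.linspace(0, 1, num_buckets + 1)[1:-1])
    return np.digitize(equities, edges)
```

Quantile bucketing keeps buckets balanced in size; distribution-aware clustering (as described above) additionally separates hands whose equities look similar on average but realize very differently across runouts.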
The primary quality metric is exploitability: the expected value a best-response opponent can achieve.
Target Values (in milli-big-blinds per game):
- < 1 mbb/g: Strong player
- 1-5 mbb/g: Good player
- 5-20 mbb/g: Decent player
- 20+ mbb/g: Needs more training
Implementation: Monte Carlo rollout-based best response approximation (following Brown & Sandholm 2019). This is tractable for large games but provides empirical estimates rather than exact exploitability.
```python
results = compute_exploitability(
    solver,
    num_samples=10000,             # Game simulations per player
    num_rollouts_per_infoset=100,  # Rollouts for action value estimation
    use_average_strategy=True,
)

# Output includes confidence intervals
print(f"{results['exploitability_mbb']:.2f} ± {results['std_error_mbb']:.2f} mbb/g")
print(f"95% CI: [{results['confidence_95_mbb'][0]:.2f}, {results['confidence_95_mbb'][1]:.2f}]")
```

See Evaluation README for methodology and best practices.
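The ± and 95% CI figures follow standard Monte Carlo error arithmetic. A minimal sketch, assuming per-game winnings are already expressed in mbb/g (`summarize_mbb` is a hypothetical helper, not the project's API):

```python
import math

def summarize_mbb(samples: list[float]) -> dict:
    """Mean, standard error, and normal-approximation 95% CI
    for a list of per-game winnings in mbb/g."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                                # standard error of the mean
    return {
        "exploitability_mbb": mean,
        "std_error_mbb": se,
        "confidence_95_mbb": (mean - 1.96 * se, mean + 1.96 * se),
    }
```

Because the standard error shrinks as 1/sqrt(n), halving the CI width requires roughly four times as many game simulations.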
```bash
# Run all tests
uv run pytest

# Run fast tests only (excludes slow integration tests)
uv run pytest -m "not slow"

# Linting and formatting
uv run ruff check .
uv run ruff format .

# Type checking
uv run ty check

# Layering/architecture contracts
uv run lint-imports
```

The preflop chart viewer now serves data through FastAPI.
- FastAPI server (`/health`, `/api/meta`, `/api/chart`) + static UI from `ui/dist`
```text
poker-solver/
├── src/
│   ├── interfaces/          # User-facing entrypoints (CLI, API, charts)
│   ├── pipeline/            # Training, evaluation, abstraction workflows
│   ├── engine/              # Solver/search internals
│   ├── core/                # Poker domain foundations (game/actions)
│   └── shared/              # Cross-layer utilities (config, helpers)
├── tests/                   # Mirrors src/ layout + integration tests
├── config/
│   ├── training/            # Training configuration presets
│   └── abstraction/         # Card abstraction presets
├── data/
│   ├── runs/                # Training runs and checkpoints
│   └── combo_abstraction/   # Precomputed card abstractions
└── ui/                      # React web interface for charts
```
MIT License - see LICENSE file for details.
Note: This is a research implementation for educational purposes. While the solver computes theoretically sound strategies, it should not be used for real-money poker without extensive additional testing and validation.