A research-grade Monte Carlo CFR implementation for computing near-optimal (GTO) strategies in Heads-Up No-Limit Texas Hold'em.
This solver uses Monte Carlo Counterfactual Regret Minimization (MCCFR) with advanced optimizations (CFR+, Linear CFR) to compute equilibrium strategies for heads-up no-limit hold'em. It features sophisticated card and action abstractions, parallel training, and empirical exploitability evaluation.
- Advanced CFR Variants: CFR+ provides ~100x faster convergence than vanilla CFR, with Linear CFR adding an additional 2-3x speedup
- Suit Isomorphism Card Abstraction: Reduces state space by 12-19x while preserving strategic relevance (flush draws, suit coordination)
- Node-Template Action Model: Context-aware preflop/postflop sizing with SPR-gated jam logic
- Realtime Subgame Resolver: Runtime local re-solving with configurable depth, rollout leaves, and conservative blueprint blending
- Parallel Training: Multi-core support with lock-free shared memory for efficient scaling
- Comprehensive Evaluation: Rollout-based exploitability estimation with confidence intervals
- Production-Ready Checkpointing: Efficient Zarr-based storage with resume capability
- Interactive CLI & Web UI: Train, evaluate, and visualize strategies through intuitive interfaces
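To make the CFR+ and Linear CFR variants listed above concrete, here is a minimal illustrative sketch of one regret-matching update at a single information set. This is not the solver's actual implementation; the names (`regrets`, `strategy_sum`, `cfr_plus_update`) are hypothetical.

```python
import numpy as np

def regret_matching_plus(regrets: np.ndarray) -> np.ndarray:
    """Current strategy from accumulated regrets via regret matching."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))  # uniform fallback

def cfr_plus_update(regrets, strategy_sum, action_values, t):
    """One CFR+ step at an information set.

    CFR+: clamp accumulated regrets at zero after each update.
    Linear CFR: weight iteration t's strategy contribution by t.
    """
    strategy = regret_matching_plus(regrets)
    node_value = float(np.dot(strategy, action_values))
    regrets += action_values - node_value   # instantaneous regret
    np.maximum(regrets, 0.0, out=regrets)   # CFR+ clamping
    strategy_sum += t * strategy            # Linear CFR averaging weight
    return regrets, strategy_sum
```

The clamping step is what distinguishes CFR+ from vanilla CFR: negative regret never accumulates, so mistakes are forgotten faster and convergence accelerates.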
```bash
# Clone the repository
git clone https://github.com/matteo-psnt/poker-solver.git
cd poker-solver

# Install dependencies with uv
uv sync --group dev

# Launch the interactive CLI
uv run poker-solver
```

From the CLI, you can:
- Train a new solver with predefined configurations
- Resume training from checkpoints
- Evaluate trained strategies (exploitability estimation)
- View preflop GTO charts
- Precompute custom card abstractions
- Launch the CLI:

  ```bash
  uv run poker-solver
  ```

- Select "Train Solver"
- Choose a configuration:
  - `quick_test`: Fast convergence test (~2 minutes, ~500 mbb/g)
  - `production`: Balanced quality (~2-3 hours, ~10-20 mbb/g)
- Training runs with live progress updates and automatic checkpointing
The codebase follows a layered package layout with a single allowed dependency direction:
- `src/interfaces` -> `src/pipeline` -> `src/engine` -> `src/core`
- `src/shared` is layer-neutral and can be imported by all layers
- Reverse imports across layers are forbidden
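Since the repo runs `lint-imports` (the import-linter CLI) to enforce this, the dependency direction can be encoded as a layers contract along these lines. This is a sketch; the project's actual contract file and option values may differ.

```ini
# .importlinter (sketch -- the repo's real contract file may differ)
[importlinter]
root_packages =
    src

[importlinter:contract:layers]
name = Layered architecture
type = layers
containers =
    src
layers =
    interfaces
    pipeline
    engine
    core
```

With a `layers` contract, any module listed higher may import modules listed lower, and import-linter fails the check if an import goes the other way.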
The solver uses combo-level abstraction that preserves suit relationships to the board:
- A♠K♠ on T♠9♠8♣ (flush draw) → Different bucket than
- A♠K♠ on T♥9♥8♣ (no flush draw)
This is a significant improvement over naive 169-class abstractions that ignore suit coordination.
Process:
- Canonicalize boards by suit order (22,100 → 1,755 unique flops)
- Cluster boards by texture (connectivity, pairing, suits)
- Bucket hands within clusters by equity distributions
- Result: 12-19x state space reduction with minimal strategic loss
See Card Abstraction README for details.
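Step 1 above (canonicalizing boards by suit order) can be sketched as follows. This is an illustrative simplification, not the project's implementation (`canonicalize_board` is a hypothetical name), and it ignores the extra tie-breaking needed when two board cards share a rank.

```python
RANKS = "23456789TJQKA"

def canonicalize_board(cards: list[str]) -> tuple[str, ...]:
    """Map a board's suits to canonical labels ('a', 'b', ...) in order
    of first appearance, after sorting cards high-to-low by rank.

    Boards that differ only by a suit permutation collapse to the same
    canonical form: Ts9s8c and Th9h8c both become ('Ta', '9a', '8b').
    """
    cards = sorted(cards, key=lambda c: RANKS.index(c[0]), reverse=True)
    suit_map: dict[str, str] = {}
    canon = []
    for rank, suit in cards:
        if suit not in suit_map:
            suit_map[suit] = "abcd"[len(suit_map)]
        canon.append(rank + suit_map[suit])
    return tuple(canon)
```

Collapsing suit permutations this way is what brings the 22,100 raw flops down to 1,755 canonical ones.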
Training behavior is controlled by YAML configs in `config/training/`:

```yaml
# config/training/production.yaml
game:
  starting_stack: 200  # BB units
  small_blind: 1
  big_blind: 2

action_model:
  preflop_templates:
    sb_first_in: ["fold", "call", 2.5, 3.5, 5.0]
    bb_vs_open: ["fold", "call", "3.5x_open", "4.5x_open"]
    sb_vs_3bet: ["fold", "call", "2.3x_last", "jam"]
  postflop_templates:
    first_aggressive: [0.33, 0.66, 1.25]
    facing_bet: ["min_raise", "pot_raise", "jam"]
    after_one_raise: ["pot_raise", "jam"]
    after_two_raises: ["jam"]
  jam_spr_threshold: 2.0
  off_tree_mapping: "probabilistic"

resolver:
  enabled: true
  time_budget_ms: 300
  max_depth: 2
  max_raises_per_street: 5
  leaf_rollouts: 8
  leaf_use_average_strategy: true
  policy_blend_alpha: 0.35
  min_strategy_prob: 1.0e-6

solver:
  cfr_plus: true                 # ~100x faster convergence
  iteration_weighting: "linear"  # "none" | "linear" | "dcfr"
  sampling_method: "external"    # or "outcome"

training:
  num_iterations: 1000000        # 1M iterations
  checkpoint_frequency: 100000
```

`MCCFRSolver.act()` can use a realtime HU resolver instead of sampling directly from the blueprint.
- Builds a depth-limited local lookahead tree from the current state
- Estimates leaf values via blueprint rollouts
- Computes a local strategy and blends it with the blueprint policy (`policy_blend_alpha`)
- Applies a minimum strategy floor (`min_strategy_prob`) before normalization
- Uses off-tree translation (`off_tree_mapping`) to improve robustness
Training still learns the blueprint policy; realtime resolving is used at decision time.
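The blend-and-floor steps above can be sketched in a few lines. This is an illustrative interpretation, not the solver's code; in particular, the convention that `alpha` weights the local strategy (rather than the blueprint) is an assumption.

```python
import numpy as np

def blend_with_blueprint(local: np.ndarray,
                         blueprint: np.ndarray,
                         alpha: float = 0.35,
                         floor: float = 1e-6) -> np.ndarray:
    """Blend a resolver's local strategy with the blueprint policy,
    then apply a minimum-probability floor and renormalize.

    alpha plays the role of policy_blend_alpha and floor the role of
    min_strategy_prob; the exact weighting convention is an assumption.
    """
    blended = alpha * local + (1.0 - alpha) * blueprint
    blended = np.maximum(blended, floor)  # minimum strategy floor
    return blended / blended.sum()        # renormalize to a distribution
```

The floor keeps every in-tree action at nonzero probability, which avoids degenerate zero-probability lines when the local solve is run under a tight time budget.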
Card abstraction configs live in `config/abstraction/`:

```yaml
# config/abstraction/default.yaml
board_clusters:
  flop: 50
  turn: 100
  river: 200

buckets:
  flop: 50
  turn: 100
  river: 200

equity_samples: 1000
```

See Configuration Guide for details on adding custom configs.
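To illustrate what the `buckets` counts control, here is a deliberately simplified sketch of equity bucketing. The real pipeline clusters full equity distributions within board clusters; this stand-in (`assign_equity_buckets` is a hypothetical name) buckets hands by a scalar mean equity only.

```python
import numpy as np

def assign_equity_buckets(equities: np.ndarray, num_buckets: int) -> np.ndarray:
    """Assign each hand an abstraction bucket by equity quantile.

    Quantile edges split the observed equities into num_buckets
    roughly equal-sized groups; np.digitize maps each hand to its bucket.
    """
    edges = np.quantile(equities, np.linspace(0, 1, num_buckets + 1)[1:-1])
    return np.digitize(equities, edges)
```

Quantile bucketing keeps buckets balanced in size; distribution-aware clustering (as described above) additionally separates hands whose equities look similar on average but realize very differently across runouts.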
The primary quality metric is exploitability: the expected value a best-response opponent can achieve.
Target Values (in milli-big-blinds per game):
- < 1 mbb/g: Strong player
- 1-5 mbb/g: Good player
- 5-20 mbb/g: Decent player
- 20+ mbb/g: Needs more training
Implementation: Monte Carlo rollout-based best response approximation (following Brown & Sandholm 2019). This is tractable for large games but provides empirical estimates rather than exact exploitability.
```python
results = compute_exploitability(
    solver,
    num_samples=10000,             # Game simulations per player
    num_rollouts_per_infoset=100,  # Rollouts for action value estimation
    use_average_strategy=True,
)

# Output includes confidence intervals
print(f"{results['exploitability_mbb']:.2f} ± {results['std_error_mbb']:.2f} mbb/g")
print(f"95% CI: [{results['confidence_95_mbb'][0]:.2f}, {results['confidence_95_mbb'][1]:.2f}]")
```

See Evaluation README for methodology and best practices.
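The ± and 95% CI figures follow standard Monte Carlo error arithmetic. A minimal sketch, assuming per-game winnings are already expressed in mbb/g (`summarize_mbb` is a hypothetical helper, not the project's API):

```python
import math

def summarize_mbb(samples: list[float]) -> dict:
    """Mean, standard error, and normal-approximation 95% CI
    for a list of per-game winnings in mbb/g."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    se = math.sqrt(var / n)                                # standard error of the mean
    return {
        "exploitability_mbb": mean,
        "std_error_mbb": se,
        "confidence_95_mbb": (mean - 1.96 * se, mean + 1.96 * se),
    }
```

Because the standard error shrinks as 1/sqrt(n), halving the CI width requires roughly four times as many game simulations.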
```bash
# Run all tests
uv run pytest

# Run fast tests only (excludes slow integration tests)
uv run pytest -m "not slow"

# Linting and formatting
uv run ruff check .
uv run ruff format .

# Type checking
uv run ty check

# Layering/architecture contracts
uv run lint-imports
```

The preflop chart viewer now serves data through FastAPI.
- FastAPI server (`/health`, `/api/meta`, `/api/chart`) + static UI from `ui/dist`
```text
poker-solver/
├── src/
│   ├── interfaces/          # User-facing entrypoints (CLI, API, charts)
│   ├── pipeline/            # Training, evaluation, abstraction workflows
│   ├── engine/              # Solver/search internals
│   ├── core/                # Poker domain foundations (game/actions)
│   └── shared/              # Cross-layer utilities (config, helpers)
├── tests/                   # Mirrors src/ layout + integration tests
├── config/
│   ├── training/            # Training configuration presets
│   └── abstraction/         # Card abstraction presets
├── data/
│   ├── runs/                # Training runs and checkpoints
│   └── combo_abstraction/   # Precomputed card abstractions
└── ui/                      # React web interface for charts
```
MIT License - see LICENSE file for details.
Note: This is a research implementation for educational purposes. While the solver computes theoretically sound strategies, it should not be used for real-money poker without extensive additional testing and validation.