A Formula 1 race prediction tool that uses historical data and machine learning to forecast qualifying and race results.
Note: This project was built with significant assistance from AI (GitHub Copilot / Claude). The codebase, documentation, and overall architecture were developed collaboratively with AI tools.
This tool predicts finishing positions for F1 sessions:
- Qualifying – Grid positions
- Race – Final standings
- Sprint Qualifying – Sprint grid
- Sprint – Sprint race results
It pulls data from public APIs, builds features from historical performance, and trains models fresh on each run, so no saved weights are needed and the models recalibrate to the latest data every time.
# Set up environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
# Predict the next race
python main.py --round next
# Predict a specific past event
python main.py --season 2024 --round 5 --sessions qualifying race
The app fetches data from free, public APIs:
- Jolpica F1 – Schedules, results, standings (Ergast-compatible)
- Open-Meteo – Weather forecasts and historical weather
- OpenF1 – Session timing data (historical only)
- FastF1 – Detailed timing and telemetry fallback
Each prediction then goes through these steps:
- Roster inference – For future races, the entry list comes from the most recent completed event
- Feature engineering – Driver form, team performance, weather conditions, teammate comparisons, starting grid
- Grid position handling – Uses actual grid from race results when available; for pre-race predictions, runs a qualifying simulation to estimate starting positions
- Model training – Gradient boosting (LightGBM/XGBoost/sklearn) trained on historical data
- DNF estimation – Separate classifier for retirement probability
- Monte Carlo simulation – 5000 draws to get win probability, podium chances, and expected position
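To make the last step more concrete, here is a minimal sketch of how a Monte Carlo pass could turn predicted positions and DNF probabilities into win, podium, and expected-position figures. It is not the project's `simulate.py`: the function name `simulate_race`, the Gaussian noise model, and the `noise_sd` parameter are assumptions made for illustration.

```python
import numpy as np

def simulate_race(drivers, expected_pos, dnf_prob, draws=5000, noise_sd=2.0, seed=0):
    """Hypothetical sketch: perturb predicted positions, apply DNFs, rank each draw."""
    rng = np.random.default_rng(seed)
    expected_pos = np.asarray(expected_pos, dtype=float)
    dnf_prob = np.asarray(dnf_prob, dtype=float)
    n = len(drivers)

    # One row per draw: a noisy performance score (lower is better).
    scores = expected_pos + rng.normal(0.0, noise_sd, size=(draws, n))
    # Sample retirements and push retired drivers behind every finisher.
    retired = rng.random((draws, n)) < dnf_prob
    scores = np.where(retired, scores + 1000.0, scores)

    # Applying argsort twice converts scores into 1-based positions per draw.
    positions = scores.argsort(axis=1).argsort(axis=1) + 1

    results = {}
    for i, driver in enumerate(drivers):
        finished = ~retired[:, i]
        results[driver] = {
            # A retired driver never counts as a winner or podium finisher.
            "win": float(((positions[:, i] == 1) & finished).mean()),
            "podium": float(((positions[:, i] <= 3) & finished).mean()),
            "expected_position": float(positions[:, i].mean()),
            "dnf": float(retired[:, i].mean()),
        }
    return results

# Example with made-up inputs (not real predictions).
summary = simulate_race(["A", "B", "C", "D"], [1, 2, 3, 4], [0.05] * 4)
```

More draws reduce sampling noise in the reported probabilities at the cost of runtime, which is the trade-off behind the `draws` setting.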
When predicting a race before qualifying has occurred:
- The system first runs a full qualifying prediction
- Uses predicted qualifying positions as the starting grid
- Feeds this grid into the race prediction model
This lets the tool produce race predictions before the grid is known. Once actual grid data is available, it is used instead, so grid penalties and unexpected qualifying results are reflected in the race prediction.
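The grid selection described above can be sketched as a small helper. This is an illustration under assumptions, not the code in `predict.py`: the function name `choose_starting_grid` and the dictionary-based inputs are hypothetical.

```python
def choose_starting_grid(actual_grid, predicted_quali):
    """Hypothetical helper: return (grid, source).

    Both arguments map a driver code to a position. `actual_grid` may be
    None or empty when qualifying has not happened yet; `predicted_quali`
    may contain fractional expected positions.
    """
    if actual_grid:
        # Real grid data already includes penalties, so prefer it.
        return dict(actual_grid), "actual"
    # Otherwise rank drivers by predicted qualifying position.
    ordered = sorted(predicted_quali, key=predicted_quali.get)
    grid = {driver: pos for pos, driver in enumerate(ordered, start=1)}
    return grid, "predicted"

# Example with made-up values: no actual grid yet, so the prediction is used.
grid, source = choose_starting_grid(None, {"VER": 1.4, "NOR": 2.2, "LEC": 1.9})
```

Returning the source alongside the grid makes it easy to show in the output whether a race prediction was based on real or simulated starting positions.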
All settings live in config.yaml. The main things you might want to tweak:
modelling:
  recency_half_life_days:
    base: 120        # How quickly old results fade in importance
    weather: 180     # Weather skill memory
    team: 240        # Team performance memory
  monte_carlo:
    draws: 5000      # Simulation iterations (more = slower but smoother)
data_sources:
  open_meteo:
    temperature_unit: "celsius"  # or fahrenheit
    windspeed_unit: "kmh"        # kmh, ms, mph, kn
Terminal output only – Predictions display directly in the console with:
- Driver names and teams
- Predicted positions
- Win probability, podium probability, DNF probability
- Weather conditions
- Position changes when actual results are available
python main.py --season 2024 --round 10
Live mode re-runs predictions periodically and updates when results come in:
python main.py --round next --live --refresh 30
Evaluate model accuracy across historical seasons:
python main.py --backtest
- No real-time data – OpenF1 is used for historical data only, not live timing
- Weather is approximate – Forecasts are aggregated around session windows
- DNF model is basic – Uses historical base rates, not detailed reliability analysis
- First race of season – Limited data for brand new driver/team combinations
- Grid penalties – Only reflected if race results are available (post-qualifying predictions may not account for all penalties)
f1pred/
├── predict.py # Main prediction pipeline
├── features.py # Feature engineering
├── models.py # ML model training
├── simulate.py # Monte Carlo simulation
├── roster.py # Entry list inference
├── backtest.py # Historical evaluation
└── data/ # API clients
├── jolpica.py
├── open_meteo.py
├── openf1.py
└── fastf1_backend.py
- Python 3.11+
- See requirements.txt for dependencies
Predictions seem random or uniform? Clear the cache and re-run:
rm -rf .cache/
python main.py --round next
Missing actuals for sprint qualifying?
Enable OpenF1 in config.yaml, install FastF1, or both.
Rate limiting errors?
The built-in cache and retry logic should handle most cases. Try increasing live_refresh_seconds or clearing the .cache/ directory.
Import errors for LightGBM on macOS?
pip uninstall lightgbm
pip install lightgbm --no-binary lightgbm
The system will fall back to XGBoost or scikit-learn if LightGBM is unavailable.
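To check which backend your environment would end up with, a fallback probe along these lines can help. It is a sketch, not the logic in `models.py`: the helper name `pick_backend` is hypothetical, and the preference order is assumed from the LightGBM/XGBoost/sklearn order listed in this README.

```python
import importlib

# Assumed preference order: LightGBM, then XGBoost, then scikit-learn.
BACKENDS = ("lightgbm", "xgboost", "sklearn")

def pick_backend(candidates=BACKENDS):
    """Hypothetical helper: return the first module name that imports cleanly."""
    for name in candidates:
        try:
            importlib.import_module(name)
        except (ImportError, OSError):
            # OSError covers a package that is installed but whose
            # native library fails to load.
            continue
        return name
    return None

print("Selected backend:", pick_backend())
```

Attempting a real import, rather than only checking that the package is installed, also catches installs that are present but broken.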
If you want to verify the code or contribute:
pip install pytest
pytest tests/ -v
MIT – see LICENSE