Actuarial mortality and longevity analysis project using ONS mortality data, life tables, and a transparent Lee-Carter forecasting framework.
This repository implements an end-to-end mortality analytics workflow:
- Ingest ONS mortality and projection datasets from local Excel files.
- Normalize data into a consistent long format.
- Fit a simple Lee-Carter model by sex.
- Generate baseline and stress-tested longevity scenarios.
- Build life tables and derive life expectancy metrics (e0, e65).
- Backtest model performance on holdout years.
- Compare custom projections to ONS principal/high/low variants.
- Produce publication-ready tables and figures.
Mortality forecasting sits at the core of pricing, reserving, and capital for long-duration liabilities. This project is structured to support practical actuarial use cases:
- Pension scheme funding projections and de-risking analysis.
- Annuity pricing and profitability sensitivity testing.
- Longevity trend monitoring and assumption governance.
- Stress/scenario analysis for solvency and risk committees.
Update these placeholders with exact ONS publication names, release dates, and links used in your analysis.
- ONS_DATASET_NAME_OBSERVED_QX_SINGLE_AGE
- ONS_DATASET_NAME_PRINCIPAL_PROJECTION_QX
- ONS_DATASET_NAME_HIGH_LIFE_EXPECTANCY_VARIANT_QX
- ONS_DATASET_NAME_LOW_LIFE_EXPECTANCY_VARIANT_QX
Expected raw files are stored in data/raw/ and mapped in configs/default.yaml.
- Dataset-specific parsers handle differing ONS sheet names/layouts.
- All datasets are normalized to the columns: source, scenario, geography, sex, year, age, qx
- Validation checks:
  - Required columns
  - Age bounds and data type checks
  - qx bounded in [0, 1]
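The validation checks listed above could be sketched roughly as follows. This is an illustration only, not the repository's parser code: the function name `validate_normalized_qx` and the default age bounds (0 to 120) are assumptions made for the example.

```python
import pandas as pd

# Column set taken from the normalized schema described above.
REQUIRED_COLUMNS = ["source", "scenario", "geography", "sex", "year", "age", "qx"]


def validate_normalized_qx(df: pd.DataFrame, min_age: int = 0, max_age: int = 120) -> None:
    """Raise ValueError if the frame fails any check (hypothetical helper).

    min_age and max_age are illustrative defaults, not values from the project config.
    """
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Data type checks: age must be integer-valued, qx numeric.
    if not pd.api.types.is_integer_dtype(df["age"]):
        raise ValueError("age must be an integer column")
    if not pd.api.types.is_numeric_dtype(df["qx"]):
        raise ValueError("qx must be a numeric column")

    # Bounds checks; between() is inclusive and treats NaN as out of range.
    if not df["age"].between(min_age, max_age).all():
        raise ValueError(f"age must lie in [{min_age}, {max_age}]")
    if not df["qx"].between(0.0, 1.0).all():
        raise ValueError("qx must lie in [0, 1]")
```

Raising on the first failure keeps the example short; a production check might collect all failures and report them together.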
- Conversion between probability and central death rate:
  - mx = -log(1 - qx)
  - qx = 1 - exp(-mx)
- Mortality improvement calculations by age/year.
- Train/holdout split helpers for model validation.
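The two conversion formulas above are inverses of each other and correspond to a constant force of mortality within each year of age. A minimal NumPy sketch is shown below; the function bodies are illustrative and may differ from the versions in `transform.py`. The improvement-rate definition used here (one minus the ratio of successive years' rates) is a common convention and is an assumption, since the text does not specify one.

```python
import numpy as np


def qx_to_mx(qx):
    """Central death rate from death probability: mx = -log(1 - qx)."""
    qx = np.asarray(qx, dtype=float)
    # Clip just below 1 so that qx == 1 does not produce an infinite rate.
    return -np.log1p(-np.clip(qx, 0.0, 1.0 - 1e-12))


def mx_to_qx(mx):
    """Death probability from central death rate: qx = 1 - exp(-mx)."""
    mx = np.asarray(mx, dtype=float)
    return -np.expm1(-mx)


def improvement_rates(mx_matrix):
    """Annual improvement 1 - m[x, t] / m[x, t-1] for an ages-by-years matrix.

    Assumed definition; positive values mean mortality is falling.
    """
    mx_matrix = np.asarray(mx_matrix, dtype=float)
    return 1.0 - mx_matrix[:, 1:] / mx_matrix[:, :-1]
```

`log1p` and `expm1` are used instead of `log` and `exp` because they keep precision for the very small probabilities seen at young ages.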
By sex, fit:
log(mx_{x,t}) = a_x + b_x * k_t + error_{x,t}
Estimation uses:
- a_x: mean of log mortality by age.
- SVD on centered log mortality matrix for b_x, k_t.
- Identifiability constraints:
  - sum(b_x) = 1
  - sum(k_t) = 0
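The estimation steps above can be sketched with NumPy as follows. This is a minimal illustration under the stated constraints, not the code in `lee_carter.py`; the function name `fit_lee_carter` and the ages-by-years input layout are assumptions.

```python
import numpy as np


def fit_lee_carter(log_mx):
    """Fit a_x, b_x, k_t from an ages-by-years matrix of log central death rates."""
    log_mx = np.asarray(log_mx, dtype=float)

    # a_x is the time-average of log mortality at each age.
    a_x = log_mx.mean(axis=1)
    centered = log_mx - a_x[:, None]

    # The leading singular triplet gives the best rank-1 fit b_x * k_t.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b_x = u[:, 0]
    k_t = s[0] * vt[0, :]

    # Rescale so that sum(b_x) = 1; this also fixes the sign ambiguity of the SVD.
    # (Degenerate input where the first left singular vector sums to zero is not handled.)
    scale = b_x.sum()
    b_x = b_x / scale
    k_t = k_t * scale
    return a_x, b_x, k_t
```

Because each row of the centered matrix sums to zero over years, the resulting k_t sums to zero up to floating-point error, so the second constraint needs no separate adjustment.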
k_t forecasting:
- Random walk with drift (default).
- Optional ARIMA(0,1,0) + drift via statsmodels.
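A random walk with drift has a simple closed-form central forecast: the drift is the mean of the first differences of k_t, and the projection extends the last fitted value along that slope. The sketch below illustrates this; the function name `forecast_rw_drift` is hypothetical. ARIMA(0,1,0) with a drift term describes the same model, so both options should give the same central path.

```python
import numpy as np


def forecast_rw_drift(k_t, years_ahead):
    """Central random-walk-with-drift forecast of the period index k_t."""
    k_t = np.asarray(k_t, dtype=float)
    if k_t.size < 2:
        raise ValueError("need at least two fitted k_t values to estimate drift")

    # The mean of first differences telescopes to (last - first) / (T - 1).
    drift = (k_t[-1] - k_t[0]) / (k_t.size - 1)
    steps = np.arange(1, years_ahead + 1)
    return k_t[-1] + drift * steps
```

Because the drift estimate depends only on the first and last fitted values, the choice of fitting window has a direct effect on every projected year.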
From projected qx, complete life tables are built (lx, dx, Lx, Tx, ex) with an explicit closing-age assumption.
Outputs include:
- Life expectancy at birth (e0)
- Life expectancy at age 65 (e65)
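To make the life table columns concrete, here is a minimal sketch of building them from single-year qx values starting at age 0. It is not the code in `life_table.py`. Two assumptions are made for illustration: the table is closed by forcing qx = 1 at the final age, and deaths are spread uniformly within each year of age (so Lx = lx - dx / 2). The project's actual closing-age rule may differ.

```python
import numpy as np


def build_life_table(qx, radix=100_000.0):
    """Complete life table from qx for ages 0..omega (hypothetical helper)."""
    qx = np.asarray(qx, dtype=float).copy()
    qx[-1] = 1.0  # assumed closing rule: nobody survives past the final age

    # lx: survivors to exact age x, starting from the radix at age 0.
    survival = np.concatenate(([1.0], np.cumprod(1.0 - qx[:-1])))
    lx = radix * survival
    dx = lx * qx                   # deaths between ages x and x + 1
    Lx = lx - 0.5 * dx             # person-years lived, uniform deaths assumed
    Tx = Lx[::-1].cumsum()[::-1]   # person-years remaining above age x
    ex = Tx / lx                   # breaks down if lx reaches 0 before the final age
    return {"lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}
```

With ages indexed from 0, `table["ex"][0]` is e0 and `table["ex"][65]` is e65.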
Implemented scenarios:
- Baseline
- Faster mortality improvement
- Slower mortality improvement
- Temporary mortality shock
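One way the four scenarios could be expressed on top of a fitted Lee-Carter model is sketched below. The mechanics are assumptions made for illustration and are not taken from `scenarios.py`: faster or slower improvement is modelled as a multiplier on the k_t drift, and a temporary shock as a proportional uplift to mx in selected forecast years. The function name `project_log_mx` and its parameters are hypothetical.

```python
import numpy as np


def project_log_mx(a_x, b_x, k_last, drift, years_ahead,
                   drift_multiplier=1.0, shock=None):
    """Project log(mx) as an ages-by-years matrix under a simple scenario.

    drift_multiplier: > 1 for faster improvement, < 1 for slower (assumed mechanism).
    shock: optional dict mapping a 0-based forecast step to a proportional
           uplift in mx, e.g. {0: 0.10} for +10% in the first projected year.
    """
    a_x = np.asarray(a_x, dtype=float)
    b_x = np.asarray(b_x, dtype=float)

    steps = np.arange(1, years_ahead + 1)
    k_path = k_last + drift * drift_multiplier * steps
    log_mx = a_x[:, None] + np.outer(b_x, k_path)

    # A multiplicative uplift on mx is additive on the log scale.
    for step, uplift in (shock or {}).items():
        log_mx[:, step] += np.log1p(uplift)
    return log_mx
```

For example, the baseline would use `drift_multiplier=1.0` with no shock, while a temporary shock scenario keeps the baseline drift and passes a short `shock` mapping.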
Backtest framework:
- Fit on a training window.
- Evaluate on holdout years versus observed mortality and life expectancy.
- Compare custom projections with ONS principal/high/low variants when available.
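The holdout evaluation step can be illustrated with a small pandas sketch. It is not the implementation behind `run_backtest`: the helper names, the inclusive treatment of `train_end_year`, and the choice of mean absolute log error as the metric are all assumptions for the example.

```python
import numpy as np
import pandas as pd


def split_train_holdout(df: pd.DataFrame, train_end_year: int):
    """Split on year; train_end_year is treated as inclusive (assumed)."""
    train = df[df["year"] <= train_end_year]
    holdout = df[df["year"] > train_end_year]
    return train, holdout


def holdout_errors(observed: pd.DataFrame, predicted: pd.DataFrame) -> pd.DataFrame:
    """Mean absolute error of log(mx) by sex and year on matching cells."""
    merged = observed.merge(
        predicted, on=["sex", "year", "age"], suffixes=("_obs", "_pred")
    )
    merged["abs_log_error"] = (
        np.log(merged["mx_pred"]) - np.log(merged["mx_obs"])
    ).abs()
    return merged.groupby(["sex", "year"], as_index=False)["abs_log_error"].mean()
```

Working on the log scale weights relative errors equally across ages, which stops the high mortality rates at old ages from dominating the summary.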
mortality-longevity-analysis/
├── configs/
│   └── default.yaml
├── data/
│   ├── raw/
│   ├── interim/
│   └── processed/
├── outputs/
│   ├── figures/
│   └── tables/
├── reports/
│   └── project_report.md
├── src/mortality_longevity/
│   ├── backtest.py
│   ├── config.py
│   ├── data_download.py
│   ├── data_parse.py
│   ├── lee_carter.py
│   ├── life_table.py
│   ├── plots.py
│   ├── scenarios.py
│   └── transform.py
├── tests/
├── Makefile
└── pyproject.toml
- Python 3.10+
- Local ONS Excel files placed in data/raw/
python -m pip install -e ".[dev]"
pre-commit install
make check
python - << 'PY'
from pathlib import Path
from mortality_longevity.data_download import ingest_ons_qx
output_path = ingest_ons_qx(Path("configs/default.yaml"))
print(f"Normalized dataset written to: {output_path}")
PY
python - << 'PY'
import pandas as pd
from mortality_longevity.transform import qx_to_mx
from mortality_longevity.scenarios import generate_standard_scenarios, save_scenario_summary_tables
normalized = pd.read_csv("data/interim/ons_qx_normalized.csv")
observed = normalized.loc[normalized["scenario"] == "observed", ["sex", "year", "age", "qx"]].copy()
observed["mx"] = qx_to_mx(observed["qx"])
scenario_projection = generate_standard_scenarios(observed[["sex", "year", "age", "mx"]], years_ahead=30)
paths = save_scenario_summary_tables(scenario_projection)
print(paths)
PY
python - << 'PY'
import pandas as pd
from mortality_longevity.transform import qx_to_mx
from mortality_longevity.backtest import run_backtest
normalized = pd.read_csv("data/interim/ons_qx_normalized.csv")
observed = normalized.loc[normalized["scenario"] == "observed", ["sex", "year", "age", "qx"]].copy()
observed["mx"] = qx_to_mx(observed["qx"])
result = run_backtest(observed[["sex", "year", "age", "mx"]], train_end_year=2015)
print(result.saved_tables)
PY
python - << 'PY'
import pandas as pd
from mortality_longevity.transform import qx_to_mx
from mortality_longevity.scenarios import generate_standard_scenarios
from mortality_longevity.backtest import compare_custom_projection_with_ons_variants
normalized = pd.read_csv("data/interim/ons_qx_normalized.csv")
observed = normalized.loc[normalized["scenario"] == "observed", ["sex", "year", "age", "qx"]].copy()
observed["mx"] = qx_to_mx(observed["qx"])
custom = generate_standard_scenarios(observed[["sex", "year", "age", "mx"]], years_ahead=30)
variants = normalized.loc[
    normalized["scenario"].isin(["projection_principal", "projection_high_life_expectancy", "projection_low_life_expectancy"])
].copy()
tables = compare_custom_projection_with_ons_variants(
    custom_projection=custom,
    ons_variant_data=variants,
)
print({k: v.shape for k, v in tables.items()})
PY
- Normalized input data: data/interim/ons_qx_normalized.csv (or parquet)
- Summary tables (outputs/tables/):
  - Scenario life expectancy summary
  - Scenario mortality summary
  - Backtest mortality/life expectancy summaries
  - Custom vs ONS comparison summaries
- Figures (outputs/figures/):
  - Standardized naming format: mortality_longevity_<plot_name>[_qualifiers].png
- Lee-Carter assumes a stable age pattern and smooth period trend.
- One-factor structure may miss cohort effects and cause-of-death shifts.
- Extreme-age estimates can be volatile due to sparse exposure.
- Short or noisy data windows can destabilize k_t drift estimates.
- Temporary shocks (for example pandemic years) are hard to model with stationary drift assumptions.
- Add exposure-weighted fitting and diagnostics.
- Introduce cohort-aware and multi-factor mortality models.
- Add bootstrap/parameter uncertainty around longevity outputs.
- Extend backtests with rolling-origin evaluation.
- Package a single CLI command for full pipeline execution.