euanmacintyre/mortality-longevity-analysis

mortality-longevity-analysis

Actuarial mortality and longevity analysis project using ONS mortality data, life tables, and a transparent Lee-Carter forecasting framework.

Project Overview

This repository implements an end-to-end mortality analytics workflow:

  • Ingest ONS mortality and projection datasets from local Excel files.
  • Normalize data into a consistent long format.
  • Fit a simple Lee-Carter model by sex.
  • Generate baseline and stress-tested longevity scenarios.
  • Build life tables and derive life expectancy metrics (e0, e65).
  • Backtest model performance on holdout years.
  • Compare custom projections to ONS principal/high/low variants.
  • Produce publication-ready tables and figures.

Actuarial Relevance

Mortality forecasting sits at the core of pricing, reserving, and capital for long-duration liabilities. This project is structured to support practical actuarial use cases:

  • Pension scheme funding projections and de-risking analysis.
  • Annuity pricing and profitability sensitivity testing.
  • Longevity trend monitoring and assumption governance.
  • Stress/scenario analysis for solvency and risk committees.

Data Sources (Placeholders)

Update these placeholders with exact ONS publication names, release dates, and links used in your analysis.

  • ONS_DATASET_NAME_OBSERVED_QX_SINGLE_AGE
  • ONS_DATASET_NAME_PRINCIPAL_PROJECTION_QX
  • ONS_DATASET_NAME_HIGH_LIFE_EXPECTANCY_VARIANT_QX
  • ONS_DATASET_NAME_LOW_LIFE_EXPECTANCY_VARIANT_QX

Expected raw files are stored in data/raw/ and mapped in configs/default.yaml.

Methodology

1. Data ingestion and normalization

  • Dataset-specific parsers handle differing ONS sheet names/layouts.
  • All datasets are normalized to:
    • source, scenario, geography, sex, year, age, qx
  • Validation checks:
    • Required columns
    • Age bounds and data type checks
    • qx bounded in [0, 1]
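
The validation checks above could look roughly like the sketch below. The function name `validate_normalized_qx` and the `max_age` limit of 120 are illustrative assumptions, not the project's actual API or configuration; only the column list comes from the schema above.

```python
import pandas as pd

REQUIRED_COLUMNS = ["source", "scenario", "geography", "sex", "year", "age", "qx"]


def validate_normalized_qx(df: pd.DataFrame, max_age: int = 120) -> list[str]:
    """Return a list of human-readable problems; an empty list means the frame passed."""
    problems = []

    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        # Without the required columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    # Data type checks: year and age should be integers, qx numeric.
    for col in ("year", "age"):
        if not pd.api.types.is_integer_dtype(df[col]):
            problems.append(f"{col} is not an integer column")
    if not pd.api.types.is_numeric_dtype(df["qx"]):
        problems.append("qx is not numeric")
        return problems

    # Age bounds (max_age is an assumed upper limit, not taken from the project).
    if pd.api.types.is_integer_dtype(df["age"]) and not df["age"].between(0, max_age).all():
        problems.append(f"age outside [0, {max_age}]")

    # qx must be a probability; NaN fails `between`, so missing values are reported too.
    if not df["qx"].between(0.0, 1.0).all():
        problems.append("qx outside [0, 1] or missing")

    return problems
```

Returning a list of problems rather than raising on the first failure lets a parser report every issue in a sheet in one pass.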

2. Transformations

  • Conversion between the one-year death probability qx and the central death rate mx, assuming a constant force of mortality within each year of age:
    • mx = -log(1 - qx)
    • qx = 1 - exp(-mx)
  • Mortality improvement calculations by age/year.
  • Train/holdout split helpers for model validation.
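
As a minimal sketch of the two conversion formulas above, plus one common definition of annual mortality improvement: the function names here are hypothetical (the project's own helper is `qx_to_mx` in `transform.py`, whose implementation is not shown), and the improvement definition 1 - m(x,t) / m(x,t-1) is an assumption, since the text does not state which definition is used.

```python
import numpy as np


def qx_to_mx_constant_force(qx):
    """mx = -log(1 - qx); log1p keeps precision when qx is tiny."""
    qx = np.asarray(qx, dtype=float)
    return -np.log1p(-qx)


def mx_to_qx_constant_force(mx):
    """qx = 1 - exp(-mx); expm1 keeps precision when mx is tiny."""
    mx = np.asarray(mx, dtype=float)
    return -np.expm1(-mx)


def annual_improvement(mx_by_year):
    """Improvement rate 1 - m(x, t) / m(x, t-1) along the year axis (axis 0)."""
    mx_by_year = np.asarray(mx_by_year, dtype=float)
    return 1.0 - mx_by_year[1:] / mx_by_year[:-1]


qx = np.array([0.001, 0.01, 0.2])
mx = qx_to_mx_constant_force(qx)
qx_back = mx_to_qx_constant_force(mx)  # round-trips to the original qx
```

Note that qx equal to 1 maps to an infinite mx, so a closing age with qx = 1 needs handling before taking logs.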

3. Lee-Carter model

By sex, fit:

  • log(m_{x,t}) = a_x + b_x * k_t + error_{x,t}, where m_{x,t} is the central death rate mx at age x in year t.

Estimation uses:

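  • a_x: mean of log(m_{x,t}) over the fitting years, computed separately for each age.
  • b_x, k_t: leading singular vectors from an SVD of the centered log mortality matrix (log mortality with a_x subtracted at each age).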
  • Identifiability constraints:
    • sum(b_x) = 1
    • sum(k_t) = 0
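
The estimation steps above can be sketched with NumPy as follows. This is an illustration of the standard SVD approach under the stated constraints, not the code in `lee_carter.py`; the function name `fit_lee_carter` and the ages-by-years matrix layout are assumptions.

```python
import numpy as np


def fit_lee_carter(log_mx):
    """Fit log m(x,t) = a_x + b_x * k_t from an (ages x years) matrix of log rates."""
    log_mx = np.asarray(log_mx, dtype=float)

    # a_x: average log mortality at each age across the fitting years.
    a_x = log_mx.mean(axis=1)

    # Center each age row, then take the leading singular triplet.
    centered = log_mx - a_x[:, None]
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    b_raw = u[:, 0]
    k_raw = s[0] * vt[0, :]

    # Rescale so that sum(b_x) = 1; the product b_x * k_t is unchanged.
    # (This assumes the raw b vector does not sum to zero.)
    scale = b_raw.sum()
    b_x = b_raw / scale
    k_t = k_raw * scale

    # sum(k_t) = 0 already holds because each row of `centered` sums to zero.
    return a_x, b_x, k_t


# Synthetic rank-one example: the fit should recover the inputs exactly.
true_a = np.array([-5.0, -4.0, -3.0])
true_b = np.array([0.2, 0.3, 0.5])
true_k = np.array([2.0, 1.0, 0.0, -1.0, -2.0])
a_x, b_x, k_t = fit_lee_carter(true_a[:, None] + true_b[:, None] * true_k[None, :])
```

Dividing by the sum of the raw b vector also resolves the sign ambiguity of the SVD, since b_x and k_t flip sign together.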

k_t forecasting:

  • Random walk with drift (default).
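  • Optional ARIMA(0,1,0) with drift via statsmodels. This is the same model specification as the random walk with drift, estimated through statsmodels.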
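
For intuition, a random walk with drift can be forecast in a few lines. The sketch below is a generic illustration, not the project's forecasting code; the function name and the returned standard error (which ignores uncertainty in the drift estimate) are assumptions made for the example.

```python
import numpy as np


def forecast_kt_random_walk_drift(k_t, years_ahead):
    """Central forecast and innovation standard error for k_t under a random walk with drift."""
    k_t = np.asarray(k_t, dtype=float)
    if k_t.size < 2:
        raise ValueError("need at least two fitted k_t values to estimate a drift")

    # The drift estimate is the mean first difference, which telescopes to
    # (last - first) / (number of steps).
    drift = (k_t[-1] - k_t[0]) / (k_t.size - 1)

    # Sample standard deviation of the differences around the drift.
    # With a single difference there are no degrees of freedom left, so report nan.
    diffs = np.diff(k_t)
    sigma = float(np.std(diffs, ddof=1)) if diffs.size > 1 else float("nan")

    horizon = np.arange(1, years_ahead + 1)
    central = k_t[-1] + drift * horizon
    # Accumulated innovation variance only; drift uncertainty is ignored here.
    std_error = sigma * np.sqrt(horizon)
    return central, std_error


central, std_error = forecast_kt_random_walk_drift([4.0, 2.0, 0.0, -2.0, -4.0], years_ahead=3)
```

Because the drift depends only on the first and last fitted values, the choice of fitting window has a direct effect on the projected trend.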

4. Life tables and longevity metrics

From projected qx, complete life tables are built (lx, dx, Lx, Tx, ex) with an explicit closing-age assumption.

Outputs include:

  • Life expectancy at birth (e0)
  • Life expectancy at age 65 (e65)
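
The life table columns can be derived from qx as in the sketch below. The closing rule (qx forced to 1 at the last tabulated age) and the mid-year deaths assumption for Lx are illustrative choices; the closing-age assumption actually used in `life_table.py` is not described here and may differ.

```python
import numpy as np


def build_life_table(qx, radix=100_000.0, closing_qx=1.0):
    """Return lx, dx, Lx, Tx, ex for single ages 0..len(qx)-1."""
    qx = np.array(qx, dtype=float)  # copy so the caller's data is untouched
    # Assumed closing rule: everyone alive at the last tabulated age dies within it.
    qx[-1] = closing_qx

    # lx: survivors to exact age x, starting from the radix at age 0.
    lx = radix * np.concatenate(([1.0], np.cumprod(1.0 - qx[:-1])))
    dx = lx * qx
    # Lx: person-years lived in [x, x+1), assuming deaths fall mid-year on average.
    Lx = lx - 0.5 * dx
    # Tx: person-years lived above age x (reverse cumulative sum of Lx).
    Tx = np.cumsum(Lx[::-1])[::-1]
    # ex: remaining life expectancy at exact age x (assumes lx > 0 at every age).
    ex = Tx / lx
    return {"lx": lx, "dx": dx, "Lx": Lx, "Tx": Tx, "ex": ex}


# Toy three-age table; with a full single-age table, e0 is ex[0] and e65 is ex[65].
table = build_life_table([0.5, 0.5, 0.5])
```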

5. Scenario analysis and backtesting

Implemented scenarios:

  • Baseline
  • Faster mortality improvement
  • Slower mortality improvement
  • Temporary mortality shock
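
One plausible way to express these scenarios is as adjustments to a Lee-Carter projection: a multiplier on the k_t drift for faster or slower improvement, and a one-year multiplier on mx for a temporary shock. The sketch below rests on those assumptions; the function names, the multipliers, and the mechanism itself are hypothetical and may not match `scenarios.py`.

```python
import numpy as np


def project_log_mx(a_x, b_x, k_last, drift, years_ahead, drift_multiplier=1.0):
    """Project log m(x,t) with a scaled drift; returns an (ages x years) array."""
    horizon = np.arange(1, years_ahead + 1)
    k_future = k_last + drift * drift_multiplier * horizon
    return np.asarray(a_x)[:, None] + np.asarray(b_x)[:, None] * k_future[None, :]


def apply_temporary_shock(log_mx, shock_year_index, mx_multiplier):
    """Scale mortality in one projection year only; other years are unchanged."""
    shocked = np.array(log_mx, dtype=float)
    shocked[:, shock_year_index] += np.log(mx_multiplier)
    return shocked


# Illustrative parameters (not fitted values).
a_x, b_x = [-5.0, -3.0], [0.4, 0.6]
baseline = project_log_mx(a_x, b_x, k_last=-10.0, drift=-1.0, years_ahead=3)
faster = project_log_mx(a_x, b_x, k_last=-10.0, drift=-1.0, years_ahead=3, drift_multiplier=1.5)
slower = project_log_mx(a_x, b_x, k_last=-10.0, drift=-1.0, years_ahead=3, drift_multiplier=0.5)
shock = apply_temporary_shock(baseline, shock_year_index=0, mx_multiplier=1.10)
```

With a negative drift and positive b_x, a multiplier above 1 lowers projected mortality relative to baseline and a multiplier below 1 raises it.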

Backtest framework:

  • Fit on a training window.
  • Evaluate on holdout years versus observed mortality and life expectancy.
  • Compare custom projections with ONS principal/high/low variants when available.
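
The fit-then-evaluate pattern can be illustrated with a small sketch. The helper names and the choice of log-scale bias and RMSE as error metrics are assumptions for illustration; the diagnostics that `run_backtest` actually writes are not specified in this document.

```python
import numpy as np


def split_train_holdout(years, train_end_year):
    """Boolean masks for the training window and the holdout years."""
    years = np.asarray(years)
    train_mask = years <= train_end_year
    return train_mask, ~train_mask


def holdout_errors(observed_mx, projected_mx):
    """Summarize holdout fit on the log scale, where errors are comparable across ages."""
    observed = np.asarray(observed_mx, dtype=float)
    projected = np.asarray(projected_mx, dtype=float)
    log_error = np.log(projected) - np.log(observed)
    return {
        "bias_log_mx": float(log_error.mean()),
        "rmse_log_mx": float(np.sqrt((log_error ** 2).mean())),
    }


years = np.arange(2010, 2020)
train_mask, holdout_mask = split_train_holdout(years, train_end_year=2015)

# Made-up rates: a projection that is uniformly about 10% too high on the log scale.
observed = np.array([0.010, 0.020, 0.050])
errors = holdout_errors(observed, observed * np.exp(0.1))
```

A positive bias on the log scale means the projection overstates mortality in the holdout years, which in turn understates life expectancy.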

Repository Structure

mortality-longevity-analysis/
├── configs/
│   └── default.yaml
├── data/
│   ├── raw/
│   ├── interim/
│   └── processed/
├── outputs/
│   ├── figures/
│   └── tables/
├── reports/
│   └── project_report.md
├── src/mortality_longevity/
│   ├── backtest.py
│   ├── config.py
│   ├── data_download.py
│   ├── data_parse.py
│   ├── lee_carter.py
│   ├── life_table.py
│   ├── plots.py
│   ├── scenarios.py
│   └── transform.py
├── tests/
├── Makefile
└── pyproject.toml

Setup Instructions

Prerequisites

  • Python 3.10+
  • Local ONS Excel files placed in data/raw/

Install

python -m pip install -e ".[dev]"
pre-commit install

Run quality checks

make check

How To Run The Pipeline

1. Ingest and normalize ONS data

python - << 'PY'
from pathlib import Path
from mortality_longevity.data_download import ingest_ons_qx

output_path = ingest_ons_qx(Path("configs/default.yaml"))
print(f"Normalized dataset written to: {output_path}")
PY

2. Generate scenarios and save summary tables

python - << 'PY'
import pandas as pd
from mortality_longevity.transform import qx_to_mx
from mortality_longevity.scenarios import generate_standard_scenarios, save_scenario_summary_tables

normalized = pd.read_csv("data/interim/ons_qx_normalized.csv")
observed = normalized.loc[normalized["scenario"] == "observed", ["sex", "year", "age", "qx"]].copy()
observed["mx"] = qx_to_mx(observed["qx"])

scenario_projection = generate_standard_scenarios(observed[["sex", "year", "age", "mx"]], years_ahead=30)
paths = save_scenario_summary_tables(scenario_projection)
print(paths)
PY

3. Run backtest and write holdout diagnostics

python - << 'PY'
import pandas as pd
from mortality_longevity.transform import qx_to_mx
from mortality_longevity.backtest import run_backtest

normalized = pd.read_csv("data/interim/ons_qx_normalized.csv")
observed = normalized.loc[normalized["scenario"] == "observed", ["sex", "year", "age", "qx"]].copy()
observed["mx"] = qx_to_mx(observed["qx"])

result = run_backtest(observed[["sex", "year", "age", "mx"]], train_end_year=2015)
print(result.saved_tables)
PY

4. (Optional) Compare with ONS variants

python - << 'PY'
import pandas as pd
from mortality_longevity.transform import qx_to_mx
from mortality_longevity.scenarios import generate_standard_scenarios
from mortality_longevity.backtest import compare_custom_projection_with_ons_variants

normalized = pd.read_csv("data/interim/ons_qx_normalized.csv")
observed = normalized.loc[normalized["scenario"] == "observed", ["sex", "year", "age", "qx"]].copy()
observed["mx"] = qx_to_mx(observed["qx"])

custom = generate_standard_scenarios(observed[["sex", "year", "age", "mx"]], years_ahead=30)
variants = normalized.loc[
    normalized["scenario"].isin(["projection_principal", "projection_high_life_expectancy", "projection_low_life_expectancy"])
].copy()

tables = compare_custom_projection_with_ons_variants(
    custom_projection=custom,
    ons_variant_data=variants,
)
print({k: v.shape for k, v in tables.items()})
PY

Key Outputs

  • Normalized input data:
    • data/interim/ons_qx_normalized.csv (or parquet)
  • Summary tables (outputs/tables/):
    • Scenario life expectancy summary
    • Scenario mortality summary
    • Backtest mortality/life expectancy summaries
    • Custom vs ONS comparison summaries
  • Figures (outputs/figures/):
    • Standardized naming format:
      • mortality_longevity_<plot_name>[_qualifiers].png
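
A helper that builds names in this format might look like the following. The function `figure_filename` and the example plot name and qualifier are hypothetical; only the naming pattern comes from the convention above.

```python
def figure_filename(plot_name, *qualifiers):
    """Build mortality_longevity_<plot_name>[_qualifiers].png from its parts."""
    parts = ["mortality_longevity", plot_name, *qualifiers]
    return "_".join(str(part) for part in parts) + ".png"


name = figure_filename("e65_by_scenario", "female")
```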

Limitations

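  • Lee-Carter assumes the age response b_x is constant over time and that the period index k_t follows a steady trend, so changes in the age pattern of improvement are not captured.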
  • One-factor structure may miss cohort effects and cause-of-death shifts.
  • Extreme-age estimates can be volatile due to sparse exposure.
  • Short or noisy data windows can destabilize k_t drift estimates.
  • Temporary shocks (for example, pandemic years) are hard to model under a constant-drift assumption.

Next Steps

  • Add exposure-weighted fitting and diagnostics.
  • Introduce cohort-aware and multi-factor mortality models.
  • Add bootstrap/parameter uncertainty around longevity outputs.
  • Extend backtests with rolling-origin evaluation.
  • Package a single CLI command for full pipeline execution.
