Merged
6 changes: 3 additions & 3 deletions .github/workflows/publish.yml
@@ -15,7 +15,7 @@ jobs:

- name: Install dependencies
run: |
-dnf install -y openssl-devel perl-IPC-Cmd
+dnf install -y openssl-devel perl-IPC-Cmd openblas-devel
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --no-modify-path
/opt/python/cp312-cp312/bin/pip install maturin

@@ -25,7 +25,7 @@ jobs:
for pyver in 39 310 311 312 313; do
pybin="/opt/python/cp${pyver}-cp${pyver}/bin/python"
if [ -f "$pybin" ]; then
-/opt/python/cp312-cp312/bin/maturin build --release --out dist -i "$pybin" --features extension-module
+/opt/python/cp312-cp312/bin/maturin build --release --out dist -i "$pybin" --features extension-module,openblas
fi
done

@@ -58,7 +58,7 @@ jobs:
run: pip install maturin

- name: Build wheel
-run: maturin build --release --out dist --features extension-module
+run: maturin build --release --out dist --features extension-module,accelerate

- name: Upload wheels
uses: actions/upload-artifact@v4
26 changes: 24 additions & 2 deletions .github/workflows/rust-test.yml
@@ -39,9 +39,21 @@ jobs:
- name: Install Rust toolchain
uses: dtolnay/rust-toolchain@stable

+- name: Install OpenBLAS (Linux)
+if: runner.os == 'Linux'
+run: sudo apt-get update && sudo apt-get install -y libopenblas-dev
+
- name: Run Rust tests
working-directory: rust
-run: cargo test --verbose
+run: |
+if [ "${{ runner.os }}" == "macOS" ]; then
+  cargo test --verbose --features accelerate
+elif [ "${{ runner.os }}" == "Linux" ]; then
+  cargo test --verbose --features openblas
+else
+  cargo test --verbose
+fi
+shell: bash

# Build and test with Python on multiple platforms
python-tests:
@@ -68,10 +80,20 @@ jobs:
# Keep in sync with pyproject.toml [project.dependencies] and [project.optional-dependencies.dev]
run: pip install pytest pytest-xdist numpy pandas scipy

+- name: Install OpenBLAS (Linux)
+if: runner.os == 'Linux'
+run: sudo apt-get update && sudo apt-get install -y libopenblas-dev
+
- name: Build and install with maturin
run: |
pip install maturin
-maturin build --release -o dist
+if [ "${{ runner.os }}" == "macOS" ]; then
+  maturin build --release -o dist --features extension-module,accelerate
+elif [ "${{ runner.os }}" == "Linux" ]; then
+  maturin build --release -o dist --features extension-module,openblas
+else
+  maturin build --release -o dist
+fi
echo "=== Built wheels ==="
ls -la dist/ || dir dist
shell: bash
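
The per-platform branching in the workflow steps above amounts to a small lookup from `runner.os` to a Cargo feature. The following Python sketch is only an illustration and not part of the PR; `BLAS_FEATURE_BY_OS` and `maturin_build_command` are hypothetical names, and the keys are the `runner.os` values the shell conditions compare against.

```python
# Hypothetical helper mirroring the shell branches in the workflow above.
# Runners without an entry (for example Windows) build without a BLAS feature.
BLAS_FEATURE_BY_OS = {"macOS": "accelerate", "Linux": "openblas"}


def maturin_build_command(runner_os):
    """Return the maturin invocation the workflow would run on this runner."""
    command = ["maturin", "build", "--release", "-o", "dist"]
    blas_feature = BLAS_FEATURE_BY_OS.get(runner_os)
    if blas_feature is not None:
        # The workflow passes extension-module together with the BLAS feature.
        command += ["--features", "extension-module," + blas_feature]
    return command


for os_name in ("macOS", "Linux", "Windows"):
    print(os_name, "->", " ".join(maturin_build_command(os_name)))
```

Keeping the mapping in one table makes it easier to see that any runner other than macOS or Linux falls through to the plain pure-Rust build.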
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,23 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Added
+- **Conditional BLAS linking for Rust backend** — Apple Accelerate on macOS, OpenBLAS on Linux.
+  Pre-built wheels now use platform-optimized BLAS for matrix-vector and matrix-matrix
+  operations across all Rust-accelerated code paths (weights, OLS, TROP). Windows continues
+  using pure Rust (no external dependencies). Improves Rust backend performance at larger scales.
+- `rust_backend_info()` diagnostic function in `diff_diff._backend` — reports compile-time
+  BLAS feature status (blas, accelerate, openblas)
+
+### Fixed
+- **Rust SDID backend performance regression at scale** — Frank-Wolfe solver was 3-10x slower than pure Python at 1k+ scale
+  - Gram-accelerated FW loop for time weights: precomputes A^T@A, reducing per-iteration cost from O(N×T0) to O(T0) (~100x speedup per iteration at 5k scale)
+  - Allocation-free FW loop for unit weights: 1 GEMV per iteration (was 3), zero heap allocations (was ~8)
+  - Dispatch based on problem dimensions: Gram path when T0 < N, standard path when T0 >= N
+  - Rust backend now faster than pure Python at all scales
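
The Gram-accelerated loop named in the Fixed entry can be illustrated on a simplified problem. The sketch below is a plain-Python stand-in under stated assumptions, not the PR's Rust solver: it minimizes 0.5*||A w - b||^2 over the probability simplex and leaves out any regularization or intercept handling the real solver may apply. `gram_frank_wolfe` and its parameters are hypothetical names. It precomputes A^T A and A^T b once, then updates G w, w^T G w and c^T w incrementally, so each iteration touches O(T0) numbers (T0 being the number of columns of A) instead of the full N x T0 matrix.

```python
def gram_frank_wolfe(A, b, n_iter=200, tol=1e-12):
    """Frank-Wolfe on the simplex for 0.5*||A w - b||^2 using a Gram matrix.

    Simplified illustration only; A is a list of rows, b a list of floats.
    """
    n_rows, n_cols = len(A), len(A[0])

    # One-time precomputation: G = A^T A (n_cols x n_cols) and c = A^T b.
    G = [[sum(A[r][i] * A[r][j] for r in range(n_rows)) for j in range(n_cols)]
         for i in range(n_cols)]
    c = [sum(A[r][i] * b[r] for r in range(n_rows)) for i in range(n_cols)]

    # Start from uniform weights and cache the quantities updated each step.
    w = [1.0 / n_cols] * n_cols
    Gw = [sum(G[i][j] * w[j] for j in range(n_cols)) for i in range(n_cols)]
    wGw = sum(w[i] * Gw[i] for i in range(n_cols))
    cw = sum(c[i] * w[i] for i in range(n_cols))

    for _ in range(n_iter):
        # Gradient of the objective is G w - c; the FW vertex is its argmin.
        grad = [Gw[i] - c[i] for i in range(n_cols)]
        i = min(range(n_cols), key=grad.__getitem__)

        # With direction d = e_i - w: slope = grad . d, curvature = d^T G d.
        slope = grad[i] - (wGw - cw)
        curvature = G[i][i] - 2.0 * Gw[i] + wGw
        if slope >= -tol or curvature <= 0.0:
            break  # no descent direction left within tolerance

        # Exact line search, clipped so the iterate stays on the simplex.
        step = min(1.0, -slope / curvature)
        keep = 1.0 - step

        # O(n_cols) updates; wGw must use the old Gw[i], so it comes first.
        wGw = keep * keep * wGw + 2.0 * step * keep * Gw[i] + step * step * G[i][i]
        cw = keep * cw + step * c[i]
        Gw = [keep * Gw[j] + step * G[j][i] for j in range(n_cols)]
        w = [keep * x for x in w]
        w[i] += step

    return w


print(gram_frank_wolfe([[1.0, 0.0], [0.0, 1.0]], [0.3, 0.7]))
```

The precomputation costs O(N x T0^2), which is why a dispatch on problem shape such as the one in the list above (Gram path only when T0 < N) is a natural companion to this technique.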

## [2.4.1] - 2026-02-17

### Added
24 changes: 19 additions & 5 deletions CLAUDE.md
@@ -40,6 +40,15 @@ maturin develop
# Build with release optimizations
maturin develop --release

+# Build with platform BLAS (macOS — links Apple Accelerate)
+maturin develop --release --features accelerate
+
+# Build with platform BLAS (Linux — requires libopenblas-dev)
+maturin develop --release --features openblas
+
+# Build without BLAS (Windows, or explicit pure Rust)
+maturin develop --release
+
# Force pure Python mode (disable Rust backend)
DIFF_DIFF_BACKEND=python pytest

@@ -50,9 +59,11 @@ DIFF_DIFF_BACKEND=rust pytest
pytest tests/test_rust_backend.py -v
```

-**Note**: As of v2.2.0, the Rust backend uses the pure-Rust `faer` library for linear algebra,
-eliminating external BLAS/LAPACK dependencies. This enables Windows wheel builds and simplifies
-cross-platform compilation - no OpenBLAS or Intel MKL installation required.
+**Note**: As of v2.2.0, the Rust backend uses `faer` (pure Rust) for SVD and matrix inversion.
+BLAS is optionally linked via Cargo features (`accelerate` on macOS, `openblas` on Linux)
+for matrix-vector/matrix-matrix products. Windows builds remain fully pure Rust with no
+external dependencies. Pre-built PyPI wheels include platform BLAS; source builds use
+pure Rust by default.

## Architecture

@@ -183,6 +194,7 @@ cross-platform compilation - no OpenBLAS or Intel MKL installation required.
- Detects optional Rust backend availability
- Handles `DIFF_DIFF_BACKEND` environment variable ('auto', 'python', 'rust')
- Exports `HAS_RUST_BACKEND` flag and Rust function references
+- `rust_backend_info()` — returns compile-time BLAS feature status dict
- Other modules import from here to avoid circular imports with `__init__.py`

- **`rust/`** - Optional Rust backend for accelerated computation (v2.0.0+):
@@ -194,8 +206,10 @@ cross-platform compilation - no OpenBLAS or Intel MKL installation required.
- `compute_unit_distance_matrix()` - Parallel pairwise RMSE distance computation (4-8x speedup)
- `loocv_grid_search()` - Parallel LOOCV across tuning parameters (10-50x speedup)
- `bootstrap_trop_variance()` - Parallel bootstrap variance estimation (5-15x speedup)
-- Uses pure-Rust `faer` library for linear algebra (no external BLAS/LAPACK dependencies)
-- Cross-platform: builds on Linux, macOS, and Windows without additional setup
+- Uses pure-Rust `faer` library for SVD/matrix inversion (no external deps)
+- Optional BLAS linking via Cargo features: `accelerate` (macOS), `openblas` (Linux)
+- When BLAS is enabled, ndarray `.dot()` calls dispatch to platform-optimized dgemv/dgemm
+- Cross-platform: Windows builds use pure Rust with no additional setup
- Provides 4-8x speedup for SyntheticDiD, 5-20x speedup for TROP

- **`diff_diff/results.py`** - Dataclass containers for estimation results:
21 changes: 21 additions & 0 deletions diff_diff/_backend.py
@@ -35,6 +35,8 @@
compute_time_weights as _rust_compute_time_weights,
compute_noise_level as _rust_compute_noise_level,
sc_weight_fw as _rust_sc_weight_fw,
+# Diagnostics
+rust_backend_info as _rust_backend_info,
)
_rust_available = True
except ImportError:
@@ -56,6 +58,7 @@
_rust_compute_time_weights = None
_rust_compute_noise_level = None
_rust_sc_weight_fw = None
+_rust_backend_info = None

# Determine final backend based on environment variable and availability
if _backend_env == 'python':
@@ -78,6 +81,7 @@
_rust_compute_time_weights = None
_rust_compute_noise_level = None
_rust_sc_weight_fw = None
+_rust_backend_info = None
elif _backend_env == 'rust':
# Force Rust mode - fail if not available
if not _rust_available:
@@ -90,8 +94,25 @@
# Auto mode - use Rust if available
HAS_RUST_BACKEND = _rust_available


+def rust_backend_info():
+    """Return compile-time BLAS feature information for the Rust backend.
+
+    Returns a dict with keys:
+    - 'blas': True if any BLAS backend is linked
+    - 'accelerate': True if Apple Accelerate is linked (macOS)
+    - 'openblas': True if OpenBLAS is linked (Linux)
+
+    If the Rust backend is not available, all values are False.
+    """
+    if _rust_backend_info is not None:
+        return _rust_backend_info()
+    return {"blas": False, "accelerate": False, "openblas": False}
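
A caller might use the diagnostic along the following lines. This is a hedged usage sketch rather than code from the PR: `summarize_blas` is a hypothetical helper, and the `except ImportError` branch defines a stand-in with the documented all-False shape so the snippet also runs where `diff_diff` (or a version exposing `rust_backend_info`) is not installed.

```python
def summarize_blas(info):
    """Render the dict shape documented for rust_backend_info() as one line."""
    if not info.get("blas", False):
        return "pure Rust (no BLAS linked)"
    linked = [name for name in ("accelerate", "openblas") if info.get(name, False)]
    return "BLAS linked: " + ", ".join(linked or ["unknown"])


try:
    from diff_diff._backend import rust_backend_info
except ImportError:
    # Stand-in for environments without the package: same keys, all False.
    def rust_backend_info():
        return {"blas": False, "accelerate": False, "openblas": False}


print(summarize_blas(rust_backend_info()))
```

Because the flags are fixed when the extension is compiled, the result describes how the installed wheel was built, not what BLAS libraries happen to be present on the machine.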


__all__ = [
'HAS_RUST_BACKEND',
+'rust_backend_info',
'_rust_bootstrap_weights',
'_rust_synthetic_weights',
'_rust_project_simplex',
14 changes: 10 additions & 4 deletions docs/benchmarks.rst
@@ -267,6 +267,11 @@ implementations:
additional speedup since these estimators primarily use OLS and variance
computations that are already highly optimized in NumPy/SciPy via BLAS/LAPACK.

+As of v2.5.0, pre-built wheels on macOS and Linux link platform-optimized
+BLAS libraries (Apple Accelerate and OpenBLAS respectively) for matrix-vector
+and matrix-matrix products across all Rust-accelerated code paths. Windows
+wheels continue to use pure Rust with no external dependencies.
+
Three-Way Performance Summary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -394,10 +399,11 @@ Three-Way Performance Summary
Frank-Wolfe optimization algorithm. At 5k scale, R takes ~9 minutes while
pure Python completes in 32 seconds. ATT estimates are numerically identical
(< 1e-10 difference) since both implementations use the same Frank-Wolfe
-optimizer with two-pass sparsification. The Rust backend provides a speedup
-at small scale (2.1x over pure Python) but is slower at larger scales due to
-overhead in the placebo variance estimation loop; this is a known area for
-future optimization.
+optimizer with two-pass sparsification. The Rust backend uses a
+Gram-accelerated Frank-Wolfe solver for time weights (reducing per-iteration
+cost from O(N×T0) to O(T0)) and an allocation-free solver for unit weights
+(1 GEMV per iteration instead of 3, zero heap allocations). These
+optimizations make the Rust backend faster than pure Python at all scales.

Dataset Sizes
~~~~~~~~~~~~~
12 changes: 10 additions & 2 deletions rust/Cargo.toml
@@ -14,6 +14,10 @@ crate-type = ["cdylib", "rlib"]
default = []
# extension-module is only needed for cdylib builds, not for cargo test
extension-module = ["pyo3/extension-module"]
+# Platform BLAS backends (optional, activated for pre-built wheels)
+# When enabled, ndarray's .dot() and general_mat_vec_mul dispatch to BLAS dgemv/dgemm
+accelerate = ["ndarray/blas", "dep:blas-src", "blas-src/accelerate"]
+openblas = ["ndarray/blas"]

[dependencies]
# PyO3 0.22 supports Python 3.8-3.13
@@ -24,10 +28,14 @@ rand = "0.8"
rand_xoshiro = "0.6"
rayon = "1.8"

-# Pure Rust linear algebra library - no external BLAS/LAPACK dependencies
-# This enables Windows builds without Intel MKL complexity
+# Pure Rust linear algebra for SVD/matrix inversion (no external deps).
+# BLAS for matrix-vector products is optional via accelerate/openblas features.
faer = "0.24"

+# BLAS backend (optional, activated by accelerate/openblas features)
+# blas-src 0.10 is ndarray's tested version (see ndarray/crates/blas-tests/Cargo.toml)
+blas-src = { version = "0.10", optional = true }
+
[profile.release]
lto = true
codegen-units = 1
12 changes: 12 additions & 0 deletions rust/build.rs
@@ -0,0 +1,12 @@
+/// Build script for diff_diff_rust.
+///
+/// When the `openblas` feature is enabled, links against the system OpenBLAS
+/// library directly. This avoids the `openblas-src` -> `openblas-build` ->
+/// `ureq` -> `native-tls` dependency chain, which has Rust compiler
+/// compatibility issues. Requires `libopenblas-dev` (Ubuntu) or
+/// `openblas-devel` (CentOS/manylinux) to be installed.
+fn main() {
+    if std::env::var("CARGO_FEATURE_OPENBLAS").is_ok() {
+        println!("cargo:rustc-link-lib=openblas");
+    }
+}
25 changes: 25 additions & 0 deletions rust/src/lib.rs
@@ -3,7 +3,16 @@
//! This module provides optimized implementations of computationally
//! intensive operations used in difference-in-differences analysis.

+// Pull in BLAS linker flags for macOS Accelerate.
+// blas-src is a linker-only crate — extern crate is required to ensure
+// the Accelerate framework is actually linked.
+// For OpenBLAS (Linux), linking is handled by build.rs instead of blas-src
+// to avoid the openblas-src -> ureq -> native-tls dependency chain.
+#[cfg(feature = "accelerate")]
+extern crate blas_src;
+
use pyo3::prelude::*;
+use std::collections::HashMap;

mod bootstrap;
mod linalg;
@@ -42,8 +51,24 @@ fn _rust_backend(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_function(wrap_pyfunction!(trop::loocv_grid_search_joint, m)?)?;
m.add_function(wrap_pyfunction!(trop::bootstrap_trop_variance_joint, m)?)?;

+// Diagnostics
+m.add_function(wrap_pyfunction!(rust_backend_info, m)?)?;
+
// Version info
m.add("__version__", env!("CARGO_PKG_VERSION"))?;

Ok(())
}

+/// Return compile-time BLAS feature information for diagnostics.
+#[pyfunction]
+fn rust_backend_info() -> PyResult<HashMap<String, bool>> {
+    let mut info = HashMap::new();
+    info.insert(
+        "blas".to_string(),
+        cfg!(feature = "accelerate") || cfg!(feature = "openblas"),
+    );
+    info.insert("accelerate".to_string(), cfg!(feature = "accelerate"));
+    info.insert("openblas".to_string(), cfg!(feature = "openblas"));
+    Ok(info)
+}