Remove Rust outer-loop SDID variance to fix SE mismatch and perf regression #147

Merged
igerber merged 1 commit into main from sdid-benchmarks
Feb 15, 2026

@igerber (Owner) commented Feb 15, 2026

Summary

  • Remove Rust parallel placebo/bootstrap variance estimation outer loops from synthetic_did.py
  • Delete rust/src/sdid_variance.rs (498 lines) and all associated exports/registrations
  • Keep Rust-accelerated inner Frank-Wolfe weight computation (18x faster than R)
  • Add backend SE consistency tests confirming Rust and Python backends produce identical SEs
  • Delete TestSDIDVarianceRustBackend test class (9 tests that directly imported deleted Rust functions)
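
For readers unfamiliar with the inner step that stays in Rust, the following is a minimal pure-Python sketch of a generic Frank-Wolfe iteration on the probability simplex. It is an illustration only, not the library's Rust or Python solver: the function name `frank_wolfe_simplex` and its arguments are hypothetical, and it omits any regularization or intercept terms a real SDID weight solver would use.

```python
def frank_wolfe_simplex(columns, target, n_iters=100):
    """Minimize ||sum_j w_j * columns[j] - target||^2 over the probability simplex."""
    n = len(columns)
    weights = [1.0 / n] * n  # start from uniform weights

    def combine(w):
        # Weighted combination of the columns, one entry per row of the target.
        return [sum(wj * col[r] for wj, col in zip(w, columns))
                for r in range(len(target))]

    for _ in range(n_iters):
        residual = [f - t for f, t in zip(combine(weights), target)]
        # Gradient of the squared error with respect to each weight.
        grad = [2.0 * sum(c * r for c, r in zip(col, residual)) for col in columns]
        # The simplex vertex that minimizes the linearized objective.
        best = min(range(n), key=grad.__getitem__)
        direction = [(1.0 if j == best else 0.0) - weights[j] for j in range(n)]
        moved = combine(direction)
        curvature = 2.0 * sum(m * m for m in moved)
        if curvature == 0.0:
            break  # already sitting on the chosen vertex
        slope = sum(g * d for g, d in zip(grad, direction))
        # Exact line search for a quadratic objective, clipped to [0, 1].
        step = min(1.0, max(0.0, -slope / curvature))
        if step == 0.0:
            break  # no further descent along this direction
        weights = [w + step * d for w, d in zip(weights, direction)]
    return weights


# Two candidate control columns; the target equals the first one.
weights = frank_wolfe_simplex([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [1.0, 0.0, 0.0])
```

Every iterate stays a convex combination of simplex vertices, so the weights remain non-negative and sum to one without an explicit projection step.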

Root causes fixed

  1. SE mismatch: Rust used Xoshiro256PlusPlus RNG producing different permutation sequences than Python's default_rng, causing SE divergence between backends (Python=0.1048, Rust=0.0987 at small scale)
  2. Performance regression: Rayon par_iter across all replications saturated memory bandwidth at 1k+ scale (3.2x slower at 1k, 9.7x slower at 5k vs pure Python)
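
The SE mismatch in item 1 can be reproduced in miniature. The sketch below is a toy, not the removed Rust code: it uses a small SplitMix64 generator as a stand-in for "a different RNG algorithm" (it is not Xoshiro256PlusPlus), Python's `random.Random` as a stand-in for `default_rng`, and a hypothetical `toy_placebo_se` helper that takes the standard deviation of difference-in-means placebo estimates. With the same seed, the two generators produce different permutations, so the placebo SEs differ at small replication counts.

```python
import random
import statistics

MASK64 = (1 << 64) - 1


class SplitMix64:
    """Tiny 64-bit PRNG, used only as a stand-in for a second RNG algorithm."""

    def __init__(self, seed):
        self.state = seed & MASK64

    def next_u64(self):
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return z ^ (z >> 31)

    def shuffle(self, items):
        # Fisher-Yates shuffle; modulo bias is irrelevant for this illustration.
        for i in range(len(items) - 1, 0, -1):
            j = self.next_u64() % (i + 1)
            items[i], items[j] = items[j], items[i]


def toy_placebo_se(rng, outcomes, n_pseudo_treated, n_reps):
    """Std. dev. of placebo 'effects' over random relabelings of control units."""
    estimates = []
    for _ in range(n_reps):
        units = list(outcomes)
        rng.shuffle(units)  # the permutation depends on the RNG algorithm
        pseudo_treated = units[:n_pseudo_treated]
        pseudo_control = units[n_pseudo_treated:]
        estimates.append(statistics.fmean(pseudo_treated)
                         - statistics.fmean(pseudo_control))
    return statistics.pstdev(estimates)


outcomes = [((i * 37) % 101) / 10 for i in range(40)]  # made-up control outcomes
se_mersenne = toy_placebo_se(random.Random(42), outcomes, 5, 50)
se_splitmix = toy_placebo_se(SplitMix64(42), outcomes, 5, 50)
```

Both values estimate the same quantity, so they agree in the limit of many replications; the disagreement at a fixed seed and small `n_reps` is what makes exact cross-backend comparison fail.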

Architecture after fix

Python sequential loop is the only orchestration path for variance estimation. When Rust is available, inner Frank-Wolfe weight calls dispatch to Rust via utils.py → _backend.py → _rust_backend. The DIFF_DIFF_BACKEND env var controls this cleanly.
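
A rough sketch of this shape, under stated assumptions: the module name `hypothetical_rust_backend`, the function names, and the `auto` default value are all made up for illustration (only `DIFF_DIFF_BACKEND=python` appears in this PR), and the placeholder solver returns uniform weights rather than running Frank-Wolfe.

```python
import os


def solve_weights_python(values):
    """Placeholder for the pure-Python weight solver: returns uniform weights."""
    n = len(values)
    return [1.0 / n] * n


def load_rust_solver():
    """Return the compiled solver if importable, else None (module name is hypothetical)."""
    try:
        import hypothetical_rust_backend  # assumed name, not the real extension module
    except ImportError:
        return None
    return hypothetical_rust_backend.solve_weights


def select_weight_solver():
    """Pick the inner solver once, based on the DIFF_DIFF_BACKEND env var."""
    backend = os.environ.get("DIFF_DIFF_BACKEND", "auto").lower()  # "auto" is assumed
    if backend == "python":
        return solve_weights_python
    rust_solver = load_rust_solver()
    return rust_solver if rust_solver is not None else solve_weights_python


def run_replications(samples):
    """Sequential outer loop: only the inner call may be Rust-accelerated."""
    solver = select_weight_solver()
    return [solver(sample) for sample in samples]
```

Because the loop and its random draws live in one place, switching the inner solver cannot change which permutations or bootstrap samples are evaluated.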

SE convergence validation (Python vs R at increasing iteration counts)

| n_reps | Python SE | R SE | Relative Diff |
|---|---|---|---|
| 50 | 0.104772 | 0.112034 | 6.5% |
| 200 | 0.113138 | 0.109023 | 3.8% |
| 1000 | 0.109822 | 0.104476 | 5.1% |
| 2000 | 0.105956 | 0.106015 | 0.1% |

Both converge to ~0.106; gap is Monte Carlo noise.
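
As a sanity check on the Monte Carlo reading, the relative differences above can be recomputed and set against a rule-of-thumb noise level. The sketch assumes the differences are taken relative to the R value (which reproduces the reported percentages) and uses the textbook approximation that a standard deviation estimated from B roughly normal replicates has relative error of about 1/sqrt(2(B-1)); that approximation is an assumption here, not something measured in this PR.

```python
import math

# (n_reps, Python SE, R SE) copied from the table above.
ROWS = [
    (50, 0.104772, 0.112034),
    (200, 0.113138, 0.109023),
    (1000, 0.109822, 0.104476),
    (2000, 0.105956, 0.106015),
]


def relative_diff(python_se, r_se):
    """Absolute difference as a fraction of the R value."""
    return abs(python_se - r_se) / r_se


def approx_mc_noise(n_reps):
    """Rule-of-thumb relative error of an SD estimated from n_reps normal draws."""
    return 1.0 / math.sqrt(2.0 * (n_reps - 1))


for n_reps, python_se, r_se in ROWS:
    print(f"{n_reps:>5}  diff={relative_diff(python_se, r_se):.1%}  "
          f"noise per estimate ~ {approx_mc_noise(n_reps):.1%}")
```

Since each row compares two independent noisy estimates, gaps of a few percent at up to 1000 replications are within the range this approximation suggests.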

Performance (small scale, 2000 reps)

| Backend | Time | vs R |
|---|---|---|
| Python + Rust inner-loop | 4.65s | 18x faster |
| R synthdid | 81.87s | baseline |
| Python pure | 168.89s | 2x slower |

Methodology references (required if estimator / math changes)

  • Method name(s): Synthetic Difference-in-Differences placebo variance (Algorithm 4)
  • Paper / source link(s): Arkhangelsky et al. (2021). American Economic Review, 111(12), 4088-4118
  • Any intentional deviations from the source (and why): None. This change removes an implementation artifact (a parallel Rust outer loop with a different RNG) that deviated from the sequential permutation approach

Validation

  • Tests added/updated:
    • tests/test_methodology_sdid.py: Added TestBackendSEConsistency (2 tests: placebo + bootstrap SE matching across backends, rtol=1e-4)
    • tests/test_rust_backend.py: Removed TestSDIDVarianceRustBackend (9 tests for deleted Rust functions)
  • 277 tests pass across test_methodology_sdid.py, test_rust_backend.py, test_estimators.py
  • 60 tests pass in DIFF_DIFF_BACKEND=python mode
  • maturin develop --release builds cleanly
  • grep -r "placebo_variance_sdid\|bootstrap_variance_sdid" returns zero hits
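
The idea behind the backend consistency check can be sketched as follows. This is not the code of TestBackendSEConsistency: `mean_plain` and `mean_fsum` are hypothetical stand-ins for the two inner backends (same math, slightly different floating-point path), and `placebo_se` is a toy outer loop. The point it illustrates is that when the outer loop owns the only RNG, both backends see identical permutations, so SEs agree to within floating-point error and a tolerance such as rtol=1e-4 is comfortably met.

```python
import math
import random
import statistics


def mean_plain(xs):
    """Stand-in for the pure-Python inner routine."""
    return sum(xs) / len(xs)


def mean_fsum(xs):
    """Stand-in for a compiled inner routine: same math, different arithmetic path."""
    return math.fsum(xs) / len(xs)


def placebo_se(inner_mean, outcomes, n_pseudo_treated, n_reps, seed):
    rng = random.Random(seed)  # the outer loop owns the only RNG
    estimates = []
    for _ in range(n_reps):
        units = list(outcomes)
        rng.shuffle(units)
        estimates.append(inner_mean(units[:n_pseudo_treated])
                         - inner_mean(units[n_pseudo_treated:]))
    return statistics.pstdev(estimates)


def test_backend_se_consistency():
    outcomes = [((i * 53) % 97) / 7 for i in range(30)]  # made-up data
    se_python = placebo_se(mean_plain, outcomes, 4, 200, seed=123)
    se_compiled = placebo_se(mean_fsum, outcomes, 4, 200, seed=123)
    assert math.isclose(se_python, se_compiled, rel_tol=1e-4)
    return se_python, se_compiled
```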

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…d perf regression

The Rust placebo/bootstrap variance paths used a different RNG (Xoshiro256PlusPlus)
producing different permutation sequences than Python, causing SE divergence between
backends. Rayon parallelism across all replications also saturated memory bandwidth
at 1k+ scale (3-10x slower than pure Python). Inner Frank-Wolfe weight calls still
dispatch to Rust for ~18x speedup over R.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Overall Assessment ✅ Looks good

Executive Summary

  • SyntheticDiD variance estimation now runs through the Python outer loop for both backends; the algorithm remains aligned with the Methodology Registry (Algorithm 4 placebo, fixed‑weight bootstrap).
  • Rust SDID variance exports are fully removed from _backend and Rust module registration with no remaining references.
  • New backend SE consistency tests cover the intended behavior change.
  • Minor test‑robustness suggestion noted below.

Methodology

  • No findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 | Impact: The “pure Python backend” comparison only patches diff_diff.utils.HAS_RUST_BACKEND, so if SDID starts using other Rust‑accelerated modules in the future (e.g., diff_diff.linalg), the test could silently include Rust and no longer validate the intended comparison. | Fix: Force the backend at the source (e.g., set DIFF_DIFF_BACKEND=python and importlib.reload() relevant modules), or patch _backend.HAS_RUST_BACKEND/_rust_* in addition to diff_diff.utils.HAS_RUST_BACKEND. | Location: tests/test_methodology_sdid.py:L1042-L1127
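
One way the "force the backend at the source" suggestion could look, as a self-contained sketch: a throwaway module named `demo_backend` (hypothetical, written to a temporary directory) fixes a `HAS_RUST_BACKEND` flag at import time from the env var, and the demo shows that setting `DIFF_DIFF_BACKEND=python` only takes effect after `importlib.reload()`. How the real `_backend` module computes its flag is not shown in this PR, so the module body here is an assumption.

```python
import importlib
import os
import sys
import tempfile
from unittest import mock

# Hypothetical module body: the flag is decided once, at import time.
MODULE_SOURCE = (
    "import os\n"
    "HAS_RUST_BACKEND = os.environ.get('DIFF_DIFF_BACKEND', 'auto') != 'python'\n"
)


def demo():
    """Return the flag before and after forcing the Python backend via reload."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "demo_backend.py"), "w") as fh:
            fh.write(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            with mock.patch.dict(os.environ, {"DIFF_DIFF_BACKEND": "auto"}):
                backend = importlib.import_module("demo_backend")
            before = backend.HAS_RUST_BACKEND
            # Changing the env var alone would not update the module-level flag;
            # the reload re-executes the module body under the patched environment.
            with mock.patch.dict(os.environ, {"DIFF_DIFF_BACKEND": "python"}):
                importlib.reload(backend)
            after = backend.HAS_RUST_BACKEND
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("demo_backend", None)
    return before, after


before, after = demo()
```

In a real test, a second reload during teardown would be needed so that later tests see the flag computed from the unpatched environment.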

@igerber igerber merged commit a8eadb3 into main Feb 15, 2026
8 checks passed
@igerber igerber deleted the sdid-benchmarks branch February 15, 2026 17:33