SDID methodology review: rewrite to match R synthdid + Rust parallel variance#145

Merged
igerber merged 9 commits into main from sdid-method-review
Feb 11, 2026

Owner

@igerber igerber commented Feb 10, 2026

Summary

  • Methodology rewrite: Complete rewrite of Synthetic DiD to match R's synthdid package (Arkhangelsky et al. 2021). Frank-Wolfe solver, two-pass sparsification, auto-computed regularization, and proper ATT formula all match R's reference implementation.
  • Rust parallel variance: New rust/src/sdid_variance.rs parallelizes placebo SE (~8x speedup) and bootstrap SE (~6x speedup) via rayon, following the established trop.rs pattern. Placebo SE — the default and slowest variance method — drops from ~240s to ~30s on benchmark data.
  • Comprehensive testing: 41 new methodology tests, 9 new Rust variance tests, plus all 31 existing SDID estimator tests pass in both Rust and pure Python modes.
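
To make the "ATT formula" item above concrete, here is a minimal sketch of the weighted double difference that a synthetic DiD point estimate reduces to once unit weights and time weights are known. The function name `sdid_att`, its argument names, and the (periods x units) matrix layout are assumptions made for this illustration; they are not taken from diff_diff/utils.py.

```python
import numpy as np


def sdid_att(Y_pre_control, Y_post_control, Y_pre_treated, Y_post_treated, omega, lam):
    """Weighted double difference (illustrative sketch).

    Assumed layout: rows are periods, columns are units.
    omega: unit weights over controls (non-negative, sums to 1).
    lam:   time weights over pre-periods (non-negative, sums to 1).
    """
    treated_pre = Y_pre_treated.mean(axis=1)     # average treated path, pre
    treated_post = Y_post_treated.mean(axis=1)   # average treated path, post
    synth_pre = Y_pre_control @ omega            # synthetic control path, pre
    synth_post = Y_post_control @ omega          # synthetic control path, post

    # Post-period gap uses equal weights over post periods.
    post_gap = treated_post.mean() - synth_post.mean()
    # Pre-period gap uses the time weights lam.
    pre_gap = lam @ (treated_pre - synth_pre)
    return float(post_gap - pre_gap)
```

With uniform weights this collapses to an ordinary difference-in-differences, which is a convenient sanity check on any implementation.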

Changes

Methodology (Python)

  • diff_diff/synthetic_did.py — Rewritten SDID estimator with Frank-Wolfe weights, two-pass sparsification, R-matched ATT formula
  • diff_diff/utils.py — New weight computation functions matching R's sc.weight.fw(), noise level estimation, SDID estimator
  • diff_diff/results.py — Updated SyntheticDiDResults for new weight/diagnostic fields

Rust Acceleration

  • rust/src/sdid_variance.rs NEW: Parallel placebo and bootstrap variance estimation
  • rust/src/weights.rs — Frank-Wolfe solver in Rust, pub(crate) visibility for internal functions, fixed stale PyO3 defaults
  • rust/src/lib.rs — Register new sdid_variance module
  • diff_diff/_backend.py — Import/fallback for new Rust functions

Tests

  • tests/test_methodology_sdid.py NEW: 41 methodology tests (solver convergence, sparsification, regularization, edge cases)
  • tests/test_rust_backend.py — 9 new tests for Rust variance backend (reproducibility, statistical equivalence, edge cases)
  • tests/test_estimators.py — Updated SDID estimator tests
  • tests/test_utils.py — Updated utility tests

Documentation

  • METHODOLOGY_REVIEW.md — SDID review status and findings
  • docs/methodology/REGISTRY.md — Updated SDID methodology entry
  • README.md — Updated SDID feature description
  • CLAUDE.md — Added SDID variance to Rust backend section
  • docs/tutorials/03_synthetic_did.ipynb — Updated tutorial
  • benchmarks/ — Updated R and Python benchmark scripts

Test plan

  • maturin develop && pytest tests/test_rust_backend.py::TestSDIDVarianceRustBackend -v — 9/9 new Rust variance tests pass
  • pytest tests/test_methodology_sdid.py -v — 41/41 methodology tests pass
  • DIFF_DIFF_BACKEND=rust pytest tests/test_estimators.py::TestSyntheticDiD -v — 31/31 with Rust backend
  • DIFF_DIFF_BACKEND=python pytest tests/test_estimators.py::TestSyntheticDiD -v — 31/31 pure Python fallback
  • pytest tests/test_rust_backend.py::TestSDIDRustBackend -v — 12/12 existing SDID Rust tests
  • pytest tests/test_utils.py -v — 73/73 utility tests

🤖 Generated with Claude Code

…variance

Comprehensive review and rewrite of the Synthetic DiD implementation to match
R's synthdid package behavior (Arkhangelsky et al. 2021):

Methodology (Python):
- Frank-Wolfe solver matching R's sc.weight.fw() for unit and time weights
- Two-pass sparsification: 100 iters → sparsify → 10000 iters (matching R)
- Auto-computed regularization from data noise level (zeta_omega, zeta_lambda)
- Bootstrap SE uses fixed weights (matching R's bootstrap_sample)
- Placebo SE re-estimates weights per permutation (matching R's Algorithm 4)
- ATT estimator matches R's synthdid_estimate formula
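
For readers unfamiliar with the solver named in the list above, the following is a generic sketch of Frank-Wolfe on the probability simplex with an exact line search, followed by a two-pass sparsify-and-refit wrapper. It is not a copy of R's sc.weight.fw() or of this PR's code: the objective scaling, the squared stopping threshold, the quarter-of-max sparsification rule, and all function names are assumptions for illustration. Only the 100 and 10000 iteration counts come from the text above.

```python
import numpy as np


def frank_wolfe_simplex(A, b, eta=0.0, max_iter=10000, min_decrease=1e-5, x0=None):
    """Minimize ||A x - b||^2 + eta * ||x||^2 over the simplex (sketch)."""
    n = A.shape[1]
    x = np.full(n, 1.0 / n) if x0 is None else np.asarray(x0, dtype=float).copy()
    prev = np.inf
    for _ in range(max_iter):
        Ax = A @ x
        half_grad = A.T @ (Ax - b) + eta * x
        i = int(np.argmin(half_grad))      # best vertex of the simplex
        d = -x
        d[i] += 1.0                        # direction e_i - x
        if not d.any():
            break                          # already at that vertex
        d_err = A[:, i] - Ax               # equals A @ d
        denom = d_err @ d_err + eta * (d @ d)
        # Exact minimizer along d, clipped so x stays in the simplex.
        step = 0.0 if denom == 0 else float(np.clip(-(half_grad @ d) / denom, 0.0, 1.0))
        x = x + step * d
        err = A @ x - b
        val = err @ err + eta * (x @ x)
        # Assumed stopping rule: improvement below min_decrease squared.
        if prev - val <= min_decrease ** 2:
            break
        prev = val
    return x


def sparsify(w):
    """Zero out small weights and renormalize (threshold is an assumption)."""
    w = np.where(w <= w.max() / 4.0, 0.0, w)
    return w / w.sum()


def two_pass_weights(A, b, eta=0.0, min_decrease=1e-5):
    """Short first pass, sparsify, then a long second pass from that start."""
    w = frank_wolfe_simplex(A, b, eta, max_iter=100, min_decrease=min_decrease)
    w = sparsify(w)
    return frank_wolfe_simplex(A, b, eta, max_iter=10000, min_decrease=min_decrease, x0=w)
```

Because every update is a convex combination of the current point and a vertex, the iterate stays non-negative and sums to one without any projection step.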

Rust acceleration:
- New rust/src/sdid_variance.rs: parallel placebo and bootstrap SE via rayon
- placebo_variance_sdid(): ~8x speedup (200 permutations in parallel)
- bootstrap_variance_sdid(): ~6x speedup (200 bootstrap iterations in parallel)
- Frank-Wolfe solver in Rust (weights.rs) for unit and time weight computation
- Fix stale PyO3 defaults on sc_weight_fw and compute_sdid_unit_weights
- Make internal weight functions pub(crate) for cross-module access

Testing:
- 41 new methodology tests (test_methodology_sdid.py)
- 9 new Rust variance backend tests (test_rust_backend.py)
- Updated estimator and utility tests for new API
- All 31 SDID estimator tests pass in both Rust and pure Python modes

Documentation:
- Updated METHODOLOGY_REVIEW.md, REGISTRY.md, README.md, CLAUDE.md
- Updated tutorial notebook and benchmark scripts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

Overall assessment: ⚠️ Needs changes

Executive Summary

  • SDID placebo variance re-estimates omega/lambda each replication, which conflicts with the Methodology Registry’s fixed-weight placebo algorithm and will change SEs/p-values.
  • Convergence defaults (min_decrease and second-pass iterations) differ from the Registry, so the implementation no longer matches the documented method.
  • _compute_noise_level_numpy can return NaN for small panels (e.g., 1 control, 2 pre-periods), causing NaN regularization and Rust/Python divergence.
  • SyntheticDiD.__init__ now swallows unknown kwargs silently, which can hide configuration errors.

Methodology

  • P1 Placebo variance re-estimates both omega and lambda on each permutation, but the Methodology Registry specifies renormalizing the original omega and keeping lambda fixed. Impact: placebo SEs/p-values no longer match the documented algorithm (and may diverge from the intended R-matching behavior). Fix: either change placebo variance to use _sum_normalize on original unit_weights and fixed time_weights, or update the Registry/METHODOLOGY_REVIEW + tests to reflect re-estimation with supporting citation. diff_diff/synthetic_did.py:L816-L843, rust/src/sdid_variance.rs:L186-L206, docs/methodology/REGISTRY.md:L629-L635
  • P1 Convergence/iteration defaults don’t match the Registry: code uses min_decrease = 1e-5 * noise_level and max_iter=10000, while the Registry states min_decrease=1e-3 and second-pass 1000 iterations. Impact: solver behavior differs from the documented method and can change weights/ATT. Fix: align Registry to the actual intended defaults (if these are correct per source) or adjust code defaults to match the Registry. diff_diff/synthetic_did.py:L305-L306, diff_diff/utils.py:L1384-L1494, docs/methodology/REGISTRY.md:L625-L668
  • P1 _compute_noise_level_numpy returns NaN when total first-diffs ≤ 1 (e.g., 1 control, 2 pre-periods), while Rust returns 0.0; this yields NaN regularization and unstable weights/inference in Python only. Impact: NaN/Inf propagation and backend inconsistency for small panels. Fix: guard first_diffs.size <= 1 (or total diffs <= 1) and return 0.0 (or explicitly match R’s behavior in both backends), plus add a regression test. diff_diff/utils.py:L1192-L1201
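
The noise-level edge case in the last finding can be sketched as follows. This assumes the noise level is the sample standard deviation of first differences of control pre-period outcomes and that rows are periods; both the function name and the layout are illustrative, not the signature in diff_diff/utils.py.

```python
import numpy as np


def noise_level(Y_pre_control):
    """Std. dev. of first differences with a small-panel guard (sketch).

    Assumed layout: rows are pre-periods, columns are control units.
    """
    Y = np.asarray(Y_pre_control, dtype=float)
    first_diffs = np.diff(Y, axis=0)
    # With zero or one difference the ddof=1 standard deviation is undefined
    # (NaN), so return 0.0 instead, as the review suggests.
    if first_diffs.size <= 1:
        return 0.0
    return float(np.std(first_diffs, ddof=1))
```

With 1 control and 2 pre-periods there is exactly one first difference, so the guard returns 0.0 rather than letting NaN flow into the regularization parameters.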

Code Quality

  • P2 SyntheticDiD.__init__ accepts **kwargs but never uses/validates them, silently ignoring typos or unsupported params. Impact: misconfiguration without error. Fix: remove **kwargs, validate and raise on unknown keys, or explicitly pass supported extras to super().__init__. diff_diff/synthetic_did.py:L132-L167
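
One way to read the "remove **kwargs" option: with an explicit signature, Python itself rejects misspelled parameters. The class below is a hypothetical stand-in with illustrative parameter names and defaults, not the actual SyntheticDiD constructor.

```python
class StrictEstimator:
    """Hypothetical estimator with an explicit signature (not the diff_diff class)."""

    def __init__(self, variance_method="placebo", n_bootstrap=200, seed=None):
        # No **kwargs here, so an unknown keyword raises TypeError at call time.
        self.variance_method = variance_method
        self.n_bootstrap = n_bootstrap
        self.seed = seed


def rejects_unknown(**params):
    """Return True if constructing with these params raises TypeError."""
    try:
        StrictEstimator(**params)
    except TypeError:
        return True
    return False
```

For example, `rejects_unknown(n_bootstrp=100)` is true because of the typo, while `rejects_unknown(n_bootstrap=100)` is false.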

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P2 Tests don’t pin the placebo-weighting behavior (fixed vs re-estimated weights) or the single-control noise-level edge case, so methodology drift and NaN propagation can slip through. Impact: regressions in variance methodology or backend parity may go unnoticed. Fix: add tests asserting the chosen placebo weighting strategy and a noise-level edge case (1 control, 2 pre-periods) in tests/test_methodology_sdid.py. tests/test_methodology_sdid.py:L40-L120, tests/test_methodology_sdid.py:L430-L519

…edback

- Wrap Unicode formula in Rust doc comment in ```text fence (fixes doctest CI)
- Guard noise_level for single first-diff element (size<=1 → 0.0, not NaN)
- Remove **kwargs from SyntheticDiD.__init__ (reject unknown params)
- Update Registry: placebo re-estimates weights (not fixed), convergence=1e-5
- Add tests: noise_level edge cases, placebo re-estimation pin, kwargs rejection
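
The placebo re-estimation behavior pinned by this commit can be pictured with the sketch below: each replication permutes the control units, treats the last few as pseudo-treated, and calls an estimator that re-fits its own weights on that split. The function name, the `estimate_tau` callback, and the column-wise layout are assumptions for illustration, not the PR's implementation.

```python
import numpy as np


def placebo_taus(Y_control, n_treated, estimate_tau, n_reps=200, seed=0):
    """Placebo replicates over control units only (sketch).

    Y_control:    array of shape (n_periods, n_control).
    estimate_tau: callable (Y_pseudo_control, Y_pseudo_treated) -> float that
                  re-estimates any weights internally on each call.
    """
    rng = np.random.default_rng(seed)
    n_control = Y_control.shape[1]
    if n_control <= n_treated:
        raise ValueError("placebo inference needs more controls than treated units")
    taus = []
    for _ in range(n_reps):
        perm = rng.permutation(n_control)
        pseudo_control = Y_control[:, perm[: n_control - n_treated]]
        pseudo_treated = Y_control[:, perm[n_control - n_treated:]]
        tau = estimate_tau(pseudo_control, pseudo_treated)
        if np.isfinite(tau):   # keep only usable replicates
            taus.append(float(tau))
    return np.asarray(taus)
```

Passing the estimator as a callback keeps the resampling loop independent of how the weights are solved, which is also what makes the replications easy to run in parallel.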

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 7158bf0f39f2671d32a62e5e8f6be9a2157062ec


Overall assessment: ⛔ Blocker

Executive summary

  • P0: SDID still classifies treated vs control using the first treatment value per unit, with no enforcement of a common adoption time; this can misclassify staggered-treated units as controls and bias ATT.
  • P1: Required assumption checks from the Methodology Registry (balanced panel, poor pre‑fit warning) are missing in the rewritten estimator.
  • P1: Methodology Registry and implementation disagree on Frank‑Wolfe second‑pass iteration count (1000 vs 10000), creating documentation drift.
  • P2: Placebo variance docstring claims original‑weight initialization, but the implementation doesn’t pass init weights.

Methodology

  • P0 – Control group contamination from treatment timing assumptions
    Impact: If treatment varies within unit or adoption is staggered, groupby(...).first() can label later‑treated units as controls, violating the “block treatment” requirement and biasing ATT and variance.
    Concrete fix: Validate treatment is constant within unit and that all treated units share the same first treated period; ensure post_periods are ≥ that adoption time. If not, raise an error (or route to staggered estimators).
    Refs: diff_diff/synthetic_did.py:L267-L271, docs/methodology/REGISTRY.md:L564-L567

  • P1 – Balanced panel assumption not enforced
    Impact: Missing unit‑periods will introduce NaNs in the pivoted matrices and lead to invalid weights/estimates without warning.
    Concrete fix: Before _create_outcome_matrices, check data.groupby(unit)[time].nunique() == len(all_periods) and error or drop incomplete units with an explicit warning.
    Refs: diff_diff/synthetic_did.py:L285-L291, docs/methodology/REGISTRY.md:L564-L565

  • P1 – Pre‑treatment fit warning required by registry is missing
    Impact: Users aren’t alerted when pre‑fit is poor, which is a key diagnostic for SDID validity.
    Concrete fix: Add a warning when pre_fit_rmse exceeds a threshold (e.g., > 1× SD of treated pre outcomes or configurable), and include the value in the message.
    Refs: diff_diff/synthetic_did.py:L337-L339, docs/methodology/REGISTRY.md:L566
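
The first two checks requested above (treatment constant within unit, balanced panel) could look roughly like this dependency-free sketch. It assumes the treatment column is an ever-treated flag and represents the panel as (unit, period, treated) tuples; that layout and the function name are hypothetical stand-ins for the DataFrame-based code under review. Duplicate unit-period rows are not detected here.

```python
from collections import defaultdict


def validate_block_panel(rows):
    """Validate a long-format panel given as (unit, period, treated) tuples."""
    periods_by_unit = defaultdict(set)
    flags_by_unit = defaultdict(set)
    for unit, period, treated in rows:
        periods_by_unit[unit].add(period)
        flags_by_unit[unit].add(bool(treated))

    if not periods_by_unit:
        raise ValueError("panel is empty")

    # Balanced panel: every unit must be observed in every period.
    all_periods = set().union(*periods_by_unit.values())
    incomplete = sorted(u for u, p in periods_by_unit.items() if p != all_periods)
    if incomplete:
        raise ValueError(f"unbalanced panel; units missing periods: {incomplete}")

    # Block treatment: the indicator may not switch within a unit.
    switching = sorted(u for u, f in flags_by_unit.items() if len(f) > 1)
    if switching:
        raise ValueError(f"treatment varies within unit(s): {switching}")

    treated_units = sorted(u for u, f in flags_by_unit.items() if f == {True})
    control_units = sorted(u for u, f in flags_by_unit.items() if f == {False})
    return treated_units, control_units
```

Raising early keeps a later-treated unit from being silently labeled as a control, which is the contamination risk described in the P0 item.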


Code Quality

  • P2 – Docstring claims original‑weight initialization that isn’t implemented
    Impact: Misleads users and reviewers; if R relies on init weights, behavior may diverge.
    Concrete fix: Either pass init weights into compute_sdid_unit_weights/compute_time_weights (extend API) or remove the claim from the docstring/comments.
    Refs: diff_diff/synthetic_did.py:L815-L824, diff_diff/utils.py:L1465-L1494

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P1 – Methodology Registry vs implementation mismatch on FW iterations
    Impact: Reproducibility/documentation drift (registry says second pass is 1000 iterations; code defaults to 10000).
    Concrete fix: Either change defaults to 1000 or update the registry checklist and any related docs/benchmarks to 10000 consistently.
    Refs: docs/methodology/REGISTRY.md:L587-L590, diff_diff/utils.py:L1390-L1416, diff_diff/utils.py:L1471-L1494

…round 2

Address 5 issues from PR #145 AI review:
- P0: Validate treatment is constant within unit (reject staggered designs)
- P1: Enforce balanced panel (all units must have all periods)
- P1: Warn when pre-treatment fit RMSE exceeds treated outcome SD
- P1: Fix Registry FW iteration count (1000 → 10000, matching R/code)
- P2: Fix misleading placebo docstring (weights use fresh start, not warm start)
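
A minimal sketch of the pre-treatment fit warning in the third item, assuming the comparison is between the RMSE of the pre-period gap and the sample standard deviation of the treated pre-period outcomes. The function name, the `threshold_sd` parameter, and the message wording are illustrative assumptions.

```python
import warnings

import numpy as np


def check_pre_fit(treated_pre_mean, synthetic_pre, threshold_sd=1.0):
    """Return the pre-period RMSE and warn when it exceeds threshold_sd * SD."""
    treated = np.asarray(treated_pre_mean, dtype=float)
    synth = np.asarray(synthetic_pre, dtype=float)
    rmse = float(np.sqrt(np.mean((treated - synth) ** 2)))
    # Sample SD of the treated pre-period path; 0.0 if only one period exists.
    sd = float(np.std(treated, ddof=1)) if treated.size > 1 else 0.0
    if rmse > threshold_sd * sd:
        warnings.warn(
            f"Poor pre-treatment fit: RMSE={rmse:.4g} exceeds "
            f"{threshold_sd:g} x SD of treated pre-period outcomes ({sd:.4g}).",
            UserWarning,
            stacklevel=2,
        )
    return rmse
```

Including the RMSE value in the message follows the reviewer's request and lets users judge how far off the fit is.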

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: c62d3bcc6b78aae0a7bbc4976745fa162a1a6a4a


Overall Assessment: ⚠️ Needs changes

Executive Summary

  • P1: min_decrease deviates from the Methodology Registry when noise_level == 0, so the Frank‑Wolfe convergence rule no longer matches the documented SDID procedure in zero‑noise/short‑pre panels.
  • P1: compute_time_weights now depends on post‑period data but lacks a guard for empty post sets, which can yield NaN weights in Python and different behavior in Rust.
  • The SDID rewrite largely aligns with the Registry (Frank‑Wolfe + sparsification, auto‑regularization, placebo/boot variance), and inference NaN‑propagation is improved.

Methodology

  • P1 — Impact: Convergence criterion deviates from the Registry when noise_level == 0. The Registry specifies min_decrease = 1e-5 * noise_level (so zero when noise is zero). The implementation forces min_decrease = 1e-5 in that case, which changes stopping behavior and can alter weights on degenerate or short‑pre panels. Concrete fix: Use min_decrease = 1e-5 * noise_level unconditionally (allowing 0), or explicitly document/justify this deviation in the Registry and tests.
    Location: diff_diff/synthetic_did.py (SyntheticDiD.fit; min_decrease computation).
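
The two candidate rules being contrasted here can be written side by side. This is only a restatement of the finding in code, with a hypothetical function name and a `floor_when_zero` switch added for illustration.

```python
def fw_min_decrease(noise_level, scale=1e-5, floor_when_zero=True):
    """Convergence threshold for the solver (sketch).

    floor_when_zero=False gives scale * noise_level unconditionally, which is 0
    when the noise level is 0. floor_when_zero=True falls back to `scale`
    itself in that case, the behavior the review describes in the code.
    """
    if noise_level > 0:
        return scale * noise_level
    return scale if floor_when_zero else 0.0
```

A threshold of exactly zero means the solver can only stop when an iteration yields no improvement at all or the iteration cap is hit, which is why the floor changes stopping behavior on zero-noise panels.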

Code Quality

  • P1 — Impact: Empty post‑periods (or filters yielding no post data) lead to np.mean on an empty slice in Python compute_time_weights, producing NaNs; Rust uses 0.0 for n_post == 0, so backend behavior diverges. This is a new failure mode because time‑weights now require post data. Concrete fix: Validate post_periods is non‑empty and Y_post_control.shape[0] > 0 in SyntheticDiD.fit() (raise ValueError), and/or add a consistent guard in compute_time_weights (both Python and Rust) for n_post == 0.
    Location: diff_diff/utils.py (compute_time_weights), diff_diff/synthetic_did.py (SyntheticDiD.fit).
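
A guard of the kind suggested above might look like the sketch below. The helper name and the (periods x units) layout are assumptions; the point is only that the empty case is rejected before any mean is taken.

```python
import numpy as np


def post_period_target(Y_post_control):
    """Per-unit mean over post periods, rejecting an empty post set (sketch).

    Assumed layout: rows are post periods, columns are control units.
    """
    Y = np.asarray(Y_post_control, dtype=float)
    if Y.ndim != 2 or Y.shape[0] == 0:
        # Without this check, the mean over an empty axis would be NaN.
        raise ValueError("Y_post_control must contain at least one post period")
    return Y.mean(axis=0)
```

Placing the check ahead of any backend dispatch means both the Python and Rust paths fail the same way on the same input.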

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found.

…trol

Address AI review round 3: (1) Document the 1e-5 min_decrease floor when
noise_level == 0 as an intentional deviation from R in REGISTRY.md and
inline comment — enables early stopping with equivalent results.
(2) Add ValueError guard in compute_time_weights for empty Y_post_control
before both Python and Rust dispatch paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ced46f265aaada378e8b925f61a6648399d1cfea


Overall assessment: ⚠️ Needs changes

Executive summary

  • SyntheticDiD rewrite and Rust variance path largely align with the Methodology Registry (Frank‑Wolfe + sparsify, auto‑regularization, placebo SE formula).
  • One SDID‑related utility still uses legacy weights and fixed time weights, which conflicts with the new SDID methodology and registry.
  • Minor doc mismatch in placebo variance description could mislead future maintenance.

Methodology

  • Severity: P1 | Impact: compute_placebo_effects is documented as SDID inference but still recomputes unit weights via legacy projected‑gradient compute_synthetic_weights and keeps time weights fixed, which conflicts with the new Frank‑Wolfe SDID methodology and the registry’s placebo procedure (re‑estimate omega + lambda). Users relying on this utility will get placebo effects inconsistent with the estimator/paper. | Fix: Update to use compute_sdid_unit_weights and compute_time_weights (re‑estimate lambda per placebo, matching Algorithm 4), or deprecate/rename the function to make the legacy behavior explicit. diff_diff/utils.py:L1613-L1685

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • Severity: P3 | Impact: _placebo_variance_se docstring states weights are re‑estimated “using original weights as initialization,” but the implementation re‑starts from uniform, which can confuse reviewers and future changes. | Fix: Align the docstring with actual behavior or pass original weights as explicit initialization. diff_diff/synthetic_did.py:L740-L742

…eview round 4

Deprecate the legacy compute_placebo_effects utility (uses projected-gradient
weights with fixed time weights, not matching SDID Algorithm 4). Fix
_placebo_variance_se docstring to say "uniform initialization" instead of
"original weights as initialization", matching the actual implementation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 10, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 36ca9a97587baa11abdd560987dfe40a0baf0d2b


Overall assessment: ⚠️ Needs changes

Executive Summary

  • P2: Bootstrap SE path divides by zero when n_bootstrap=0; add validation/guard.
  • SDID methodology rewrite (Frank-Wolfe weights, auto-regularization, placebo re-estimation) aligns with docs/methodology/REGISTRY.md and Arkhangelsky et al. (2021); no P0/P1 mismatches found.
  • Docs/tests appear updated for new defaults and behavior.

Methodology

  • No issues found; SDID changes align with docs/methodology/REGISTRY.md and in-code references.

Code Quality

  • P2 — Bootstrap SE divides by zero when n_bootstrap=0. Impact: SyntheticDiD(variance_method="bootstrap", n_bootstrap=0) raises ZeroDivisionError instead of a clear validation error. Fix: validate n_bootstrap >= 2 (or at least > 0) when variance_method="bootstrap" or guard before computing failure_rate. diff_diff/synthetic_did.py:L590-L603, diff_diff/synthetic_did.py:L679-L682.
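
The suggested validation can be sketched in a few lines. Both helper names are hypothetical, and the rule is applied only to the bootstrap method because that is the case the finding describes.

```python
def validate_n_bootstrap(variance_method, n_bootstrap):
    """Reject replicate counts that cannot produce a standard error (sketch)."""
    if variance_method == "bootstrap" and n_bootstrap < 2:
        raise ValueError(
            f"n_bootstrap must be >= 2 when variance_method='bootstrap', got {n_bootstrap}"
        )
    return n_bootstrap


def failure_rate(n_successful, n_requested):
    """Share of replicates that failed; guarded against division by zero."""
    if n_requested <= 0:
        raise ValueError("n_requested must be positive")
    return 1.0 - n_successful / n_requested
```

Validating at construction time turns the ZeroDivisionError into a message that names the offending parameter.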

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • No issues found.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 3e89a1f75ca24969b7833debafc7b9ba263c621f


Overall assessment: ⚠️ Needs changes

Executive summary

  • Methodology rewrite aligns with the Registry (Frank–Wolfe weights, auto-regularization, placebo re-estimation), and the new tests are extensive.
  • Python variance paths do not filter non‑finite placebo/bootstrap estimates while Rust does, which can create backend divergence and unstable SEs.
  • Inference gating in SyntheticDiD.fit only checks se > 0, so se=inf can yield misleading t/p/CI; use np.isfinite(se).

Methodology

  • P1 — Backend parity + non‑finite inference leakage. Impact: Python placebo/bootstrap can retain NaN/Inf tau values, while Rust filters them. This can lead to NaN/Inf SEs in Python but finite SEs in Rust for the same data, violating backend equivalence and the NaN/Inf propagation guidance. Fix: drop non‑finite tau before appending in both Python loops (bootstrap + placebo) and compute n_successful on the filtered set, mirroring Rust’s tau.is_finite() guard. diff_diff/synthetic_did.py:L646-L679, diff_diff/synthetic_did.py:L855-L903
  • P2 — SE finiteness check is incomplete. Impact: if se is inf, the code computes t_stat ≈ 0 and a p‑value, which is misleading per the edge‑case checklist. Fix: gate inference on np.isfinite(se) and se > 0 and set t/p/CI to NaN otherwise. diff_diff/synthetic_did.py:L426-L444
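
Both fixes in this section can be sketched together: drop non-finite replicate estimates before computing the standard error, then gate the inference fields on a finite, positive SE. The function names, the returned dictionary keys, the sqrt((r-1)/r) scaling, and the normal approximation for the p-value and interval are assumptions for illustration.

```python
import math

import numpy as np


def se_from_replicates(taus):
    """Standard error from finite replicates only; returns (se, n_successful)."""
    taus = np.asarray(taus, dtype=float)
    finite = taus[np.isfinite(taus)]      # same idea as a tau.is_finite() filter
    n = finite.size
    if n < 2:
        return float("nan"), n
    # Assumed scaling: sqrt((r - 1) / r) times the sample standard deviation.
    se = math.sqrt((n - 1) / n) * float(np.std(finite, ddof=1))
    return se, n


def inference(att, se, z=1.959963984540054):
    """t statistic, two-sided p-value, and 95% CI; NaN unless se is finite and > 0."""
    if not (np.isfinite(se) and se > 0):
        nan = float("nan")
        return {"t_stat": nan, "p_value": nan, "conf_int": (nan, nan)}
    t_stat = att / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))   # normal approximation
    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "conf_int": (att - z * se, att + z * se),
    }
```

With `se = inf`, a bare `se > 0` check would pass and yield a t statistic of 0 with a p-value of 1; the combined check returns NaN fields instead.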

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P3 — Missing regression test for backend parity under non‑finite estimates. Impact: the Rust/Python mismatch above could regress silently. Fix: add a test that injects NaN/Inf into a bootstrap/placebo iteration and asserts both backends drop non‑finite tau and compute SE from finite reps only.

…te SE

- Add np.isfinite(tau) guard in Python bootstrap and placebo loops to match
  Rust backend's tau.is_finite() filtering (sdid_variance.rs:220,391)
- Change `se > 0` to `np.isfinite(se) and se > 0` so se=inf produces NaN
  inference fields instead of misleading t_stat≈0 and spurious p-values
- Add regression tests for non-finite tau filtering and inf SE gating

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: ee5a1e46be84101b5f4b1c1e75f7510f3478e93f


Overall Assessment

  • ⚠️ Needs changes

Executive Summary

  • SyntheticDiD methodology changes align with the Methodology Registry and in-code references; no estimator-level mismatches found.
  • Edge-case handling for NaN/Inf inference and balance/treatment validation looks consistent with the registry requirements.
  • Documentation is now out of sync with the deprecated params and new n_bootstrap validation.

Methodology

  • No issues found; cross-checked against docs/methodology/REGISTRY.md:L604-L679 and implementation in diff_diff/synthetic_did.py:L333-L444 and diff_diff/utils.py:L1204-L1468.

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • No issues found.

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P2 | Impact: Troubleshooting guidance still recommends deprecated lambda_reg and n_bootstrap=0 (now raises ValueError), which will mislead users after this PR. | Fix: Update examples to zeta_omega/zeta_lambda, remove n_bootstrap=0, and reference variance_method="placebo" with n_bootstrap>=2. (docs/troubleshooting.rst:L86-L109)
  • P2 | Impact: API autosummary still lists removed SyntheticDiDResults.lambda_reg, and omits new diagnostics fields. | Fix: Regenerate autosummary or update the list to include noise_level, zeta_omega, and zeta_lambda. (docs/api/_autosummary/diff_diff.SyntheticDiDResults.rst:L30-L35)

…ap=0

Troubleshooting docs now use zeta_omega/zeta_lambda instead of deprecated
lambda_reg, and variance_method="placebo" instead of invalid n_bootstrap=0.
API autosummary removes lambda_reg and adds missing diagnostic fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5da0999dca0553d777bd60b8b2c420e27918a8c3


Overall assessment: ⚠️ Needs changes

Executive Summary

  • SDID rewrite appears consistent with the Methodology Registry (FW weights, auto-regularization, placebo/boot variance) and includes stronger panel/treatment validation.
  • Edge-case handling for SE=0/inf now propagates NaNs to inference fields as required.
  • P1: new regularization overrides aren’t exercised in the bootstrap inference path, leaving a parameter‑interaction gap.
  • Tests not run in this review.

Methodology

  • No issues found.

Code Quality

  • No issues found.

Performance

  • No issues found.

Maintainability

  • P3 — _placebo_variance_se still accepts/documents unit_weights and time_weights even though the new placebo logic re‑estimates weights and never uses those inputs; this is now misleading and easy to misuse in future refactors. Impact: confusion/incorrect assumptions by callers. Fix: remove these params or explicitly mark them unused in the docstring and adjust call sites. diff_diff/synthetic_did.py:L726-L769

Tech Debt

  • No issues found.

Security

  • No issues found.

Documentation/Tests

  • P1 — New zeta_omega/zeta_lambda overrides are only tested under the placebo path; bootstrap inference tests use defaults only. Impact: a regression where overrides are ignored in bootstrap inference would go undetected. Fix: add a bootstrap-path test that sets zeta_omega/zeta_lambda and asserts they propagate (and/or affect weights) under variance_method="bootstrap". tests/test_methodology_sdid.py:L394-L409, tests/test_methodology_sdid.py:L449-L463

- Add test_bootstrap_with_zeta_overrides to cover bootstrap SE path
  with user-specified zeta_omega/zeta_lambda overrides (P1 coverage gap)
- Remove unused unit_weights/time_weights params from _placebo_variance_se
  signature, docstring, and call site (P3 cleanup)
- Update existing test that called _placebo_variance_se directly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Feb 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e481aed1bcd377a2c8558aa3c4e4b0f0a3f8a670


Overall assessment: ✅ Looks good

Executive Summary

  • SyntheticDiD rewrite (Frank‑Wolfe weights, auto‑regularization, placebo/bootstrap variance, default variance_method) aligns with the Methodology Registry and in‑code references.
  • Edge‑case handling for block treatment, balanced panel checks, and NaN/Inf inference is consistent with the Registry’s requirements.
  • Rust variance acceleration appears consistent with Python fallback; no divergences spotted in the changed paths.

Methodology

  • P3 | Impact: N/A | Fix: N/A — Cross‑check of SDID estimator/weights/variance vs Registry/docstrings shows no mismatches. diff_diff/synthetic_did.py:350 diff_diff/synthetic_did.py:724 diff_diff/utils.py:1154 docs/methodology/REGISTRY.md:560

Code Quality

  • P3 | Impact: N/A | Fix: N/A — No issues found.

Performance

  • P3 | Impact: N/A | Fix: N/A — No performance regressions identified; Rust parallel variance is additive.

Maintainability

  • P3 | Impact: N/A | Fix: N/A — No maintainability issues found.

Tech Debt

  • P3 | Impact: N/A | Fix: N/A — No new tech debt identified.

Security

  • P3 | Impact: N/A | Fix: N/A — No secrets or security concerns found in diff.

Documentation/Tests

  • P3 | Impact: N/A | Fix: N/A — SDID docs/tests updated; no gaps found.

@igerber igerber merged commit 91f23c4 into main Feb 11, 2026
7 checks passed
@igerber igerber deleted the sdid-method-review branch February 11, 2026 12:20