Fix mypy errors, add notebook CI, clean up TODO by igerber · Pull Request #223 · igerber/diff-diff

igerber · 2026-03-21T15:10:47Z

Summary

Resolve all 9 mypy attr-defined errors by adding TYPE_CHECKING-guarded method stubs to 3 bootstrap mixin classes (staggered_bootstrap.py, two_stage_bootstrap.py, imputation_bootstrap.py)
Add GitHub Actions workflow (.github/workflows/notebooks.yml) to execute 15 tutorial notebooks in CI via nbmake, triggered on PR/push to relevant paths and weekly schedule
Add nbmake>=1.5 to dev dependencies
Clean up TODO.md: remove completed/crossed-out items, correct Sphinx warning diagnosis (376 from manual page autodoc, not ~1,460 from autosummary stubs), mark CallawaySantAnna HonestDiD support as done, add C-LF implementation note
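
For readers unfamiliar with the pattern in the first item, here is a minimal sketch of a `TYPE_CHECKING`-guarded stub on a mixin. The class and method names (`BootstrapMixin`, `Estimator`, `_point_estimate`) are hypothetical and are not taken from the diff_diff sources; only the general technique is assumed to match what the PR describes.

```python
from __future__ import annotations

from typing import TYPE_CHECKING


class BootstrapMixin:
    """Hypothetical mixin that calls a helper supplied by the host class."""

    if TYPE_CHECKING:
        # Visible to the type checker only. TYPE_CHECKING is False at runtime,
        # so this stub is never defined and cannot shadow the real method.
        def _point_estimate(self, values: list[float]) -> float: ...

    def bootstrap_center(self, values: list[float]) -> float:
        # Without the stub above, mypy would report attr-defined on this call.
        return self._point_estimate(values)


class Estimator(BootstrapMixin):
    """Hypothetical concrete class that provides the real helper."""

    def _point_estimate(self, values: list[float]) -> float:
        return sum(values) / len(values)


result = Estimator().bootstrap_center([1.0, 2.0, 3.0])
has_stub_at_runtime = "_point_estimate" in vars(BootstrapMixin)
```

Because the stub lives under `if TYPE_CHECKING:`, the mixin's runtime namespace is unchanged, which is consistent with the PR's claim that the annotations carry no runtime cost.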

Methodology references (required if estimator / math changes)

N/A — changes are type-annotation-only (no methodology or behavioral changes)

Validation

Tests added/updated: No test changes (type annotations are zero-runtime-cost; verified all 1885 tests pass)
mypy diff_diff reports 0 errors (down from 9)
Notebook CI workflow validated locally with DIFF_DIFF_BACKEND=python PYTHONPATH=. pytest --nbmake docs/tutorials/

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Add TYPE_CHECKING-guarded method stubs to 3 bootstrap mixin classes, resolving all 9 mypy attr-defined errors (staggered_bootstrap.py, two_stage_bootstrap.py, imputation_bootstrap.py)
- Add GitHub Actions workflow to execute 15 tutorial notebooks in CI via nbmake (triggered on PR/push/weekly schedule)
- Add nbmake>=1.5 to dev dependencies
- Clean up TODO.md: remove completed/crossed-out items, correct Sphinx warning diagnosis (376 from manual pages, not 1460 from stubs), mark CallawaySantAnna HonestDiD support as done, add C-LF implementation note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-03-21T15:18:28Z

Overall Assessment
✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

No methodology defects found. The estimator-side edits are TYPE_CHECKING stubs only and do not change weighting, variance, identification assumptions, bootstrap behavior, or defaults in the touched estimators.
Cross-checking the Methodology Registry against the touched bootstrap mixins shows no mismatch for Callaway-Sant’Anna, ImputationDiD, or TwoStageDiD.
No new edge-case regressions were introduced in the modified estimator files: no new inline inference anti-patterns, NaN/Inf guard regressions, parameter-plumbing gaps, or control-group logic changes.
TODO.md cleanup looks consistent with the codebase; it does not appear to hide unresolved correctness issues.
Two minor P3 items remain around the new notebook workflow: environment drift risk and an overstated claim about catching external dataset URL breakage.

Methodology

No findings. The only estimator changes are if TYPE_CHECKING: method stubs in diff_diff/staggered_bootstrap.py:122, diff_diff/imputation_bootstrap.py:71, and diff_diff/two_stage_bootstrap.py:44. Those do not execute at runtime, and the registry-defined SE/bootstrap behavior remains unchanged for docs/methodology/REGISTRY.md:1791, docs/methodology/REGISTRY.md:1793, and docs/methodology/REGISTRY.md:1794.

Code Quality

No findings. The added stub signatures line up with the concrete helper implementations in diff_diff/staggered_aggregation.py:196, diff_diff/imputation.py:707, diff_diff/imputation.py:907, diff_diff/two_stage.py:1092, and diff_diff/two_stage.py:1274.

Performance

No findings. No estimator hot path changed; the runtime impact is limited to adding a CI notebook job.

Maintainability

Severity: P3. Impact: The new notebook workflow hard-codes a separate environment and runs notebooks from the source tree via PYTHONPATH=., which creates another dependency surface to keep in sync with the declared project install path and the notebook instructions. References: .github/workflows/notebooks.yml:35, .github/workflows/notebooks.yml:41, pyproject.toml:52, docs/tutorials/README.md:56, .github/workflows/rust-test.yml:80. Concrete fix: Install the package itself in this job (pip install -e . plus the minimal notebook/test extras, or pip install -e ".[dev]") and only keep a manual dependency list if it is explicitly marked as a keep-in-sync copy.

Tech Debt

No findings. The TODO cleanup appears to remove resolved items rather than hide deferred correctness work. The updated HonestDiD notes are backed by existing code in diff_diff/honest_did.py:612 and diff_diff/honest_did.py:946.

Security

No findings. No secrets handling, privilege escalation, or unsafe external-action pattern was introduced in the diff.

Documentation/Tests

Severity: P3. Impact: The weekly schedule comment says it “catches external breakage (dataset URLs, etc.)”, but the tutorial dataset loaders fall back to cached or constructed data when downloads fail, so upstream URL breakage may be masked without failing CI. References: .github/workflows/notebooks.yml:19, diff_diff/datasets.py:31, diff_diff/datasets.py:65, diff_diff/datasets.py:218, diff_diff/datasets.py:362. Concrete fix: Either reword the comment to describe a periodic notebook smoke test, or add a separate scheduled dataset-availability smoke test that uses force_download=True and fails on download errors.
Validation note: This review was static only. I did not run mypy or execute the notebook workflow locally because the review sandbox does not have the scientific Python dependencies installed (numpy import failed).
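
The Documentation/Tests finding above can be illustrated with a small sketch of a loader that falls back when a download fails. Everything here is hypothetical (`load_dataset`, `broken_fetch`, `constructed_data`) and does not reproduce the code in diff_diff/datasets.py. The `force_download` name comes from the review text, but the strict "raise instead of falling back" behaviour shown for it is an assumption.

```python
from __future__ import annotations

from typing import Callable


def load_dataset(
    fetch: Callable[[], list],
    fallback: Callable[[], list],
    force_download: bool = False,
) -> tuple[list, str]:
    """Hypothetical loader: try the remote source, otherwise fall back."""
    try:
        return fetch(), "remote"
    except OSError:
        if force_download:
            # Assumed strict mode: surface the download error to the caller.
            raise
        return fallback(), "fallback"


def broken_fetch() -> list:
    # Stand-in for a dataset URL that no longer resolves.
    raise OSError("simulated dead URL")


def constructed_data() -> list:
    return [{"unit": 1, "y": 0.0}]


# Default mode: the call succeeds, so a notebook run would not notice the dead URL.
data, source = load_dataset(broken_fetch, constructed_data)

# Strict mode: the same breakage now fails, which is what a dedicated
# dataset-availability smoke test would rely on.
try:
    load_dataset(broken_fetch, constructed_data, force_download=True)
    strict_failed = False
except OSError:
    strict_failed = True
```

Under these assumptions, a weekly notebook run exercises the fallback path silently, while a separate strict check would turn upstream breakage into a CI failure.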

PYTHONPATH=. only affects the shell, not the Jupyter kernel spawned by nbmake. Write a .pth file into site-packages so the kernel can import diff_diff. Also add ipykernel dependency and set DIFF_DIFF_BACKEND via env block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
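
As a rough illustration of the .pth approach described in this commit, the sketch below writes a .pth file into a directory and lets Python's `site` machinery process it. It runs against a temporary directory rather than the real site-packages, and the names `write_pth`, `dev_source.pth`, and `demo_pkg` are hypothetical; the workflow's actual step may look different.

```python
import importlib
import site
import tempfile
from pathlib import Path


def write_pth(source_root: Path, site_dir: Path, name: str = "dev_source.pth") -> Path:
    """Write a .pth file whose single line is the absolute source root."""
    pth = site_dir / name
    pth.write_text(f"{source_root.resolve()}\n", encoding="utf-8")
    return pth


# Build a throwaway "repo" containing a tiny package, plus a stand-in site dir.
workspace = Path(tempfile.mkdtemp())
source_root = workspace / "repo"
site_dir = workspace / "site-packages"
(source_root / "demo_pkg").mkdir(parents=True)
(source_root / "demo_pkg" / "__init__.py").write_text("VALUE = 42\n", encoding="utf-8")
site_dir.mkdir()

pth_file = write_pth(source_root, site_dir)

# addsitedir() processes *.pth files in the directory and appends the paths
# they list to sys.path, as happens for site-packages at interpreter startup.
site.addsitedir(str(site_dir))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")
```

In a real environment the target directory would typically be `sysconfig.get_paths()["purelib"]`, so that any interpreter started from that environment, including a notebook kernel, picks up the source tree without depending on the shell's working directory.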

- Reword schedule comment: "smoke test" instead of overstating dataset URL breakage detection (loaders have fallbacks)
- Add "Keep in sync" comment on manual dep list, matching the convention in rust-test.yml python-fallback job
- Note why pip install -e . isn't used (requires Rust toolchain)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

igerber · 2026-03-21T15:47:55Z

/ai-review

github-actions · 2026-03-21T16:13:24Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b731c338420e22eb20c444b44265bfe0d4e26860

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review status: the prior P3 about the scheduled-workflow comment overstating what weekly CI checks is resolved at .github/workflows/notebooks.yml:L18-L20; the prior environment-drift P3 remains informational at .github/workflows/notebooks.yml:L35-L41.
The estimator-side edits in diff_diff/staggered_bootstrap.py:L122-L136, diff_diff/imputation_bootstrap.py:L71-L105, and diff_diff/two_stage_bootstrap.py:L44-L63 are TYPE_CHECKING-only stubs and do not change runtime estimator behavior.
Cross-checking the affected methods against docs/methodology/REGISTRY.md:L311-L323, docs/methodology/REGISTRY.md:L822-L828, and docs/methodology/REGISTRY.md:L875-L893 found no undocumented deviation in weighting, variance/SE, assumptions, or defaults.
No new edge-case regressions were introduced in the modified estimator files: no new inline inference code, partial NaN guards, empty-result handling changes, or control-group logic changes.
TODO.md cleanup does not appear to hide unresolved correctness work; the HonestDiD status updates are consistent with diff_diff/honest_did.py:L612-L653 and diff_diff/honest_did.py:L946-L960.

Methodology

No findings. The affected methods are Callaway-Sant’Anna’s combined IF/bootstrap interface, ImputationDiD’s bootstrap psi and balance_e helpers, and TwoStageDiD’s bootstrap GMM/FE helpers. The new declarations in diff_diff/staggered_bootstrap.py:L122-L136, diff_diff/imputation_bootstrap.py:L71-L105, and diff_diff/two_stage_bootstrap.py:L44-L63 mirror existing implementations in diff_diff/staggered_aggregation.py:L196-L208, diff_diff/imputation.py:L707-L766, diff_diff/imputation.py:L907-L924, diff_diff/two_stage.py:L1092-L1096, and diff_diff/two_stage.py:L1274-L1281.

Code Quality

No findings. The stub signatures line up with the concrete helper methods they describe, and the modified bootstrap files do not introduce the prohibited inline inference pattern or partial NaN-guard pattern.

Performance

No findings. The estimator hot paths are unchanged; the only runtime addition is notebook execution in CI.

Maintainability

P3 Impact: .github/workflows/notebooks.yml:L35-L41 hard-codes a partial dependency set and relies on a handwritten .pth injection, which can drift from the declared dependency surface in pyproject.toml:L46-L63 and make notebook CI exercise a different environment from supported installs. Concrete fix: keep the current “no Rust build” approach if desired, but move notebook-runtime dependencies into a single source of truth such as a dedicated extra or requirements file and install from that, rather than duplicating package names inline in the workflow.

Tech Debt

No findings. The TODO.md cleanup looks like removal of addressed items rather than masking of unresolved correctness debt, and the updated HonestDiD notes are backed by existing code at diff_diff/honest_did.py:L612-L653 and diff_diff/honest_did.py:L946-L960.

Security

No findings. The new workflow uses the standard pull_request trigger, not pull_request_target, and does not add secrets handling or elevated permissions in .github/workflows/notebooks.yml:L3-L18.

Documentation/Tests

No findings. The prior informational concern about overstating scheduled dataset-availability coverage is addressed by rewording the weekly schedule comment to a generic notebook smoke test at .github/workflows/notebooks.yml:L18-L20.

Review basis: static review only. The local environment did not have mypy, numpy, pandas, scipy, pytest, nbmake, or ipykernel, so I did not rerun mypy or execute the notebook workflow.

- Increase per-notebook timeout from 300s to 600s (pure Python mode without Rust backend is significantly slower for Monte Carlo and optimization-heavy notebooks)
- Exclude 10_trop.ipynb (LOOCV grid search exceeds 600s in pure Python mode)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The notebook was passing all periods (pre and post) as post_periods to MultiPeriodDiD, causing HonestDiD to fail with "No pre-period effects found" since the results had no pre-period classification. Fix: pass only actual post-treatment periods [5-9] to post_periods. MultiPeriodDiD automatically estimates pre-period coefficients for the event study, and HonestDiD can now correctly identify them. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
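
The failure mode in this commit can be sketched without the library. The helper below is hypothetical and is not the MultiPeriodDiD implementation. It only shows why declaring every period as post-treatment leaves no pre-periods to classify. The period range 0 through 9 is an assumption; the commit states only that the post-treatment periods are 5 through 9.

```python
from __future__ import annotations

from typing import Iterable


def split_periods(
    all_periods: Iterable[int], post_periods: Iterable[int]
) -> tuple[list[int], list[int]]:
    """Hypothetical helper: anything not declared post-treatment counts as pre."""
    post = set(post_periods)
    pre = [p for p in all_periods if p not in post]
    return pre, sorted(post)


all_periods = list(range(10))  # assumed panel with periods 0..9

# Before the fix: every period is passed as post, so no pre-periods remain.
pre_buggy, _ = split_periods(all_periods, all_periods)

# After the fix: only periods 5..9 are post, leaving 0..4 as pre-periods.
pre_fixed, post_fixed = split_periods(all_periods, range(5, 10))
```

With an empty pre-period list, a downstream sensitivity analysis has no pre-treatment coefficients to work with, which matches the "No pre-period effects found" error quoted in the commit.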

06_power_analysis.ipynb runs SyntheticDiD simulate_power which is a Monte Carlo simulation too slow for pure-Python CI without the Rust backend. Same category as the already-excluded TROP notebook. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
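
Taken together, the timeout and exclusion commits amount to a pytest invocation along the following lines. This is a sketch only: `nbmake_command` is a hypothetical helper, and using pytest's `--ignore` for the exclusions is an assumption, since the workflow file itself is not shown here. `--nbmake-timeout` is nbmake's per-notebook timeout option.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def nbmake_command(tutorial_dir: str, timeout_s: int, excluded: list[str]) -> list[str]:
    """Build a pytest/nbmake argv with a per-notebook timeout and exclusions."""
    cmd = ["pytest", "--nbmake", f"--nbmake-timeout={timeout_s}"]
    # Assumed mechanism: skip slow notebooks via pytest's --ignore option.
    cmd += [f"--ignore={PurePosixPath(tutorial_dir) / name}" for name in excluded]
    cmd.append(tutorial_dir)
    return cmd


# Environment variable named in the PR; it selects the pure-Python backend.
env = {"DIFF_DIFF_BACKEND": "python"}

cmd = nbmake_command(
    "docs/tutorials",
    600,
    ["10_trop.ipynb", "06_power_analysis.ipynb"],
)
```

The resulting list could be passed to `subprocess.run(cmd, env={**os.environ, **env}, check=True)` in a local reproduction of the CI job.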

igerber and others added 2 commits March 21, 2026 11:32

igerber and others added 3 commits March 21, 2026 12:24

igerber merged commit 6454bd6 into main Mar 21, 2026
14 checks passed

igerber deleted the tech-debt branch March 21, 2026 19:35

igerber merged 6 commits into main from tech-debt
