Address tech debt from code reviews (PRs #115-#159) by igerber · Pull Request #165 · igerber/diff-diff

igerber · 2026-02-17T17:13:50Z

Summary

Add bootstrap_weights parameter to TwoStageDiD and ImputationDiD (rademacher/mammen/webb, matching CallawaySantAnna API)
Unify TwoStageDiD GMM score computation via _compute_gmm_scores() static method with consistent NaN/overflow handling
Extract _compute_target_weights() helper for ImputationDiD, eliminating aggregation/bootstrap weight duplication
Optimize TwoStageDiD cluster score loop: single .toarray() replaces per-column .getcol(j).toarray()
Add TROP n_bootstrap >= 2 validation (ValueError)
Remove SunAbraham deprecated min_pre_periods/min_post_periods params
Remove legacy compute_placebo_effects from utils.py
Add ImputationDiD bootstrap + covariate test
Update TODO.md marking completed items

Methodology references (required if estimator / math changes)

Method name(s): TwoStageDiD (Gardner 2022), ImputationDiD (Borusyak et al. 2024), TROP (Athey et al. 2025)
Paper / source link(s): N/A — no methodology changes, only API consistency, deduplication, and cleanup
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: tests/test_two_stage.py, tests/test_imputation.py, tests/test_trop.py, tests/test_sun_abraham.py, tests/test_utils.py
Full suite: 1494 passed, 32 skipped, 0 failures
Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Add bootstrap_weights parameter to TwoStageDiD and ImputationDiD (rademacher/mammen/webb, matching CallawaySantAnna API) - Unify TwoStageDiD GMM score computation via _compute_gmm_scores() static method with consistent NaN/overflow handling - Extract _compute_target_weights() helper for ImputationDiD weight construction, eliminating aggregation/bootstrap duplication - Optimize TwoStageDiD cluster score loop: single .toarray() call replaces per-column .getcol(j).toarray() - Add TROP n_bootstrap >= 2 validation (ValueError) - Remove SunAbraham deprecated min_pre_periods/min_post_periods params - Remove legacy compute_placebo_effects from utils.py - Add ImputationDiD bootstrap + covariate test - Update TODO.md marking completed items Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-17T17:19:30Z

Overall assessment: ⚠️ Needs changes

Executive Summary

P1: Bootstrap weight selection was added for ImputationDiD/TwoStageDiD, but the Methodology Registry still documents Rademacher-only; inference docs are now inconsistent.
P1: SunAbraham removed deprecated min_pre_periods/min_post_periods, conflicting with the Registry’s “accepted but ignored” guidance; this is an undocumented API change.
P2: TwoStageDiD cluster-score aggregation now densifies the full sparse matrix, increasing peak memory risk on large panels.
AGENTS.md not found in the repo; proceeded without additional agent instructions.

Methodology

Severity: P1. Impact: Registry says ImputationDiD and TwoStageDiD bootstrap uses Rademacher weights, but code now allows mammen/webb, altering the inference distribution without documentation. Fix: update docs/methodology/REGISTRY.md to list supported weights and rationale (or restrict to Rademacher in code). docs/methodology/REGISTRY.md:L548-L613, diff_diff/imputation.py:L63-L157, diff_diff/two_stage.py:L67-L151, diff_diff/imputation_bootstrap.py:L214-L314, diff_diff/two_stage_bootstrap.py:L198-L383.
Severity: P1. Impact: SunAbraham registry still promises deprecated min_pre_periods/min_post_periods are accepted with FutureWarning, but parameters are now removed, so callers will get TypeError instead. Fix: either restore the deprecated parameters with warnings or update the registry (and user-facing docs/changelog) to mark removal. docs/methodology/REGISTRY.md:L430-L431, diff_diff/sun_abraham.py:L428-L455.

Code Quality
No issues found.

Performance

Severity: P2. Impact: New weighted_X10.toarray() densifies the full sparse matrix during cluster aggregation, increasing peak memory and risking OOM on large panels compared to the prior per-column approach. Fix: add a size-based guard or chunked aggregation to keep peak memory bounded. diff_diff/two_stage.py:L1225-L1231, diff_diff/two_stage_bootstrap.py:L109-L114.

Maintainability
No issues found.

Tech Debt
No issues found.

Security
No issues found.

Documentation/Tests

Severity: P2. Impact: Bootstrap result docstrings still claim “rademacher only,” which is now incorrect and may mislead users. Fix: update docstrings to list supported weights or reference the estimator’s bootstrap_weights parameter. diff_diff/imputation_results.py:L33-L37, diff_diff/two_stage_results.py:L34-L38.
Severity: P3. Impact: TROP now raises on n_bootstrap < 2, but the parameter docs don’t mention the minimum; this is a user-visible change. Fix: update the parameter docs (and consider a brief changelog note if this is a breaking API change). diff_diff/trop.py:L95-L167.

Address 5 AI review items: update Methodology Registry for bootstrap_weights and SunAbraham param removal, fix bootstrap result docstrings, add memory guidance for dense .toarray(), and document n_bootstrap minimum. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-17T17:56:13Z

/ai-review

github-actions · 2026-02-17T18:01:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e97ca9d8902b0e790bb8fbb86a38667ba687e8c2

Overall assessment: ⚠️ Needs changes

Executive summary:

bootstrap_weights added for ImputationDiD/TwoStageDiD and shared weight helper align with the Methodology Registry; no estimator mismatch found.
TROP now enforces n_bootstrap >= 2, which prevents undefined SEs and is consistent with always-bootstrap variance.
Potential memory regression: TwoStageDiD GMM score aggregation now densifies the full sparse FE matrix, which can OOM on large panels.
README still documents removed SunAbraham.fit() params (min_pre_periods, min_post_periods), which now raise TypeError.

Methodology
Affected methods: ImputationDiD, TwoStageDiD, SunAbraham, TROP. Cross-checked against docs/methodology/REGISTRY.md; no mismatches with source material found.
No findings.

Code Quality
No findings.

Performance

P1 — Impact: weighted_X10.toarray() materializes the full (n × p) FE matrix and can blow memory for large panels; the previous per-column path avoided the dense allocation. Fix: keep sparse aggregation (e.g., accumulate from weighted_X10.tocoo() with np.add.at on (cluster_idx[row], col)), or add a size guard to fall back to per-column getcol() when n * p or nnz exceeds a threshold. diff_diff/two_stage.py:L1218-L1233, diff_diff/two_stage_bootstrap.py:L104-L115.

Maintainability
No findings.

Tech Debt
No findings.

Security
No findings.

Documentation/Tests

P2 — Impact: README still advertises min_pre_periods/min_post_periods for SunAbraham.fit(), but those params were removed and now raise TypeError; users following the docs will hit runtime errors. Fix: remove those rows or explicitly note the removal and error behavior. README.md:L2057-L2068.
P3 — Impact: There is no test asserting the new TypeError behavior for removed SunAbraham params; regressions could reintroduce silent ignoring. Fix: add a test in tests/test_sun_abraham.py asserting fit(..., min_pre_periods=2) and fit(..., min_post_periods=2) raise TypeError.

…round 2 - Add _SPARSE_DENSE_THRESHOLD in two_stage.py with per-column .tocsc() fallback for large FE matrices; apply same pattern in bootstrap module - Remove min_pre_periods/min_post_periods from README SunAbraham table - Add test_removed_params_raise_typeerror for SunAbraham - Add test_sparse_fallback_path for TwoStageDiD dense/sparse equivalence Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-17T18:17:11Z

/ai-review

github-actions · 2026-02-17T18:21:41Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b2e40d82db0c44b180cc13b9c08738741effc30

Overall assessment: ⚠️ Needs changes

Executive summary

Methodology registry updates for ImputationDiD/TwoStageDiD/SunAbraham align with the code changes; no estimator-math mismatches found.
New cross-module import for _SPARSE_DENSE_THRESHOLD introduces a circular-import risk if diff_diff.two_stage_bootstrap is imported directly.
README does not document the new bootstrap_weights parameter or TROP’s n_bootstrap >= 2 constraint.

Methodology
No issues found.

Code Quality
No issues found.

Performance
No issues found.

Maintainability

P2 — Impact: Direct import diff_diff.two_stage_bootstrap can now trigger a circular import (two_stage_bootstrap -> two_stage -> two_stage_bootstrap) and raise ImportError before TwoStageDiDBootstrapMixin is defined. Fix: move _SPARSE_DENSE_THRESHOLD into a small shared module (e.g., diff_diff/two_stage_shared.py) or duplicate the constant in diff_diff/two_stage_bootstrap.py to remove the cross-import. (diff_diff/two_stage_bootstrap.py:L16-L19, diff_diff/two_stage.py:L32-L39)

Tech Debt
No issues found.

Security
No issues found.

Documentation/Tests

P3 — Impact: README constructor examples for ImputationDiD and TwoStageDiD omit the new bootstrap_weights parameter, so users won’t discover the available weight types. Fix: add bootstrap_weights='rademacher' (with a brief comment) to both constructor blocks and/or update the parameter tables. (README.md:L2101-L2110, README.md:L2155-L2163)
P3 — Impact: README TROP signature doesn’t note the new n_bootstrap >= 2 requirement; users may pass 0/1 and hit a ValueError without guidance. Fix: update the inline comment or add a short note in the TROP section. (README.md:L1979-L1987)

Duplicate _SPARSE_DENSE_THRESHOLD in two_stage_bootstrap.py to eliminate cross-import from two_stage.py. Add bootstrap_weights param to README ImputationDiD/TwoStageDiD constructors and n_bootstrap minimum note to TROP. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-17T18:31:57Z

/ai-review

github-actions · 2026-02-17T18:38:26Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 98029d1587c622ce744adfa578a66459ebaf79dd

Overall assessment: ✅ Looks good

Executive summary

Methods affected: TwoStageDiD, ImputationDiD (bootstrap weight types + GMM score unification), TROP (bootstrap minimum), SunAbraham (param removal); registry updates align and no methodology mismatches found.
No NaN/Inf inference anti-patterns introduced in modified paths.
Minor maintainability and test/doc coverage gaps noted below.

Methodology
No findings.

Code Quality
No findings.

Performance
No findings.

Maintainability

P3 | Impact: _SPARSE_DENSE_THRESHOLD is duplicated across analytic and bootstrap paths, creating drift risk if only one is updated. | Fix: centralize the constant in a shared module and import it in both files. diff_diff/two_stage.py:L32-L35, diff_diff/two_stage_bootstrap.py:L18-L21

Tech Debt
No findings.

Security
No findings.

Documentation/Tests

P3 | Impact: bootstrap_weights='webb' is untested for ImputationDiD/TwoStageDiD, so regressions in that weight path could slip through. | Fix: add a minimal bootstrap test mirroring the mammen tests for webb. tests/test_imputation.py:L1092-L1105, tests/test_two_stage.py:L1063-L1075
P3 | Impact: TROP now enforces n_bootstrap >= 2, but the Methodology Registry doesn’t mention this constraint, risking doc drift. | Fix: add a brief note in the TROP section. diff_diff/trop.py:L134-L163, docs/methodology/REGISTRY.md:L825-L900

… for PR #165 round 4 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-17T19:45:15Z

/ai-review

github-actions · 2026-02-17T19:51:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d7ec6948034fed0d69712cfdc876f79acd7bdf69

Overall assessment: ⚠️ Needs changes

Executive summary

Methods affected: TwoStageDiD, ImputationDiD, TROP, SunAbraham; registry updates appear consistent with code changes (no methodology mismatches found).
P1: New bootstrap_weights parameter is not tested across event‑study/group bootstrap paths.
P3: Imputation bootstrap docstrings still describe Rademacher‑only weights.
P3: _SPARSE_DENSE_THRESHOLD duplicated across analytic and bootstrap modules (drift risk).

Methodology

No findings. Cross‑checked updates for ImputationDiD/TwoStageDiD bootstrap weights and TROP bootstrap minimum against the registry; code behavior matches documented changes.

Code Quality

No findings.

Performance

No findings.

Maintainability

P3 – Impact: _SPARSE_DENSE_THRESHOLD is duplicated, which can drift between analytic and bootstrap paths and subtly change behavior. Fix: move to a shared constant (e.g., diff_diff/constants.py) and import in both modules. Evidence: diff_diff/two_stage.py:L32-L35, diff_diff/two_stage_bootstrap.py:L18-L20.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P1 – Impact: bootstrap_weights is only tested for overall ATT; event‑study/group bootstrap paths with non‑default weights are untested, so regressions there could slip through. Fix: add tests that set bootstrap_weights with aggregate="event_study" (and group aggregation for Imputation) and assert finite event_study_ses/group_ses plus correct weight_type. Evidence: tests/test_imputation.py:L1092-L1120, tests/test_two_stage.py:L1063-L1093.
P3 – Impact: In‑code docstrings still state Rademacher‑only weights, which is now inaccurate and misleading. Fix: update wording to “multiplier weights (rademacher/mammen/webb; configurable via bootstrap_weights)”. Evidence: diff_diff/imputation_bootstrap.py:L122-L128, diff_diff/imputation_bootstrap.py:L218-L223.

…update docstrings for PR #165 round 5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-17T20:25:25Z

/ai-review

github-actions · 2026-02-17T20:31:56Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 969ae827fc52de0b570b65e2abe8352641e01a00

Overall assessment: ✅ Looks good

Executive Summary

Methods affected are ImputationDiD, TwoStageDiD, TROP, and SunAbraham; registry updates align with the code changes for bootstrap weighting and bootstrap minimums.
Bootstrap weight type is now parameterized and wired through bootstrap generation/results for ImputationDiD and TwoStageDiD, with coverage for overall/event‑study/group paths in tests.
TwoStageDiD’s GMM score handling is unified across analytic/bootstrap paths, and the new dense/sparse threshold has a parity test.

Methodology
No findings. Methods affected: ImputationDiD, TwoStageDiD, TROP, SunAbraham. Cross‑check against the Methodology Registry and in‑code docstrings shows consistency with the changes (bootstrap weight configurability and TROP’s n_bootstrap >= 2 guard). Relevant references: docs/methodology/REGISTRY.md:L543-L610, docs/methodology/REGISTRY.md:L939-L944, diff_diff/imputation.py:L124-L170, diff_diff/imputation_bootstrap.py:L214-L244, diff_diff/two_stage.py:L123-L170, diff_diff/two_stage_bootstrap.py:L199-L214, diff_diff/trop.py:L154-L166, diff_diff/sun_abraham.py:L431-L466.

Code Quality
No findings.

Performance
No findings.

Maintainability

P3 — _SPARSE_DENSE_THRESHOLD is duplicated in both analytic and bootstrap modules.
Impact: Risk of behavioral drift between analytic and bootstrap paths if one constant changes and the other isn’t updated.
Concrete fix: Define the threshold in one shared location and import it in both diff_diff/two_stage.py and diff_diff/two_stage_bootstrap.py (diff_diff/two_stage.py:L32-L35, diff_diff/two_stage_bootstrap.py:L18-L21).

Tech Debt
No findings.

Security
No findings.

Documentation/Tests
No findings.

Add webb bootstrap weight tests, document TROP n_bootstrap constraint…

d7ec694

… for PR #165 round 4 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add bootstrap_weights tests for event-study/group aggregation paths, …

969ae82

…update docstrings for PR #165 round 5 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber merged commit b1e0237 into main Feb 17, 2026
10 checks passed

igerber deleted the tech-debt-paydown branch February 17, 2026 21:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Address tech debt from code reviews (PRs #115-#159)#165

Address tech debt from code reviews (PRs #115-#159)#165
igerber merged 6 commits intomainfrom
tech-debt-paydown

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Feb 17, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

igerber commented Feb 17, 2026

Uh oh!

github-actions bot commented Feb 17, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant