Skip to content

Address tech debt from code reviews (PRs #115-#159)#165

Merged
igerber merged 6 commits intomainfrom
tech-debt-paydown
Feb 17, 2026
Merged

Address tech debt from code reviews (PRs #115-#159)#165
igerber merged 6 commits intomainfrom
tech-debt-paydown

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Feb 17, 2026

Summary

  • Add bootstrap_weights parameter to TwoStageDiD and ImputationDiD (rademacher/mammen/webb, matching CallawaySantAnna API)
  • Unify TwoStageDiD GMM score computation via _compute_gmm_scores() static method with consistent NaN/overflow handling
  • Extract _compute_target_weights() helper for ImputationDiD, eliminating aggregation/bootstrap weight duplication
  • Optimize TwoStageDiD cluster score loop: single .toarray() replaces per-column .getcol(j).toarray()
  • Add TROP n_bootstrap >= 2 validation (ValueError)
  • Remove SunAbraham deprecated min_pre_periods/min_post_periods params
  • Remove legacy compute_placebo_effects from utils.py
  • Add ImputationDiD bootstrap + covariate test
  • Update TODO.md marking completed items

Methodology references (required if estimator / math changes)

  • Method name(s): TwoStageDiD (Gardner 2022), ImputationDiD (Borusyak et al. 2024), TROP (Athey et al. 2025)
  • Paper / source link(s): N/A — no methodology changes, only API consistency, deduplication, and cleanup
  • Any intentional deviations from the source (and why): None

Validation

  • Tests added/updated: tests/test_two_stage.py, tests/test_imputation.py, tests/test_trop.py, tests/test_sun_abraham.py, tests/test_utils.py
  • Full suite: 1494 passed, 32 skipped, 0 failures
  • Backtest / simulation / notebook evidence (if applicable): N/A

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Add bootstrap_weights parameter to TwoStageDiD and ImputationDiD
  (rademacher/mammen/webb, matching CallawaySantAnna API)
- Unify TwoStageDiD GMM score computation via _compute_gmm_scores()
  static method with consistent NaN/overflow handling
- Extract _compute_target_weights() helper for ImputationDiD weight
  construction, eliminating aggregation/bootstrap duplication
- Optimize TwoStageDiD cluster score loop: single .toarray() call
  replaces per-column .getcol(j).toarray()
- Add TROP n_bootstrap >= 2 validation (ValueError)
- Remove SunAbraham deprecated min_pre_periods/min_post_periods params
- Remove legacy compute_placebo_effects from utils.py
- Add ImputationDiD bootstrap + covariate test
- Update TODO.md marking completed items

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall assessment: ⚠️ Needs changes

Executive Summary

  • P1: Bootstrap weight selection was added for ImputationDiD/TwoStageDiD, but the Methodology Registry still documents Rademacher-only; inference docs are now inconsistent.
  • P1: SunAbraham removed deprecated min_pre_periods/min_post_periods, conflicting with the Registry’s “accepted but ignored” guidance; this is an undocumented API change.
  • P2: TwoStageDiD cluster-score aggregation now densifies the full sparse matrix, increasing peak memory risk on large panels.
  • AGENTS.md not found in the repo; proceeded without additional agent instructions.

Methodology

  • Severity: P1. Impact: Registry says ImputationDiD and TwoStageDiD bootstrap uses Rademacher weights, but code now allows mammen/webb, altering the inference distribution without documentation. Fix: update docs/methodology/REGISTRY.md to list supported weights and rationale (or restrict to Rademacher in code). docs/methodology/REGISTRY.md:L548-L613, diff_diff/imputation.py:L63-L157, diff_diff/two_stage.py:L67-L151, diff_diff/imputation_bootstrap.py:L214-L314, diff_diff/two_stage_bootstrap.py:L198-L383.
  • Severity: P1. Impact: SunAbraham registry still promises deprecated min_pre_periods/min_post_periods are accepted with FutureWarning, but parameters are now removed, so callers will get TypeError instead. Fix: either restore the deprecated parameters with warnings or update the registry (and user-facing docs/changelog) to mark removal. docs/methodology/REGISTRY.md:L430-L431, diff_diff/sun_abraham.py:L428-L455.

Code Quality
No issues found.

Performance

  • Severity: P2. Impact: New weighted_X10.toarray() densifies the full sparse matrix during cluster aggregation, increasing peak memory and risking OOM on large panels compared to the prior per-column approach. Fix: add a size-based guard or chunked aggregation to keep peak memory bounded. diff_diff/two_stage.py:L1225-L1231, diff_diff/two_stage_bootstrap.py:L109-L114.

Maintainability
No issues found.

Tech Debt
No issues found.

Security
No issues found.

Documentation/Tests

  • Severity: P2. Impact: Bootstrap result docstrings still claim “rademacher only,” which is now incorrect and may mislead users. Fix: update docstrings to list supported weights or reference the estimator’s bootstrap_weights parameter. diff_diff/imputation_results.py:L33-L37, diff_diff/two_stage_results.py:L34-L38.
  • Severity: P3. Impact: TROP now raises on n_bootstrap < 2, but the parameter docs don’t mention the minimum; this is a user-visible change. Fix: update the parameter docs (and consider a brief changelog note if this is a breaking API change). diff_diff/trop.py:L95-L167.

Address 5 AI review items: update Methodology Registry for bootstrap_weights
and SunAbraham param removal, fix bootstrap result docstrings, add memory
guidance for dense .toarray(), and document n_bootstrap minimum.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Feb 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e97ca9d8902b0e790bb8fbb86a38667ba687e8c2


Overall assessment: ⚠️ Needs changes

Executive summary:

  • bootstrap_weights added for ImputationDiD/TwoStageDiD and shared weight helper align with the Methodology Registry; no estimator mismatch found.
  • TROP now enforces n_bootstrap >= 2, which prevents undefined SEs and is consistent with always-bootstrap variance.
  • Potential memory regression: TwoStageDiD GMM score aggregation now densifies the full sparse FE matrix, which can OOM on large panels.
  • README still documents removed SunAbraham.fit() params (min_pre_periods, min_post_periods), which now raise TypeError.

Methodology
Affected methods: ImputationDiD, TwoStageDiD, SunAbraham, TROP. Cross-checked against docs/methodology/REGISTRY.md; no mismatches with source material found.
No findings.

Code Quality
No findings.

Performance

  • P1 — Impact: weighted_X10.toarray() materializes the full (n × p) FE matrix and can blow memory for large panels; the previous per-column path avoided the dense allocation. Fix: keep sparse aggregation (e.g., accumulate from weighted_X10.tocoo() with np.add.at on (cluster_idx[row], col)), or add a size guard to fall back to per-column getcol() when n * p or nnz exceeds a threshold. diff_diff/two_stage.py:L1218-L1233, diff_diff/two_stage_bootstrap.py:L104-L115.

Maintainability
No findings.

Tech Debt
No findings.

Security
No findings.

Documentation/Tests

  • P2 — Impact: README still advertises min_pre_periods/min_post_periods for SunAbraham.fit(), but those params were removed and now raise TypeError; users following the docs will hit runtime errors. Fix: remove those rows or explicitly note the removal and error behavior. README.md:L2057-L2068.
  • P3 — Impact: There is no test asserting the new TypeError behavior for removed SunAbraham params; regressions could reintroduce silent ignoring. Fix: add a test in tests/test_sun_abraham.py asserting fit(..., min_pre_periods=2) and fit(..., min_post_periods=2) raise TypeError.

…round 2

- Add _SPARSE_DENSE_THRESHOLD in two_stage.py with per-column .tocsc()
  fallback for large FE matrices; apply same pattern in bootstrap module
- Remove min_pre_periods/min_post_periods from README SunAbraham table
- Add test_removed_params_raise_typeerror for SunAbraham
- Add test_sparse_fallback_path for TwoStageDiD dense/sparse equivalence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Feb 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b2e40d82db0c44b180cc13b9c08738741effc30


Overall assessment: ⚠️ Needs changes

Executive summary

  • Methodology registry updates for ImputationDiD/TwoStageDiD/SunAbraham align with the code changes; no estimator-math mismatches found.
  • New cross-module import for _SPARSE_DENSE_THRESHOLD introduces a circular-import risk if diff_diff.two_stage_bootstrap is imported directly.
  • README does not document the new bootstrap_weights parameter or TROP’s n_bootstrap >= 2 constraint.

Methodology
No issues found.

Code Quality
No issues found.

Performance
No issues found.

Maintainability

  • P2 — Impact: Direct import diff_diff.two_stage_bootstrap can now trigger a circular import (two_stage_bootstrap -> two_stage -> two_stage_bootstrap) and raise ImportError before TwoStageDiDBootstrapMixin is defined. Fix: move _SPARSE_DENSE_THRESHOLD into a small shared module (e.g., diff_diff/two_stage_shared.py) or duplicate the constant in diff_diff/two_stage_bootstrap.py to remove the cross-import. (diff_diff/two_stage_bootstrap.py:L16-L19, diff_diff/two_stage.py:L32-L39)

Tech Debt
No issues found.

Security
No issues found.

Documentation/Tests

  • P3 — Impact: README constructor examples for ImputationDiD and TwoStageDiD omit the new bootstrap_weights parameter, so users won’t discover the available weight types. Fix: add bootstrap_weights='rademacher' (with a brief comment) to both constructor blocks and/or update the parameter tables. (README.md:L2101-L2110, README.md:L2155-L2163)
  • P3 — Impact: README TROP signature doesn’t note the new n_bootstrap >= 2 requirement; users may pass 0/1 and hit a ValueError without guidance. Fix: update the inline comment or add a short note in the TROP section. (README.md:L1979-L1987)

Duplicate _SPARSE_DENSE_THRESHOLD in two_stage_bootstrap.py to eliminate
cross-import from two_stage.py. Add bootstrap_weights param to README
ImputationDiD/TwoStageDiD constructors and n_bootstrap minimum note to TROP.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Feb 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 98029d1587c622ce744adfa578a66459ebaf79dd


Overall assessment: ✅ Looks good

Executive summary

  • Methods affected: TwoStageDiD, ImputationDiD (bootstrap weight types + GMM score unification), TROP (bootstrap minimum), SunAbraham (param removal); registry updates align and no methodology mismatches found.
  • No NaN/Inf inference anti-patterns introduced in modified paths.
  • Minor maintainability and test/doc coverage gaps noted below.

Methodology
No findings.

Code Quality
No findings.

Performance
No findings.

Maintainability

  • P3 | Impact: _SPARSE_DENSE_THRESHOLD is duplicated across analytic and bootstrap paths, creating drift risk if only one is updated. | Fix: centralize the constant in a shared module and import it in both files. diff_diff/two_stage.py:L32-L35, diff_diff/two_stage_bootstrap.py:L18-L21

Tech Debt
No findings.

Security
No findings.

Documentation/Tests

  • P3 | Impact: bootstrap_weights='webb' is untested for ImputationDiD/TwoStageDiD, so regressions in that weight path could slip through. | Fix: add a minimal bootstrap test mirroring the mammen tests for webb. tests/test_imputation.py:L1092-L1105, tests/test_two_stage.py:L1063-L1075
  • P3 | Impact: TROP now enforces n_bootstrap >= 2, but the Methodology Registry doesn’t mention this constraint, risking doc drift. | Fix: add a brief note in the TROP section. diff_diff/trop.py:L134-L163, docs/methodology/REGISTRY.md:L825-L900

… for PR #165 round 4

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Feb 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d7ec6948034fed0d69712cfdc876f79acd7bdf69


Overall assessment: ⚠️ Needs changes

Executive summary

  • Methods affected: TwoStageDiD, ImputationDiD, TROP, SunAbraham; registry updates appear consistent with code changes (no methodology mismatches found).
  • P1: New bootstrap_weights parameter is not tested across event‑study/group bootstrap paths.
  • P3: Imputation bootstrap docstrings still describe Rademacher‑only weights.
  • P3: _SPARSE_DENSE_THRESHOLD duplicated across analytic and bootstrap modules (drift risk).

Methodology

  • No findings. Cross‑checked updates for ImputationDiD/TwoStageDiD bootstrap weights and TROP bootstrap minimum against the registry; code behavior matches documented changes.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • P3 – Impact: _SPARSE_DENSE_THRESHOLD is duplicated, which can drift between analytic and bootstrap paths and subtly change behavior. Fix: move to a shared constant (e.g., diff_diff/constants.py) and import in both modules. Evidence: diff_diff/two_stage.py:L32-L35, diff_diff/two_stage_bootstrap.py:L18-L20.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P1 – Impact: bootstrap_weights is only tested for overall ATT; event‑study/group bootstrap paths with non‑default weights are untested, so regressions there could slip through. Fix: add tests that set bootstrap_weights with aggregate="event_study" (and group aggregation for Imputation) and assert finite event_study_ses/group_ses plus correct weight_type. Evidence: tests/test_imputation.py:L1092-L1120, tests/test_two_stage.py:L1063-L1093.
  • P3 – Impact: In‑code docstrings still state Rademacher‑only weights, which is now inaccurate and misleading. Fix: update wording to “multiplier weights (rademacher/mammen/webb; configurable via bootstrap_weights)”. Evidence: diff_diff/imputation_bootstrap.py:L122-L128, diff_diff/imputation_bootstrap.py:L218-L223.

…update docstrings for PR #165 round 5

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Feb 17, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 969ae827fc52de0b570b65e2abe8352641e01a00


Overall assessment: ✅ Looks good

Executive Summary

  • Methods affected are ImputationDiD, TwoStageDiD, TROP, and SunAbraham; registry updates align with the code changes for bootstrap weighting and bootstrap minimums.
  • Bootstrap weight type is now parameterized and wired through bootstrap generation/results for ImputationDiD and TwoStageDiD, with coverage for overall/event‑study/group paths in tests.
  • TwoStageDiD’s GMM score handling is unified across analytic/bootstrap paths, and the new dense/sparse threshold has a parity test.

Methodology
No findings. Methods affected: ImputationDiD, TwoStageDiD, TROP, SunAbraham. Cross‑check against the Methodology Registry and in‑code docstrings shows consistency with the changes (bootstrap weight configurability and TROP’s n_bootstrap >= 2 guard). Relevant references: docs/methodology/REGISTRY.md:L543-L610, docs/methodology/REGISTRY.md:L939-L944, diff_diff/imputation.py:L124-L170, diff_diff/imputation_bootstrap.py:L214-L244, diff_diff/two_stage.py:L123-L170, diff_diff/two_stage_bootstrap.py:L199-L214, diff_diff/trop.py:L154-L166, diff_diff/sun_abraham.py:L431-L466.

Code Quality
No findings.

Performance
No findings.

Maintainability

  • P3 — _SPARSE_DENSE_THRESHOLD is duplicated in both analytic and bootstrap modules.
    Impact: Risk of behavioral drift between analytic and bootstrap paths if one constant changes and the other isn’t updated.
    Concrete fix: Define the threshold in one shared location and import it in both diff_diff/two_stage.py and diff_diff/two_stage_bootstrap.py (diff_diff/two_stage.py:L32-L35, diff_diff/two_stage_bootstrap.py:L18-L21).

Tech Debt
No findings.

Security
No findings.

Documentation/Tests
No findings.

@igerber igerber merged commit b1e0237 into main Feb 17, 2026
10 checks passed
@igerber igerber deleted the tech-debt-paydown branch February 17, 2026 21:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant