igerber
diff --git a/TODO.md b/TODO.md
Lines changed: 26 additions & 23 deletions
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
Lines changed: 2 additions & 0 deletions
diff --git a/diff_diff/imputation.py b/diff_diff/imputation.py
Lines changed: 10 additions & 6 deletions
diff --git a/diff_diff/imputation_bootstrap.py b/diff_diff/imputation_bootstrap.py
Lines changed: 39 additions & 70 deletions
@@ -12,8 +12,8 @@ Current limitations that may affect users:
 
 | Issue | Location | Priority | Notes |
 |-------|----------|----------|-------|
-| MultiPeriodDiD wild bootstrap not supported | `estimators.py:779-785` | Low | Edge case |
-| `predict()` raises NotImplementedError | `estimators.py:568-587` | Low | Rarely needed |
+| MultiPeriodDiD wild bootstrap not supported | `estimators.py:778-784` | Low | Edge case |
+| `predict()` raises NotImplementedError | `estimators.py:567-588` | Low | Rarely needed |
 
 ## Code Quality
 
@@ -23,14 +23,20 @@ Target: < 1000 lines per module for maintainability.
 
 | File | Lines | Action |
 |------|-------|--------|
-| `utils.py` | 1780 | Monitor -- legacy placebo function removed |
-| `visualization.py` | 1678 | Monitor -- growing but cohesive |
-| `linalg.py` | 1537 | Monitor -- unified backend, splitting would hurt cohesion |
+| `trop.py` | 2738 | Consider splitting — 2.7× target |
+| `utils.py` | 1838 | Monitor |
+| `staggered.py` | 1785 | Monitor |
+| `imputation.py` | 1756 | Monitor |
+| `visualization.py` | 1727 | Monitor — growing but cohesive |
+| `linalg.py` | 1727 | Monitor — unified backend, splitting would hurt cohesion |
+| `triple_diff.py` | 1581 | Monitor |
 | `honest_did.py` | 1511 | Acceptable |
+| `two_stage.py` | 1451 | Acceptable |
 | `power.py` | 1350 | Acceptable |
-| `triple_diff.py` | 1322 | Acceptable |
-| `sun_abraham.py` | 1227 | Acceptable |
-| `estimators.py` | 1161 | Acceptable |
+| `prep.py` | 1242 | Acceptable |
+| `sun_abraham.py` | 1162 | Acceptable |
+| `continuous_did.py` | 1155 | Acceptable |
+| `estimators.py` | 1147 | Acceptable |
 | `pretrends.py` | 1104 | Acceptable |
 
 ---
@@ -44,7 +50,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | Issue | Location | PR | Priority |
 |-------|----------|----|----------|
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails; fixing requires sparse least-squares alternatives) |
-| Bootstrap NaN-gating gap: manual SE/CI/p-value without non-finite filtering or SE<=0 guard | `imputation_bootstrap.py`, `two_stage_bootstrap.py` | #177 | Medium — migrate to `compute_effect_bootstrap_stats` from `bootstrap_utils.py` |
 | EfficientDiD: warn when cohort share is very small (< 2 units or < 1% of sample) — inverted in Omega*/EIF | `efficient_did_weights.py` | #192 | Low |
 | EfficientDiD: API docs / tutorial page for new public estimator | `docs/` | #192 | Medium |
 
@@ -62,7 +67,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | Tutorial notebooks not executed in CI | `docs/tutorials/*.ipynb` | #159 | Low |
 | R comparison tests spawn separate `Rscript` per test (slow CI) | `tests/test_methodology_twfe.py:294` | #139 | Low |
 | CS R helpers hard-code `xformla = ~ 1`; no covariate-adjusted R benchmark for IRLS path | `tests/test_methodology_callaway.py` | #202 | Low |
-| Context-dependent doc snippets pass via blanket NameError; no standalone validation | `tests/test_doc_snippets.py`, `docs/api/visualization.rst`, `docs/python_comparison.rst`, `docs/r_comparison.rst` | #206 | Low |
+| ~~Context-dependent doc snippets pass via blanket NameError~~ | `tests/test_doc_snippets.py` | #206 | ~~Low~~ — resolved: allow-list replaces blanket catch |
 | ~1,460 `duplicate object description` Sphinx warnings — each class attribute is documented in both module API pages and autosummary stubs; fix by adding `:no-index:` to one location or restructuring API docs to avoid overlap | `docs/api/*.rst`, `docs/api/_autosummary/` | — | Low |
 
 ---
@@ -82,22 +87,20 @@ Different estimators compute SEs differently. Consider unified interface.
 
 ### Type Annotations
 
-Pyright reports 282 type errors. Most are false positives from numpy/pandas type stubs.
+Mypy reports 9 errors (down from 81 before spring cleanup). All remaining are
+mixin `attr-defined` errors — methods accessed via `self` that live on the
+concrete class, not the mixin. Fixing these requires Protocol classes, which is
+low priority.
 
 | Category | Count | Notes |
 |----------|-------|-------|
-| reportArgumentType | 94 | numpy/pandas stub mismatches |
-| reportAttributeAccessIssue | 89 | Union types (results classes) |
-| reportReturnType | 21 | Return type mismatches |
-| reportOperatorIssue | 16 | Operators on incompatible types |
-| Others | 62 | Various minor issues |
-
-**Genuine issues to fix (low priority):**
-- [ ] Optional handling in `estimators.py:291,297,308` - None checks needed
-- [ ] Union type narrowing in `visualization.py:325-345` - results classes
-- [ ] numpy floating conversion in `diagnostics.py:669-673`
-
-**Note:** Most errors are false positives from imprecise type stubs. Mypy config in pyproject.toml already handles these via `disable_error_code`.
+| attr-defined (mixin methods) | 9 | Structural — requires Protocol refactor |
+
+**Resolved in spring cleanup:**
+- [x] `@overload` on `solve_ols` / `_solve_ols_numpy` — eliminated all unpacking mismatches
+- [x] `assert X is not None` guards — eliminated all Optional indexing errors
+- [x] Mixin scalar attribute stubs — eliminated 26 mixin attr-defined errors
+- [x] Matplotlib `tab10` lookup fix
 
 ## Deprecated Code
 
 
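Note on the Type Annotations entry above: the remaining mixin `attr-defined` errors are said to need Protocol classes. A minimal sketch of that pattern follows, assuming the usual approach of annotating `self` in the mixin with a Protocol. Every name here (`_HasBootstrapConfig`, `BootstrapMixin`, `Estimator`, `_point_estimate`, `describe`) is hypothetical and not taken from `diff_diff`.

```python
from typing import Protocol


class _HasBootstrapConfig(Protocol):
    """Hypothetical protocol: what the mixin expects from the concrete class."""

    n_bootstrap: int
    alpha: float

    def _point_estimate(self) -> float: ...


class BootstrapMixin:
    """Hypothetical mixin. Annotating `self` with the protocol lets a type
    checker resolve members that are defined only on the concrete class."""

    def describe(self: _HasBootstrapConfig) -> str:
        return (
            f"{self._point_estimate():.2f} "
            f"({self.n_bootstrap} draws, alpha={self.alpha})"
        )


class Estimator(BootstrapMixin):
    """Hypothetical concrete class that satisfies the protocol structurally."""

    def __init__(self, n_bootstrap: int = 200, alpha: float = 0.05) -> None:
        self.n_bootstrap = n_bootstrap
        self.alpha = alpha

    def _point_estimate(self) -> float:
        return 1.5
```

The attribute stubs added to `ImputationDiDBootstrapMixin` in this PR cover scalar attributes; a protocol of this kind would additionally cover methods, at the cost of annotating `self` on each mixin method that uses them.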
@@ -296,6 +296,7 @@ def fit(
         coefficients = reg.coefficients_
         residuals = reg.residuals_
         fitted = reg.fitted_values_
+        assert coefficients is not None
         att = coefficients[att_idx]
 
         # Get inference - either from bootstrap or analytical
@@ -1029,6 +1030,7 @@ def fit(  # type: ignore[override]
         post_effect_values = []
         post_effect_indices = []
 
+        assert vcov is not None
         for period in non_ref_periods:
             idx = interaction_indices[period]
             effect = coefficients[idx]
 
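Note on the `assert ... is not None` guards added in this hunk: they narrow an `Optional` value for the type checker. A minimal sketch of the effect, using a hypothetical `first_coefficient` helper rather than code from `estimators.py`:

```python
from typing import List, Optional


def first_coefficient(coefficients: Optional[List[float]]) -> float:
    """Hypothetical helper showing Optional narrowing via assert."""
    # Without this assert, a type checker flags indexing a possibly-None value.
    assert coefficients is not None
    # From here on the checker treats `coefficients` as List[float].
    return coefficients[0]
```

One trade-off to keep in mind: `assert` statements are skipped when Python runs with `-O`, so the guard documents an invariant for the checker and is not a runtime validation of user input.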
@@ -23,12 +23,13 @@
 from scipy.sparse.linalg import spsolve
 
 from diff_diff.imputation_bootstrap import ImputationDiDBootstrapMixin, _compute_target_weights
-from diff_diff.imputation_results import ImputationBootstrapResults, ImputationDiDResults  # noqa: F401 (re-export)
+from diff_diff.imputation_results import (  # noqa: F401 (re-export)
+    ImputationBootstrapResults,
+    ImputationDiDResults,
+)
 from diff_diff.linalg import solve_ols
 from diff_diff.utils import safe_inference
 
-
-
 # =============================================================================
 # Main Estimator
 # =============================================================================
@@ -417,9 +418,7 @@ def fit(
                 kept_cov_mask=kept_cov_mask,
             )
 
-        overall_t, overall_p, overall_ci = safe_inference(
-            overall_att, overall_se, alpha=self.alpha
-        )
+        overall_t, overall_p, overall_ci = safe_inference(overall_att, overall_se, alpha=self.alpha)
 
         # Event study and group aggregation
         event_study_effects = None
@@ -553,7 +552,9 @@ def fit(
                         and event_study_effects[h].get("n_obs", 1) > 0
                     ):
                         event_study_effects[h]["se"] = bootstrap_results.event_study_ses[h]
+                        assert bootstrap_results.event_study_cis is not None
                         event_study_effects[h]["conf_int"] = bootstrap_results.event_study_cis[h]
+                        assert bootstrap_results.event_study_p_values is not None
                         event_study_effects[h]["p_value"] = bootstrap_results.event_study_p_values[
                             h
                         ]
@@ -568,7 +569,9 @@ def fit(
                 for g in group_effects:
                     if g in bootstrap_results.group_ses:
                         group_effects[g]["se"] = bootstrap_results.group_ses[g]
+                        assert bootstrap_results.group_cis is not None
                         group_effects[g]["conf_int"] = bootstrap_results.group_cis[g]
+                        assert bootstrap_results.group_p_values is not None
                         group_effects[g]["p_value"] = bootstrap_results.group_p_values[g]
                         eff_val = group_effects[g]["effect"]
                         se_val = group_effects[g]["se"]
@@ -1614,6 +1617,7 @@ def _pretrend_test(self, n_leads: Optional[int] = None) -> Dict[str, Any]:
         )
         coefficients = result[0]
         vcov = result[2]
+        assert vcov is not None
 
         # Extract lead coefficients and their sub-VCV
         n_leads_actual = len(lead_cols)
 
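Note on the "Bootstrap NaN-gating gap" row removed from TODO.md: the hunk for `imputation_bootstrap.py` replaces the manual SE/CI/p-value code with `compute_effect_bootstrap_stats`. The sketch below illustrates what non-finite filtering plus an SE <= 0 guard could look like. It is a stdlib-only illustration under stated assumptions, not the implementation in `bootstrap_utils.py`; `gated_bootstrap_stats` and `_percentile` are hypothetical names, and the p-value rule mirrors the helper this PR deletes.

```python
import math
import statistics
from typing import Sequence, Tuple

NAN = float("nan")


def _percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def gated_bootstrap_stats(
    original_effect: float, boot_dist: Sequence[float], alpha: float = 0.05
) -> Tuple[float, Tuple[float, float], float]:
    """Return (se, ci, p_value); all NaN when the draws cannot support inference."""
    all_nan = (NAN, (NAN, NAN), NAN)
    # Drop NaN/inf draws before computing any statistic.
    valid = sorted(x for x in boot_dist if math.isfinite(x))
    if not math.isfinite(original_effect) or len(valid) < 2:
        return all_nan
    se = statistics.stdev(valid)  # sample standard deviation (ddof=1)
    if not math.isfinite(se) or se <= 0:
        return all_nan
    ci = (
        _percentile(valid, 100 * alpha / 2),
        _percentile(valid, 100 * (1 - alpha / 2)),
    )
    # Share of draws on the far side of zero, doubled, capped at 1,
    # and floored at 1 / (n_valid + 1).
    if original_effect >= 0:
        tail = sum(x <= 0 for x in valid) / len(valid)
    else:
        tail = sum(x >= 0 for x in valid) / len(valid)
    p_value = max(min(2 * tail, 1.0), 1 / (len(valid) + 1))
    return se, ci, p_value
```

For example, `gated_bootstrap_stats(1.0, [1.0, 1.0, 1.0])` returns NaN for all three outputs because the draws have zero spread, which keeps SE, CI, and p-value consistent with one another.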
@@ -6,13 +6,18 @@
 """
 
 import warnings
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Dict, List, Optional
 
 import numpy as np
 import pandas as pd
 
+from diff_diff.bootstrap_utils import (
+    compute_effect_bootstrap_stats as _compute_effect_bootstrap_stats,
+)
+from diff_diff.bootstrap_utils import (
+    generate_bootstrap_weights_batch as _generate_bootstrap_weights_batch,
+)
 from diff_diff.imputation_results import ImputationBootstrapResults
-from diff_diff.staggered_bootstrap import _generate_bootstrap_weights_batch
 
 __all__ = [
     "ImputationDiDBootstrapMixin",
@@ -55,46 +60,13 @@ def _compute_target_weights(
 class ImputationDiDBootstrapMixin:
     """Mixin providing bootstrap inference methods for ImputationDiD."""
 
-    def _compute_percentile_ci(
-        self,
-        boot_dist: np.ndarray,
-        alpha: float,
-    ) -> Tuple[float, float]:
-        """Compute percentile confidence interval from bootstrap distribution."""
-        lower = float(np.percentile(boot_dist, alpha / 2 * 100))
-        upper = float(np.percentile(boot_dist, (1 - alpha / 2) * 100))
-        return (lower, upper)
-
-    def _compute_bootstrap_pvalue(
-        self,
-        original_effect: float,
-        boot_dist: np.ndarray,
-        n_valid: Optional[int] = None,
-    ) -> float:
-        """
-        Compute two-sided bootstrap p-value.
-
-        Uses the percentile method: p-value is the proportion of bootstrap
-        estimates on the opposite side of zero from the original estimate,
-        doubled for two-sided test.
-
-        Parameters
-        ----------
-        original_effect : float
-            Original point estimate.
-        boot_dist : np.ndarray
-            Bootstrap distribution of the effect.
-        n_valid : int, optional
-            Number of valid bootstrap samples. If None, uses self.n_bootstrap.
-        """
-        if original_effect >= 0:
-            p_one_sided = float(np.mean(boot_dist <= 0))
-        else:
-            p_one_sided = float(np.mean(boot_dist >= 0))
-        p_value = min(2 * p_one_sided, 1.0)
-        n_for_floor = n_valid if n_valid is not None else self.n_bootstrap
-        p_value = max(p_value, 1 / (n_for_floor + 1))
-        return p_value
+    # Type hints for attributes accessed from the main class
+    n_bootstrap: int
+    bootstrap_weights: str
+    alpha: float
+    seed: Optional[int]
+    anticipation: int
+    horizon_max: Optional[int]
 
     def _precompute_bootstrap_psi(
         self,
@@ -266,16 +238,11 @@ def _run_bootstrap(
         # We do the same here so percentile CIs and empirical p-values work correctly.
         boot_overall_shifted = boot_overall + original_att
 
-        overall_se = float(np.std(boot_overall, ddof=1))
-        overall_ci = (
-            self._compute_percentile_ci(boot_overall_shifted, self.alpha)
-            if overall_se > 0
-            else (np.nan, np.nan)
-        )
-        overall_p = (
-            self._compute_bootstrap_pvalue(original_att, boot_overall_shifted)
-            if overall_se > 0
-            else np.nan
+        overall_se, overall_ci, overall_p = _compute_effect_bootstrap_stats(
+            original_att,
+            boot_overall_shifted,
+            alpha=self.alpha,
+            context="ImputationDiD overall ATT",
         )
 
         event_study_ses = None
@@ -286,16 +253,17 @@ def _run_bootstrap(
             event_study_cis = {}
             event_study_p_values = {}
             for h in boot_event_study:
-                se_h = float(np.std(boot_event_study[h], ddof=1))
-                event_study_ses[h] = se_h
                 orig_eff = original_event_study[h]["effect"]
-                if se_h > 0 and np.isfinite(orig_eff):
-                    shifted_h = boot_event_study[h] + orig_eff
-                    event_study_p_values[h] = self._compute_bootstrap_pvalue(orig_eff, shifted_h)
-                    event_study_cis[h] = self._compute_percentile_ci(shifted_h, self.alpha)
-                else:
-                    event_study_p_values[h] = np.nan
-                    event_study_cis[h] = (np.nan, np.nan)
+                shifted_h = boot_event_study[h] + orig_eff
+                se_h, ci_h, p_h = _compute_effect_bootstrap_stats(
+                    orig_eff,
+                    shifted_h,
+                    alpha=self.alpha,
+                    context=f"ImputationDiD event study (h={h})",
+                )
+                event_study_ses[h] = se_h
+                event_study_cis[h] = ci_h
+                event_study_p_values[h] = p_h
 
         group_ses = None
         group_cis = None
@@ -305,16 +273,17 @@ def _run_bootstrap(
             group_cis = {}
             group_p_values = {}
             for g in boot_group:
-                se_g = float(np.std(boot_group[g], ddof=1))
-                group_ses[g] = se_g
                 orig_eff = original_group[g]["effect"]
-                if se_g > 0 and np.isfinite(orig_eff):
-                    shifted_g = boot_group[g] + orig_eff
-                    group_p_values[g] = self._compute_bootstrap_pvalue(orig_eff, shifted_g)
-                    group_cis[g] = self._compute_percentile_ci(shifted_g, self.alpha)
-                else:
-                    group_p_values[g] = np.nan
-                    group_cis[g] = (np.nan, np.nan)
+                shifted_g = boot_group[g] + orig_eff
+                se_g, ci_g, p_g = _compute_effect_bootstrap_stats(
+                    orig_eff,
+                    shifted_g,
+                    alpha=self.alpha,
+                    context=f"ImputationDiD group effect (g={g})",
+                )
+                group_ses[g] = se_g
+                group_cis[g] = ci_g
+                group_p_values[g] = p_g
 
         return ImputationBootstrapResults(
             n_bootstrap=self.n_bootstrap,