Sun-Abraham methodology review: fix 5 issues, add R benchmarks #153
Complete methodology review of the Sun-Abraham interaction-weighted estimator against R's fixest::sunab(). ATT matches within machine precision, SE within 0.3%.
Fixes:
- Add df_adjustment for absorbed unit/time fixed effects in saturated regression
- Return (NaN, NaN) instead of (0, 0) when no post-treatment effects exist
- Add FutureWarning for unimplemented min_pre/post_periods parameters
- Remove hardcoded [-20, 20] event-time truncation to match R behavior
- Add UserWarning for variance fallback path using simplified variance
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
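The (NaN, NaN) fix in the list above can be illustrated with a minimal sketch. The helper name `aggregate_post_effects`, its input layout, and the independence assumption used for the standard error are all hypothetical and chosen for illustration; the estimator's real aggregation presumably uses the full covariance matrix.

```python
import math

def aggregate_post_effects(effects):
    """Hypothetical helper: average post-treatment event-study effects.

    effects maps relative period e -> (estimate, std_error, n_obs).
    Returns (att, se). When no period with e >= 0 exists, the overall
    effect is undefined, so (NaN, NaN) is returned instead of (0, 0).
    """
    post = [v for e, v in effects.items() if e >= 0]
    if not post:
        return (math.nan, math.nan)
    total = sum(n for _, _, n in post)
    att = sum(est * n / total for est, _, n in post)
    # Illustration only: treats the per-period estimates as independent.
    se = math.sqrt(sum((n / total) ** 2 * s ** 2 for _, s, n in post))
    return (att, se)

only_pre = {-2: (0.1, 0.05, 10), -1: (0.0, 0.05, 10)}
print(aggregate_post_effects(only_pre))
```

Returning NaN keeps an undefined quantity from being mistaken for an estimated zero effect with zero uncertainty.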
|
Overall assessment: Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…rtion
- Fix df_adjustment from n_units + n_times - 2 to n_units + n_times - 1
  (SunAbraham has no intercept column unlike TWFE)
- Rewrite test to monkeypatch LinearRegression.fit and assert exact df values
- Update METHODOLOGY_REVIEW.md with corrected formula and rationale
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
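One plausible reading of the corrected formula, sketched with hypothetical helper names (`absorbed_fe_df`, `residual_df`) that are not taken from the codebase: unit and time dummies together span n_units + n_times - 1 independent columns, and one further degree of freedom is subtracted only when the design matrix already carries an explicit intercept column.

```python
def absorbed_fe_df(n_units, n_times, design_has_intercept):
    """Degrees of freedom absorbed by unit and time fixed effects.

    Hypothetical helper. The two sets of dummies share the constant,
    so they span n_units + n_times - 1 independent columns. If the
    design matrix already counts an intercept column, that column is
    not counted a second time.
    """
    absorbed = n_units + n_times - 1
    if design_has_intercept:
        absorbed -= 1
    return absorbed

def residual_df(n_obs, n_design_cols, n_units, n_times, design_has_intercept=False):
    """Residual df after the explicit regressors and the absorbed effects."""
    return n_obs - n_design_cols - absorbed_fe_df(n_units, n_times, design_has_intercept)

# 10 units, 5 periods: no intercept column vs. an explicit intercept column
print(absorbed_fe_df(10, 5, design_has_intercept=False))
print(absorbed_fe_df(10, 5, design_has_intercept=True))
```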
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment: Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…tring
- Fix IW weights to use n_{g,e} (per-event-time observation counts) instead
of n_g (total cohort sizes), matching REGISTRY formula for unbalanced panels
- Add unbalanced panel test verifying n_{g,e} != n_g distinction
- Update existing IW weights test to verify against event-time counts
- Mark min_pre_periods/min_post_periods as deprecated in fit() docstring
- Update METHODOLOGY_REVIEW.md verified-component line and corrections
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
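The distinction between n_{g,e} and n_g can be sketched in a few lines. This is an illustration under stated assumptions, not the estimator's code: the function name `iw_weights` and the (cohort, period) input layout are hypothetical, and the weights are taken to be each cohort's share of observations at a given event time e = period - cohort.

```python
from collections import Counter, defaultdict

def iw_weights(observations):
    """Hypothetical sketch of interaction weights from event-time counts.

    observations: iterable of (cohort, period) pairs, one per treated
    observation. The weight of cohort g at event time e is
    n_{g,e} / sum over cohorts of n_{.,e}.
    """
    counts = Counter()
    for cohort, period in observations:
        counts[(cohort, period - cohort)] += 1
    totals = defaultdict(int)
    for (_, e), n in counts.items():
        totals[e] += n
    return {(g, e): n / totals[e] for (g, e), n in counts.items()}

# Unbalanced panel: both cohorts have 2 units, but one unit of cohort 4
# is not observed at event time 1.
obs = [(3, 3), (3, 3), (3, 4), (3, 4), (4, 4), (4, 4), (4, 5)]
weights = iw_weights(obs)
print(weights[(3, 1)], weights[(4, 1)])
```

With total cohort sizes n_g both cohorts would receive weight 0.5 at event time 1; with per-event-time counts the cohort that is missing an observation is down-weighted.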
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment: ⛔ Blocker Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
first_treat=np.inf (documented as valid for never-treated units) passed all `> 0` checks in sun_abraham.py and staggered.py, causing np.inf to be treated as a cohort with -inf relative times. Fixed by normalizing np.inf to 0 immediately after computing _never_treated in both estimators. Added regression tests verifying equivalence between encodings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment: Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…_groups
The np.inf → 0 normalization was placed after treatment_groups was computed, so np.inf passed the `g > 0` filter and leaked into treatment cohorts. Reorder in both sun_abraham.py and staggered.py so normalization precedes treatment_groups derivation. Add results.groups assertion and all-never-treated ValueError test.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
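The ordering issue described in this commit can be sketched with the standard library alone. The function `split_cohorts` is hypothetical and uses `math.inf` in place of `np.inf`; it only illustrates why the normalization has to run before the `g > 0` filter.

```python
import math

def split_cohorts(first_treat):
    """Hypothetical sketch: normalize never-treated encodings, then filter.

    Both 0 and inf encode "never treated". Because inf > 0 is True,
    the inf -> 0 normalization must happen BEFORE treatment cohorts are
    derived, otherwise inf is picked up as a cohort.
    """
    normalized = [0.0 if g == math.inf else float(g) for g in first_treat]
    never_treated = [g == 0 for g in normalized]
    treatment_groups = sorted({g for g in normalized if g > 0})
    if not treatment_groups:
        raise ValueError("no treated cohorts found")
    return normalized, never_treated, treatment_groups

# The two encodings yield the same cohorts.
print(split_cohorts([0, 3, 5, 0])[2])
print(split_cohorts([math.inf, 3, 5, math.inf])[2])
```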
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…as compat
- Guard overall bootstrap stats in _run_bootstrap() when ATT is NaN,
  preventing _compute_bootstrap_pvalue from returning 1/(B+1) instead of NaN
- Add test_no_post_effects_bootstrap_returns_nan for the bootstrap NaN path
- Cast first_treat to float before assigning np.inf in tests for pandas compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
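Why an unguarded bootstrap p-value can come out as 1/(B+1) for a NaN estimate is easy to reproduce. The sketch below assumes a common "(count + 1) / (B + 1)" p-value formula and null-centered draws; the function name `bootstrap_pvalue` and its `guard` flag are hypothetical, and the library's actual `_compute_bootstrap_pvalue` may be written differently.

```python
import math

def bootstrap_pvalue(estimate, boot_draws, guard=True):
    """Hypothetical two-sided bootstrap p-value with an optional NaN guard.

    Assumes boot_draws are centered under the null. Every comparison
    against NaN is False, so without the guard the extreme count is 0
    and the result collapses to 1 / (B + 1).
    """
    if guard and math.isnan(estimate):
        return math.nan
    b = len(boot_draws)
    extreme = sum(1 for d in boot_draws if abs(d) >= abs(estimate))
    return (extreme + 1) / (b + 1)

draws = [0.1, -0.2, 0.3]
print(bootstrap_pvalue(math.nan, draws, guard=False))  # misleading small value
print(bootstrap_pvalue(math.nan, draws))               # nan
```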
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall assessment: ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
fixest::sunab()
Methodology references (required if estimator / math changes)
compute_p_value()) rather than t-distribution. This is asymptotically equivalent and standard for delta-method-aggregated quantities. R's fixest uses t-distribution at all levels, so aggregated p-values may differ slightly for small samples.
Validation
tests/test_sun_abraham.py: 7 new methodology tests in TestSunAbrahamMethodology class
Security / privacy
Generated with Claude Code