Align TROP with paper methodology (Equation 5, D matrix semantics)#110
- Fix CV scoring to use SUM instead of average per Equation 5
- Remove post_periods parameter (D matrix is sole input for treatment timing)
- Add LOOCV warning when >10% of fits fail
- Fix NaN handling for undefined inference fields (t_stat, p_value)
- Update REGISTRY.md with D matrix semantics and requirements checklist
- Add API verification tests for v2.1.8 changes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
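For reviewers who want the scoring change spelled out, here is a minimal sketch of a sum-based criterion in the spirit of Equation 5. The function name `loocv_score_sum` and the toy data are hypothetical, and plain nested lists stand in for the arrays used by the library; this is not the code in `diff_diff/trop.py`.

```python
def loocv_score_sum(tau_loocv, D):
    """Q(lambda): sum of squared LOOCV pseudo-effects over cells with D == 0.

    Hypothetical helper for illustration only.
    tau_loocv and D are row-aligned nested lists of the same shape.
    """
    return sum(
        tau ** 2
        for tau_row, d_row in zip(tau_loocv, D)
        for tau, d in zip(tau_row, d_row)
        if d == 0  # treated cells (D == 1) never enter the score
    )


# Toy 2x3 panel with one treated cell, which is excluded from the sum.
D = [[0, 0, 1],
     [0, 0, 0]]
tau = [[1.0, -2.0, 5.0],
       [0.5, 0.0, 1.0]]
score = loocv_score_sum(tau, D)
```

A sum and an average rank candidate λ values identically only when every candidate is scored over the same number of cells, which is why the distinction matters once some fits can be skipped.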
|
Overall assessment: ⛔ Blocker
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P0: Fix confidence interval NaN propagation
- Move conf_int computation inside SE guard
- When SE <= 0, conf_int is now (np.nan, np.nan) instead of finite values

P1: Fix LOOCV warning consistency
- Add warning before early return when ALL LOOCV fits fail (n_valid == 0)
- Add warning after Rust LOOCV call when best_score is infinite
- Ensures warnings are emitted regardless of backend used

P2a: Update TROP documentation
- Remove outdated post_periods parameter from TROP examples in README.md
- Update docs/api/trop.rst examples
- Update docs/tutorials/10_trop.ipynb (6 cells with TROP fit() calls)
- TROP now infers treatment periods from the treatment indicator column

P2b: Add test assertions
- Update test_loocv_warning_on_many_failures with proper warning capture
- Add test_nan_propagation_when_se_zero to verify P0 fix

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
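The P0 guard can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `safe_inference` is a hypothetical name, and the normal approximation for the p-value and interval is an assumption made to keep the example dependency-free.

```python
import math
from statistics import NormalDist


def safe_inference(estimate, se, alpha=0.05):
    """Return t_stat, p_value and conf_int, all NaN when SE is not usable.

    Hypothetical helper; uses a normal approximation (an assumption).
    """
    nan = float("nan")
    if not (math.isfinite(se) and se > 0):
        # Everything derived from SE is undefined, including the interval.
        return {"t_stat": nan, "p_value": nan, "conf_int": (nan, nan)}
    z = NormalDist().inv_cdf(1 - alpha / 2)
    t_stat = estimate / se
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return {
        "t_stat": t_stat,
        "p_value": p_value,
        "conf_int": (estimate - z * se, estimate + z * se),
    }


degenerate = safe_inference(1.0, 0.0)
regular = safe_inference(2.0, 1.0)
```

Computing the interval inside the same guard as the other fields is what prevents a finite `conf_int` from appearing next to a NaN `t_stat`.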
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…rminism

P1: Rust backend warning consistency
- Modified Rust loocv_grid_search to return (score, n_valid, n_attempted)
- Python now emits >10% failure warning after Rust call, matching Python path
- Both backends now emit consistent LOOCV warnings

P2: Fixed last missed post_periods documentation example
- Updated TROP example in main README section (line 1147)

P3: Added deterministic test for LOOCV warnings
- New test_loocv_warning_deterministic_with_mock mocks _estimate_model
- Forces 20% failure rate to trigger >10% warning threshold
- Patches Rust backend to ensure Python path is tested

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Update test_loocv_grid_search_returns_valid_params to expect 6 values (score, n_valid, n_attempted) instead of 4 to match the updated Rust function signature. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Open Questions
Overall Assessment
Executive Summary
|
Per PR #110 feedback: Equation 5 specifies Q(λ) as a sum over ALL D==0 cells. The previous implementation skipped failed fits and only warned if >10% failed, which could bias λ selection toward unstable parameter combinations.

Changes:
- Python/Rust: Return inf immediately on first fit failure
- Add warning mentioning Equation 5 compliance
- Remove obsolete >10% failure threshold warning
- Update methodology docs to reflect strict failure handling
- Update test to verify new behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
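A rough sketch of the strict failure handling described in this commit, assuming a per-cell fitting callback. The names `strict_loocv_score` and `fit_pseudo_effect` are hypothetical, and the exception types caught here are an assumption about how a fit might fail.

```python
import math
import warnings


def strict_loocv_score(control_cells, fit_pseudo_effect):
    """Sum squared pseudo-effects over all D == 0 cells, or inf on any failure.

    control_cells: iterable of (t, i) pairs with D == 0.
    fit_pseudo_effect: hypothetical callable returning the LOOCV estimate
    for one held-out cell, raising on a failed fit.
    """
    total = 0.0
    for t, i in control_cells:
        try:
            tau = fit_pseudo_effect(t, i)
        except (ValueError, ArithmeticError) as exc:
            warnings.warn(
                f"LOOCV fit failed at (t={t}, i={i}): {exc}. Returning inf so "
                "that Q(lambda) remains a sum over all D == 0 cells (Equation 5).",
                UserWarning,
            )
            return math.inf  # do not score a partial sum
        total += tau ** 2
    return total
```

Returning infinity rather than a partial sum means a λ combination that fails on some cells can never look better than one that fits every cell.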
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
- Implement paper's footnote 2 LOOCV approach: two-stage coarse-to-fine cycling (univariate searches then coordinate descent) instead of exhaustive grid search
- Add D matrix absorbing-state validation (P1): check monotonic non-decreasing treatment indicator, raise ValueError with helpful message for event-style D (0→1→0)
- Update Rust backend with matching cycling LOOCV implementation
- Update REGISTRY.md with two-stage LOOCV procedure documentation
- Add tests for D matrix validation and cycling search convergence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
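The cycling stage can be pictured as plain coordinate descent over the parameter grids. The sketch below is a generic version under assumptions: `cycling_search` is a hypothetical name, the first stage (univariate searches) is represented only by the `start` argument, and the stopping rule is a guess rather than the paper's or the library's exact procedure.

```python
import math


def cycling_search(score, grids, start, max_cycles=10):
    """Coordinate descent over named grids (illustrative sketch).

    score: callable mapping a params dict to a float (lower is better).
    grids: dict of parameter name -> list of candidate values.
    start: params dict, e.g. the result of earlier univariate searches.
    """
    best = dict(start)
    best_score = score(best)
    for _ in range(max_cycles):
        changed = False
        for name, grid in grids.items():
            for value in grid:
                trial = {**best, name: value}
                s = score(trial)
                # Only finite scores may replace the incumbent.
                if math.isfinite(s) and s < best_score:
                    best, best_score, changed = trial, s, True
        if not changed:
            break  # a full cycle without improvement: converged
    return best, best_score


# Toy objective with a known minimum at a=2, b=3.
toy_grids = {"a": [0, 1, 2, 3, 4], "b": [0, 1, 2, 3, 4]}
toy_best, toy_score = cycling_search(
    lambda p: (p["a"] - 2) ** 2 + (p["b"] - 3) ** 2,
    toy_grids,
    start={"a": 0, "b": 0},
)
```

Each cycle evaluates the sum of the grid sizes rather than their product, which is the cost advantage over an exhaustive grid search.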
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology:
Code Quality:
Performance:
Maintainability:
Tech Debt:
Security:
Documentation/Tests:
|
…tics

- Fix infinite score check: only assign best_lambda when score is finite (both Python and Rust code paths)
- Fix λ_nn=∞ semantics: use 1e10 instead of 0.0 per paper Equations 2-3
  - λ_time/λ_unit=∞ → 0.0 (uniform weights) is correct
  - λ_nn=∞ → 1e10 (large penalty → L≈0, factor model disabled)
  - Note: λ_nn=0 means NO regularization, opposite of "disabled"
- Add regression tests for infinite score fallback to defaults
- Document λ=∞ semantics in REGISTRY.md

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
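The λ=∞ mapping described in this commit could be captured in one small helper. `to_effective_lambda` and `LARGE_NN_PENALTY` are hypothetical names used for illustration; only the mapping itself (∞ → 0.0 for the time and unit parameters, ∞ → 1e10 for the nuclear-norm penalty) comes from the commit message.

```python
import math

LARGE_NN_PENALTY = 1e10  # stands in for "infinite" nuclear-norm regularization


def to_effective_lambda(lambda_time, lambda_unit, lambda_nn):
    """Map grid values to the values actually used in estimation.

    Hypothetical helper. Infinite time/unit values become 0.0 (uniform
    weights); an infinite lambda_nn becomes a large finite penalty so the
    low-rank component is shrunk to approximately zero.
    """
    eff_time = 0.0 if math.isinf(lambda_time) else lambda_time
    eff_unit = 0.0 if math.isinf(lambda_unit) else lambda_unit
    eff_nn = LARGE_NN_PENALTY if math.isinf(lambda_nn) else lambda_nn
    return eff_time, eff_unit, eff_nn


effective = to_effective_lambda(math.inf, 0.5, math.inf)
```

Routing every consumer of the grid values through a single helper like this is one way to keep the scored objective and the final fit on identical parameters.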
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Add infinity conversion in fit() after unpacking best_lambda to ensure final estimation uses the same effective parameters that LOOCV evaluated.

Changes:
- Convert λ_time/λ_unit=∞→0 and λ_nn=∞→1e10 before final estimation
- Add test for infinity-valued parameter grids
- Clarify REGISTRY.md that conversion applies to both LOOCV and final estimation

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Open Questions
|
1. Rust LOOCV Parameter Conversion (rust/src/trop.rs):
   - Convert grid values before LOOCV evaluation to match Python behavior
   - λ_time=∞→0.0, λ_unit=∞→0.0, λ_nn=∞→1e10
   - Ensures Rust and Python evaluate identical objective functions

2. Variance Estimation Consistency (diff_diff/trop.py):
   - Create effective_lambda with converted values after infinity conversion
   - Pass effective_lambda to both bootstrap and jackknife variance methods
   - Store original grid values in TROPResults for user visibility
   - Point estimation and SE now use identical parameters

3. Empty Control Set Handling (diff_diff/trop.py):
   - Add check for empty control_obs after subsampling in _loocv_score_obs_specific
   - Return np.inf with warning if control observations are empty
   - Prevents empty control sets from "winning" with score 0.0

Tests added for all three fixes. Documentation updated in TROPResults docstring, TROP.fit() docstring, and REGISTRY.md edge cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
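The third fix guards against a degenerate case that is easy to miss: a sum over zero observations is 0.0, which would beat every real score. A minimal sketch, with hypothetical names (`score_control_subsample`, `pseudo_effect`) that do not mirror `_loocv_score_obs_specific`:

```python
import math
import warnings


def score_control_subsample(control_obs, pseudo_effect):
    """Score a (possibly subsampled) list of (t, i) control observations.

    Hypothetical helper. An empty list returns inf instead of the
    vacuous sum 0.0, so it can never be selected as the best score.
    """
    if len(control_obs) == 0:
        warnings.warn(
            "No control observations available for LOOCV; returning inf.",
            UserWarning,
        )
        return math.inf
    return sum(pseudo_effect(t, i) ** 2 for t, i in control_obs)
```

Without the guard, `sum(...)` over an empty sequence evaluates to 0 and the corresponding λ combination would always win the search.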
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Open Questions / Assumptions
|
Issue 1: Final LOOCV score infinity conversion
- Convert inf values before calling loocv_score_for_params in Rust
- Ensures final score uses same converted values that LOOCV evaluated
- λ_time/λ_unit=∞ → 0.0, λ_nn=∞ → 1e10

Issue 2: Rust LOOCV failed observation metadata
- Extend loocv_score_for_params to return Option<(usize, usize)>
- Track first failed observation (t, i) for informative warnings
- Python now includes coordinates in LOOCV failure warnings

Issue 3: D matrix validation for unbalanced panels
- Track missing values before fillna(0) with missing_mask
- Only validate monotonicity between observed periods
- Missing data no longer triggers false absorbing-state violations

Tests: 4 new tests in TestPR110FeedbackRound8 class
Docs: Updated REGISTRY.md with unbalanced panel support

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/ai-review |
Update test to unpack 7 values from loocv_grid_search (added first_failed_obs in round 8) and add assertion for the new return value. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall assessment:
Executive summary:
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Update mock return value for _rust_loocv_grid_search to include the 7th value (first_failed_obs) that was added in round 8. The mock was returning 6 values but the actual Rust function returns 7. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…n_post_periods fix
P1: Fix absorbing-state validation to catch 1→0 violations across missing period gaps.
The old vectorized check only looked at adjacent periods, missing violations like
D[2]=1, missing [3,4], D[5]=0. Now checks each unit's observed D sequence.
P3: Fix n_post_periods to count periods with actual D=1 observations, matching the
docstring claim, rather than calendar periods from first treatment.
Also updates methodology registry documentation for both changes.
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
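To make the two fixes in this commit concrete, here is a small sketch using plain lists with `None` marking unobserved periods. Both function names are hypothetical and the data layout is an assumption; the library works on a D matrix, not on per-unit lists.

```python
def violates_absorbing_state(d_sequence):
    """True if a unit's observed D values ever drop from 1 back to 0.

    d_sequence: one unit's D values in period order, None where unobserved.
    Gaps are skipped, so a 1 -> 0 drop across missing periods is still caught.
    """
    observed = [d for d in d_sequence if d is not None]
    return any(later < earlier for earlier, later in zip(observed, observed[1:]))


def count_post_periods(d_by_unit):
    """Count periods in which at least one unit has an observed D == 1."""
    n_periods = len(next(iter(d_by_unit.values())))
    return sum(
        any(seq[t] == 1 for seq in d_by_unit.values())
        for t in range(n_periods)
    )


# The case from the commit message: D[2]=1, periods 3-4 missing, D[5]=0.
gap_violation = violates_absorbing_state([0, 0, 1, None, None, 0])

# Treatment is observed in periods 2 and 4 only; period 3 is missing.
n_post = count_post_periods({"a": [0, 0, 1, None, 1], "b": [0, 0, 0, 0, 0]})
```

A check that compares only adjacent periods would see no violation in the first example, because neither neighbour of the gap forms an observed 1 followed by an observed 0. In the second example a calendar count from first treatment would give 3, whereas counting observed D=1 periods gives 2.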
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Executive Summary
|
Summary
- Remove `post_periods` parameter from `fit()` and `trop()` (D matrix is sole input for treatment timing)
- Replace `pre_periods`/`post_periods` lists in `TROPResults` with `n_pre_periods`/`n_post_periods` counts
- Fix NaN handling (`t_stat`, `p_value` now return `np.nan` when SE is 0)

Methodology references (required if estimator / math changes)
- Q(λ) = Σ_{j,s: D_js=0} [τ̂_js^loocv(λ)]²
- Removed `post_periods` parameter since paper uses D matrix as sole treatment timing input (no separate post_periods concept)

Validation
- `tests/test_trop.py` (new `TestAPIChangesV2_1_8` class with 5 tests)

Security / privacy
Generated with Claude Code