Conversation
Create tests/test_methodology_callaway.py with 46 tests covering:
- Phase 1: Equation verification (hand-calculated ATT formula match)
- Phase 2: R benchmark comparison (did::att_gt() alignment)
- Phase 3: Edge case tests (all REGISTRY.md documented cases)
- Phase 4: SE formula verification (analytical vs bootstrap)

Update METHODOLOGY_REVIEW.md to mark CallawaySantAnna as Complete with detailed verification results and documented deviations from the R package.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
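The Phase 1 hand calculation can be illustrated with made-up numbers (not taken from the test suite): for a single group-time cell with a never-treated comparison group, the Callaway & Sant'Anna ATT(g,t) reduces to the familiar difference-in-differences of group means.

```python
# Illustrative hand calculation in the spirit of Phase 1; the means below
# are invented for the example, not actual test-suite values.
y_treat_pre, y_treat_post = 10.0, 15.0   # treated-group outcome means
y_ctrl_pre, y_ctrl_post = 8.0, 11.0      # never-treated comparison means

# ATT(g,t) = (treated post - treated pre) - (control post - control pre)
att = (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)
print(att)  # -> 2.0
```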
Overall assessment:
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Fix Rust Webb bootstrap weights to match NumPy implementation:
- Correct values: ±√(3/2), ±1, ±√(1/2) (was using wrong values)
- Correct probabilities: [1,2,3,3,2,1]/12 (was uniform)
- Add 3 Rust unit tests for Webb weight verification
- Both backends now produce variance ≈ 0.833

Add lazy R availability fixture to avoid import-time latency:
- New tests/conftest.py with session-scoped r_available fixture
- Support DIFF_DIFF_R=skip environment variable
- Test collection now completes in <1s (was ~2s with subprocess)

Improve test performance:
- Add @pytest.mark.slow marker for thorough bootstrap tests
- Reduce bootstrap iterations from 199 to 99 where sufficient
- Add slow marker definition to pyproject.toml

Documentation updates:
- METHODOLOGY_REVIEW.md: Correct Webb variance to 0.833
- TODO.md: Log 7 pre-existing NaN handling issues
- CLAUDE.md: Document Rust test troubleshooting (PyO3 linking)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
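The variance ≈ 0.833 figure follows directly from the moments of the six-point distribution described in this commit. A minimal sketch (using the weighted [1,2,3,3,2,1]/12 probabilities of this commit, which later commits replace with equal 1/6 probabilities):

```python
import numpy as np

# Six-point Webb values +/-sqrt(3/2), +/-1, +/-sqrt(1/2), paired with the
# weighted probabilities [1,2,3,3,2,1]/12 described in the commit above.
values = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                   np.sqrt(0.5), 1.0, np.sqrt(1.5)])
probs = np.array([1, 2, 3, 3, 2, 1]) / 12

mean = probs @ values          # 0 by symmetry
var = probs @ values ** 2      # (1.5 + 2 + 1.5 + 1.5 + 2 + 1.5) / 12 = 10/12
print(round(var, 3))           # -> 0.833
```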
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Tests not run (review only).
…ements

- Fix Webb weights to use equal probabilities (1/6 each) matching R's did package
  - Python: removed weighted probs from staggered_bootstrap.py
  - Rust: changed to uniform selection in bootstrap.rs
  - Now Var(w) = 1.0 instead of 0.833, fixing ~8.5% SE underestimation
- Add suppressMessages() to R benchmark tests to prevent JSON parse errors
- Exclude slow tests by default via pytest addopts (-m 'not slow')
- Update METHODOLOGY_REVIEW.md to reflect Webb alignment with R
- Update test expectations for Webb variance (1.0 instead of 0.833)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
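The switch to equal 1/6 probabilities and the size of the resulting SE correction can be checked analytically; a small sketch:

```python
import numpy as np

# Same six Webb values as before; only the probabilities change.
values = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                   np.sqrt(0.5), 1.0, np.sqrt(1.5)])

var_uniform = np.mean(values ** 2)  # equal 1/6 probs: (1.5+1+0.5)*2/6 = 1.0
var_weighted = (np.array([1, 2, 3, 3, 2, 1]) / 12) @ values ** 2  # 10/12

# Bootstrap SEs scale with sqrt(Var(w)): sqrt(10/12) ~ 0.913, i.e. the
# weighted scheme understated SEs by roughly 8-9%.
print(round(var_uniform, 3), round(np.sqrt(var_weighted), 3))
```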
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
…coverage

- Add detailed Webb distribution documentation to REGISTRY.md with citation
  - Table of all bootstrap weight distributions (Rademacher, Mammen, Webb)
  - Explicit values, probabilities, and properties
  - Reference to Webb (2023) working paper
- Fix R test path handling for cross-platform compatibility
  - Convert backslashes to forward slashes before embedding in R script
  - Use normalizePath() in R for proper path resolution
- Document how to run slow tests in pyproject.toml
  - Add comment explaining pytest -m slow and pytest -m '' options
  - Update marker description for clarity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
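The three multiplier distributions tabulated in REGISTRY.md all satisfy the moment conditions E[w] = 0 and E[w²] = 1; a sketch checking this analytically (the value/probability pairs below are the standard textbook definitions, not copied from the registry):

```python
import numpy as np

s5 = np.sqrt(5.0)
# (values, probabilities) for each multiplier distribution
dists = {
    "rademacher": (np.array([-1.0, 1.0]),
                   np.array([0.5, 0.5])),
    "mammen": (np.array([-(s5 - 1) / 2, (s5 + 1) / 2]),
               np.array([(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)])),
    "webb": (np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                       np.sqrt(0.5), 1.0, np.sqrt(1.5)]),
             np.full(6, 1 / 6)),
}
for name, (v, p) in dists.items():
    # first and second moments: all should print 0.0 and 1.0
    print(name, round(p @ v, 6), round(p @ v ** 2, 6))
```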
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
…mparison tests

- Fix Webb weights in utils.py to use equal probabilities (1/6 each), matching staggered_bootstrap.py, Rust backend, and R's did package. This gives Var(w) = 1.0 for consistency across all weight distributions.
- Add MPDTA-based strict R comparison tests (TestMPDTARComparison):
  - Export R's mpdta data to ensure identical input for Python and R
  - Test overall ATT, SE, and group-time effects with <1% tolerance
  - Validates methodology alignment with R's did::att_gt()

Note: pyproject.toml, test R suppressMessages(), and METHODOLOGY_REVIEW.md were already correct from previous commits.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
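The "<1% tolerance" criterion amounts to a relative-difference check; a hedged sketch (the estimate values and the helper name are illustrative, not taken from TestMPDTARComparison):

```python
# Hypothetical stand-ins: py_att from the Python estimator, r_att exported
# from R's did::att_gt() via jsonlite. These numbers are invented.
py_att, r_att = -0.0400, -0.0398

def within_rel_tol(a: float, b: float, tol: float = 0.01) -> bool:
    """The '<1% tolerance' criterion as a relative check: |a-b| <= tol*|b|."""
    return abs(a - b) <= tol * abs(b)

print(within_rel_tol(py_att, r_att))  # -> True for these illustrative values
```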
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Address PR #105 round 4 feedback:

Add verification note referencing fwildclusterboot's C++ source code as evidence that our Webb weights implementation (±√1.5, ±1, ±√0.5 with equal probabilities) matches established R packages. Also clarify why some documentation shows simplified values (±1.5, ±1, ±0.5) vs the actual sqrt values used in implementations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
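The distinction between the simplified documentation values and the sqrt values used in implementations is not cosmetic: only the sqrt values satisfy Var(w) = 1. A quick check:

```python
import numpy as np

# "Simplified" values sometimes shown in docs vs the actual implementation
# values; only the latter give unit variance under equal 1/6 probabilities.
simplified = np.array([-1.5, -1.0, -0.5, 0.5, 1.0, 1.5])
actual = np.array([-np.sqrt(1.5), -1.0, -np.sqrt(0.5),
                   np.sqrt(0.5), 1.0, np.sqrt(1.5)])

print(round(np.mean(simplified ** 2), 4))  # -> 1.1667  (7/6, too large)
print(round(np.mean(actual ** 2), 4))      # -> 1.0
```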
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Remove `-m 'not slow'` filter from pytest addopts so slow tests (like the analytical-vs-bootstrap SE convergence test) run by default. Developers can still use `pytest -m 'not slow'` for faster local runs. Addresses PR #105 feedback about CI test coverage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
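A minimal sketch of the resulting pyproject.toml marker configuration (illustrative; the file's actual contents are not shown in this thread):

```toml
[tool.pytest.ini_options]
# No "-m 'not slow'" in addopts, so slow tests run by default (e.g. in CI).
# Run only slow tests with: pytest -m slow
# Skip them locally with:   pytest -m 'not slow'
markers = [
    "slow: thorough bootstrap tests; deselect with -m 'not slow'",
]
```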
- Add fwildclusterboot verification to Webb weights section in REGISTRY.md (authoritative methodology documentation, not just METHODOLOGY_REVIEW.md)
- Update R dependency check in conftest.py to also verify jsonlite package (used by methodology tests for JSON output from R)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
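The lazy R availability check with the jsonlite verification could look roughly like this (the fixture name `r_available` and the `DIFF_DIFF_R` variable come from earlier commits; the body is an assumption, not the repository's actual conftest.py):

```python
import os
import shutil
import subprocess

import pytest

@pytest.fixture(scope="session")
def r_available():
    """Skip R-dependent tests unless Rscript and required packages exist."""
    if os.environ.get("DIFF_DIFF_R") == "skip":
        pytest.skip("R tests disabled via DIFF_DIFF_R=skip")
    if shutil.which("Rscript") is None:
        pytest.skip("Rscript not found on PATH")
    # One subprocess checks both packages, including jsonlite
    check = ('quit(status = !all(c("did", "jsonlite") '
             '%in% rownames(installed.packages())))')
    if subprocess.run(["Rscript", "-e", check]).returncode != 0:
        pytest.skip("R packages did/jsonlite not installed")
    return True
```

Because the subprocess only runs when a test actually requests the fixture, test collection stays fast.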
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:
Summary
Methodology references (required if estimator / math changes)
Validation
tests/test_methodology_callaway.py (new - 46 tests)

Security / privacy
Generated with Claude Code