Fix tutorial notebook validation errors and add pre_periods parameter by igerber · Pull Request #74 · igerber/diff-diff

igerber · 2026-01-18T19:46:02Z

Tutorial notebook fixes:

- 02_staggered_did: Fix CallawaySantAnna API usage (first_treat param, aggregate attributes instead of method)
- 03_synthetic_did: Change n_bootstrap=0 to variance_method="placebo"
- 04_parallel_trends: Fix placebo test API (parameter names, required args)
- 07_pretrends_power: Add pre_periods parameter for event study workflow
- 10_trop: Reduce computational load for faster validation

Code fixes:

- staggered.py: Standardize first_treat column name internally to avoid hardcoded column reference bug
- pretrends.py: Add pre_periods parameter to fit(), power_at(), power_curve(), and sensitivity_to_honest_did() methods to support event studies where all periods are estimated as post_periods
- pretrends.py: Add power_at() method to PreTrendsPowerResults class
- pretrends.py: Update convenience functions with pre_periods parameter

Other:

- Move TROP paper to papers/ directory
- Add .claude/settings.local.json to .gitignore
- Clear all notebook outputs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-18T20:22:10Z

Code Review: PR #74

Executive Summary

This PR makes two types of changes: (1) tutorial notebook fixes to align with current API, and (2) code fixes/enhancements to staggered.py and pretrends.py. The changes are methodologically sound but have significant test coverage gaps.

Part 1: Methodology Review

1.1 `pretrends.py` - `pre_periods` Parameter Addition ✅

Change: Added explicit pre_periods parameter to fit(), power_at(), power_curve(), and sensitivity_to_honest_did().

Assessment: Correct and necessary

The Roth (2022) pre-trends power framework operates on pre-period coefficients from an event study. When users estimate all periods (including pre-periods) as post_periods, they need a way to tell the power analysis which coefficients are actually pre-treatment. The implementation correctly:

- Accepts explicit pre_periods list
- Falls back to results.pre_periods when not provided (backward compatible)
- Filters to estimated coefficients (excluding reference period)
- Extracts the correct variance-covariance submatrix
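
To make the filtering and submatrix steps concrete, here is a minimal sketch in plain Python. The helper name `select_pre_period_params`, its arguments, and the toy numbers are assumptions for illustration; they are not the library's actual internals.

```python
def select_pre_period_params(periods, coefs, vcov, pre_periods, reference_period=None):
    """Return (kept_periods, coefficients, vcov submatrix) for the pre-periods.

    Hypothetical helper: names and argument layout are illustrative only.
    """
    # Keep only requested pre-periods that were actually estimated,
    # dropping the omitted reference period if one is given.
    index_of = {p: i for i, p in enumerate(periods)}
    kept = [p for p in pre_periods if p in index_of and p != reference_period]
    idx = [index_of[p] for p in kept]
    sub_coefs = [coefs[i] for i in idx]
    # Take the same rows and columns so the submatrix lines up with sub_coefs.
    sub_vcov = [[vcov[i][j] for j in idx] for i in idx]
    return kept, sub_coefs, sub_vcov


# Toy event study: period -1 is the omitted reference, so it has no coefficient.
periods = [-3, -2, 0, 1]
coefs = [0.10, 0.05, 0.50, 0.60]
vcov = [
    [0.04, 0.01, 0.00, 0.00],
    [0.01, 0.03, 0.00, 0.00],
    [0.00, 0.00, 0.05, 0.01],
    [0.00, 0.00, 0.01, 0.06],
]
kept, sub_coefs, sub_vcov = select_pre_period_params(
    periods, coefs, vcov, pre_periods=[-3, -2, -1], reference_period=-1
)
```

Indexing rows and columns with the same list keeps the coefficient vector and the covariance block aligned, which is the property the power calculation depends on.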

1.2 `PreTrendsPowerResults.power_at()` Method ✅

Assessment: Correct implementation

The non-centrality parameter follows Roth (2022), λ = M² × (w' V⁻¹ w), and power is correctly computed from the non-central chi-squared distribution.
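
For reference, the calculation can be written out as follows. This assumes a joint Wald test of the k pre-period coefficients at level α; the symbols k, α, and c are introduced here for illustration and do not appear in the PR.

```latex
\lambda = M^2 \, w^\top V^{-1} w,
\qquad
\operatorname{power}(M)
  = \Pr\!\left[\chi^2_k(\lambda) > c_{1-\alpha}\right]
  = 1 - F_{\chi^2_k(\lambda)}\!\left(c_{1-\alpha}\right)
```

Here c_{1-α} is the 1-α quantile of the central chi-squared distribution with k degrees of freedom. Under these assumptions, M = 0 gives λ = 0 and the power equals α.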

1.3 `staggered.py` - Column Name Standardization ✅

Change: Added df['first_treat'] = df[first_treat] to copy user-specified column to standardized internal name.

Assessment: Bug fix - correct

This fixes a bug where internal methods at lines 1638 and 1693 hardcoded df['first_treat'] instead of using the first_treat parameter.
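
The bug pattern and the fix can be sketched with plain dicts standing in for DataFrame rows. The function names and the `cohort` column are hypothetical and only mirror the shape of the problem.

```python
def first_treat_values_hardcoded(rows, first_treat="first_treat"):
    # Bug pattern: the parameter is accepted but the key is hardcoded.
    return [row["first_treat"] for row in rows]


def standardize_first_treat(rows, first_treat="first_treat"):
    # Fix pattern: copy the user-specified column to the standard internal
    # name on a copy of each row, so downstream code can rely on "first_treat".
    return [{**row, "first_treat": row[first_treat]} for row in rows]


rows = [{"unit": 1, "cohort": 2004}, {"unit": 2, "cohort": 2006}]

# Without standardization, a non-standard column name raises KeyError.
try:
    first_treat_values_hardcoded(rows, first_treat="cohort")
    hardcoded_failed = False
except KeyError:
    hardcoded_failed = True

# After standardization, the hardcoded lookup works again.
standardized = standardize_first_treat(rows, first_treat="cohort")
fixed_values = first_treat_values_hardcoded(standardized, first_treat="cohort")
```

Copying to a standard name fixes every hardcoded reference at once, at the cost of one extra column on the working copy.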

Part 2: Issues Found

🔴 Critical: Missing Tests

The new functionality has no test coverage:

- PreTrendsPowerResults.power_at() - No tests exist
- pre_periods parameter - Not tested in any method
- staggered.py bug fix - No regression test for non-standard column names

Recommendation: Add tests for:

def test_results_power_at():
    """Test power_at method on PreTrendsPowerResults."""
    
def test_fit_with_explicit_pre_periods():
    """Test that pre_periods parameter overrides results.pre_periods."""

def test_custom_first_treat_column_name():
    """Test CallawaySantAnna with non-standard first_treat column name."""

🟡 Medium: Inconsistent Parameter Naming

In compute_mdv(), the parameter was renamed from target_power to power:

# Old
def compute_mdv(..., target_power: float = 0.80, ...):

# New  
def compute_mdv(..., power: float = 0.80, ...):

This is a breaking change for anyone using keyword arguments. The compute_pretrends_power() function still uses target_power.

Recommendation: Either keep target_power for backward compatibility or update both functions consistently.
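
If a rename were ever wanted, one common way to avoid breaking keyword callers is to accept both names for a transition period. The sketch below uses a hypothetical stand-in, `compute_mdv_sketch`, which only resolves the argument; it is not the real compute_mdv() signature or behavior.

```python
import warnings


def compute_mdv_sketch(target_power=0.80, power=None):
    """Hypothetical stand-in for compute_mdv(); returns the resolved target power."""
    if power is not None:
        # Accept the alternate keyword, but steer callers to the canonical name.
        warnings.warn(
            "'power' is deprecated; use 'target_power' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        target_power = power
    if not 0.0 < target_power < 1.0:
        raise ValueError("target_power must be strictly between 0 and 1")
    return target_power


# Calling with the alternate keyword still works and emits one warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = compute_mdv_sketch(power=0.9)
```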

🟡 Medium: Weight Computation Inconsistency

PreTrendsPowerResults.power_at() duplicates weight computation logic but differs from PreTrendsPower._get_violation_weights():

# In PreTrendsPowerResults.power_at() - uses [1, 2, ..., n]
weights = np.arange(1, n_pre + 1).astype(float)

# In PreTrendsPower._get_violation_weights() - uses [n-1, n-2, ..., 0]
weights = np.arange(-n_pre + 1, 1, dtype=float)
weights = -weights

These produce different normalized weights and will give different power values for the same M.

Recommendation: Store the computed weights in the results object or factor out the weight computation to ensure consistency.
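
A small sketch of the difference, and of factoring the computation into one shared helper. The helper names and the unit-norm normalization are assumptions made for illustration; the library's actual normalization may differ.

```python
def linear_violation_weights(n_pre):
    """Linear weights [n-1, n-2, ..., 0], normalized to unit Euclidean norm.

    Hypothetical shared helper; the unit-norm normalization is an assumption.
    """
    raw = [float(n_pre - 1 - i) for i in range(n_pre)]
    norm = sum(x * x for x in raw) ** 0.5
    return [x / norm for x in raw] if norm > 0 else raw


def ascending_weights(n_pre):
    """The [1, 2, ..., n] variant flagged above, with the same normalization."""
    raw = [float(i) for i in range(1, n_pre + 1)]
    norm = sum(x * x for x in raw) ** 0.5
    return [x / norm for x in raw]


w_shared = linear_violation_weights(3)  # proportional to [2, 1, 0]
w_other = ascending_weights(3)          # proportional to [1, 2, 3]
```

The two vectors point in different directions even after normalization, so w' V⁻¹ w, and therefore the power at a given M, will generally differ. Having both call sites use a single helper removes that risk.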

🟢 Minor: Unused Variable

In power_curve():

_, ses, vcov, n_pre = self._extract_pre_period_params(results, pre_periods)

ses is unused.

Part 3: Performance & Maintainability

| Category | Rating | Notes |
| --- | --- | --- |
| Performance | ✅ Good | No concerns - O(n) column copy, reuses stored vcov |
| Memory | ✅ Good | No new allocations |
| Maintainability | ⚠️ Medium | Test debt, weight duplication |

Recommendations

Must Fix Before Merge

- Add tests for new functionality (see above)
- Fix weight computation inconsistency between PreTrendsPowerResults.power_at() and PreTrendsPower._get_violation_weights()

Should Fix

- Consistent parameter naming: keep target_power or rename both to power

Nice to Have

- Store violation weights in results object
- Add warning when power_at() called with violation_type='custom'

Summary

| Category | Rating |
| --- | --- |
| Methodology | ✅ Correct |
| Code Quality | ⚠️ Needs work |
| Performance | ✅ Good |
| Overall | Needs Revision |

The methodology is sound but please add tests and fix the weight inconsistency before merging.

Fixes:

- Fix weight computation in PreTrendsPowerResults.power_at() to match _get_violation_weights() logic (linear weights should be [n-1, n-2, ..., 0])
- Fix compute_mdv() parameter name from 'power' back to 'target_power' for consistency with compute_pretrends_power()
- Update notebook cell-28 to use target_power instead of power

Tests added:

- TestPreTrendsPowerResultsPowerAt: 6 tests for power_at() method
- TestPrePeriodsParameter: 6 tests for pre_periods parameter
- TestCallawaySantAnnaNonStandardColumnNames: 10 tests for non-standard column names in CallawaySantAnna

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-18T20:31:33Z

Re-review: Commit `336e246` ✅

The follow-up commit successfully addresses all critical and medium-priority issues from my initial review.

Issues Resolved

| Issue | Status | Details |
| --- | --- | --- |
| 🔴 Missing tests for `power_at()` | ✅ Fixed | 6 tests added in `TestPreTrendsPowerResultsPowerAt` |
| 🔴 Missing tests for `pre_periods` | ✅ Fixed | 6 tests added in `TestPrePeriodsParameter` |
| 🔴 Missing tests for staggered bug fix | ✅ Fixed | 9 tests added in `TestCallawaySantAnnaNonStandardColumnNames` |
| 🟡 Weight inconsistency | ✅ Fixed | `power_at()` now matches `_get_violation_weights()` |
| 🟡 Parameter naming | ✅ Fixed | Reverted to `target_power` in `compute_mdv()` |

Test Results

All 21 new tests pass:

- tests/test_pretrends.py - 12 passed ✅
- tests/test_staggered.py - 9 passed ✅

Code Quality Assessment

Tests are thorough and well-designed:

- TestPreTrendsPowerResultsPowerAt: Tests basic functionality, edge case (M=0), consistency with fit(), and all weight types (linear, constant, last_period)
- TestPrePeriodsParameter: Tests explicit pre_periods, override behavior, and integration with all methods (fit, power_at, power_curve, sensitivity_to_honest_did, convenience functions)
- TestCallawaySantAnnaNonStandardColumnNames: Excellent coverage including:
  - Basic non-standard names
  - All columns renamed
  - With bootstrap, event study, covariates, not_yet_treated
  - Key regression test: verifies identical results between standard and custom names
  - Edge cases: spaces in names, special characters

Minor Note

There are some RuntimeWarnings (divide by zero, overflow in matmul) in the bootstrap code, but these are pre-existing issues not introduced by this PR. The tests still pass correctly.

Final Assessment

| Category | Rating |
| --- | --- |
| Methodology | ✅ Correct |
| Code Quality | ✅ Good |
| Test Coverage | ✅ Comprehensive |
| Performance | ✅ Good |
| Overall | Approved ✅ |

Ready to merge. Nice work addressing the feedback!

Add TODO item for RuntimeWarnings that occur during influence function aggregation in staggered.py. These warnings (divide by zero, overflow, invalid value in matmul) occur with small sample sizes or edge cases but don't affect result correctness. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber merged commit 894955e into main Jan 18, 2026
4 checks passed

igerber deleted the fix/tutorial-notebook-validation branch January 18, 2026 21:12

igerber merged 3 commits into main from fix/tutorial-notebook-validation
