
Fix tutorial notebook validation errors and add pre_periods parameter #74

igerber merged 3 commits into main from fix/tutorial-notebook-validation on Jan 18, 2026

igerber commented Jan 18, 2026

Tutorial notebook fixes:

  • 02_staggered_did: Fix CallawaySantAnna API usage (first_treat param, aggregate attributes instead of method)
  • 03_synthetic_did: Change n_bootstrap=0 to variance_method="placebo"
  • 04_parallel_trends: Fix placebo test API (parameter names, required args)
  • 07_pretrends_power: Add pre_periods parameter for event study workflow
  • 10_trop: Reduce computational load for faster validation

Code fixes:

  • staggered.py: Standardize first_treat column name internally to avoid hardcoded column reference bug
  • pretrends.py: Add pre_periods parameter to fit(), power_at(), power_curve(), and sensitivity_to_honest_did() methods to support event studies where all periods are estimated as post_periods
  • pretrends.py: Add power_at() method to PreTrendsPowerResults class
  • pretrends.py: Update convenience functions with pre_periods parameter

Other:

  • Move TROP paper to papers/ directory
  • Add .claude/settings.local.json to .gitignore
  • Clear all notebook outputs

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber commented Jan 18, 2026

Code Review: PR #74

Executive Summary

This PR makes two types of changes: (1) tutorial notebook fixes to align with current API, and (2) code fixes/enhancements to staggered.py and pretrends.py. The changes are methodologically sound but have significant test coverage gaps.


Part 1: Methodology Review

1.1 pretrends.py - pre_periods Parameter Addition ✅

Change: Added explicit pre_periods parameter to fit(), power_at(), power_curve(), and sensitivity_to_honest_did().

Assessment: Correct and necessary

The Roth (2022) pre-trends power framework operates on pre-period coefficients from an event study. When users estimate all periods (including pre-periods) as post_periods, they need a way to tell the power analysis which coefficients are actually pre-treatment. The implementation correctly:

  • Accepts explicit pre_periods list
  • Falls back to results.pre_periods when not provided (backward compatible)
  • Filters to estimated coefficients (excluding reference period)
  • Extracts the correct variance-covariance submatrix
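A minimal sketch of that fallback-and-filter logic in plain NumPy follows. The function name, the dict-of-coefficients input, and the assumption that the vcov rows and columns follow a known period order are all hypothetical; the real _extract_pre_period_params() in pretrends.py is not shown in this PR and may be structured differently.

```python
import numpy as np

def extract_pre_period_params(coef_by_period, vcov, period_order,
                              pre_periods=None, default_pre_periods=()):
    """Hypothetical helper: pick the pre-treatment coefficients and their vcov block.

    coef_by_period: dict mapping event-time period -> estimated coefficient
    vcov: full variance-covariance matrix, rows/columns ordered like period_order
    """
    # Explicit pre_periods win; otherwise fall back to the results' own list
    if pre_periods is not None:
        candidates = list(pre_periods)
    else:
        candidates = list(default_pre_periods)
    # Keep only periods with an estimate; the omitted reference period drops out here
    periods = [p for p in candidates if p in coef_by_period]
    if not periods:
        raise ValueError("no estimated pre-period coefficients found")
    idx = [period_order.index(p) for p in periods]
    beta = np.array([coef_by_period[p] for p in periods], dtype=float)
    # np.ix_ selects the matching rows *and* columns of the full vcov matrix
    vcov_sub = np.asarray(vcov, dtype=float)[np.ix_(idx, idx)]
    return periods, beta, vcov_sub
```

For an event study that estimated periods -3, -2, 0 and 1 as post_periods (reference period -1 omitted), passing pre_periods=[-3, -2, -1] keeps only the -3 and -2 coefficients and the matching 2×2 block of the vcov matrix.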

1.2 PreTrendsPowerResults.power_at() Method ✅

Assessment: Correct implementation

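The non-centrality parameter follows Roth (2022), λ = M² × (w' V⁻¹ w), where M is the violation magnitude, w the violation weight vector, and V the pre-period variance-covariance matrix; power is then computed correctly from the non-central chi-squared distribution.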
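A sketch of that power calculation with SciPy, assuming a joint chi-squared Wald pre-test with one degree of freedom per pre-period and weights that are already normalized. The function name pretrend_power is hypothetical, and the exact test and normalization used in pretrends.py are not shown in this PR.

```python
import numpy as np
from scipy import stats

def pretrend_power(M, weights, vcov, alpha=0.05):
    """Hypothetical: power of a joint Wald pre-test against the violation M * w."""
    w = np.asarray(weights, dtype=float)
    V = np.asarray(vcov, dtype=float)
    df = w.size
    # Rejection threshold of the chi-squared(df) test under H0: no pre-trend
    crit = stats.chi2.ppf(1 - alpha, df)
    # lambda = M^2 * (w' V^{-1} w); solve() avoids forming V^{-1} explicitly
    ncp = M**2 * float(w @ np.linalg.solve(V, w))
    if ncp == 0:
        # No violation: the rejection rate is just the test's size, alpha
        return float(stats.chi2.sf(crit, df))
    return float(stats.ncx2.sf(crit, df, ncp))
```

The M = 0 branch mirrors the edge case the later tests exercise: with no violation, "power" collapses to the nominal size alpha, and it rises monotonically as M grows.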

1.3 staggered.py - Column Name Standardization ✅

Change: Added df['first_treat'] = df[first_treat] to copy the user-specified column to a standardized internal name.

Assessment: Bug fix - correct

This fixes a bug where internal methods at lines 1638 and 1693 hardcoded df['first_treat'] instead of using the first_treat parameter.
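A minimal sketch of the same standardization pattern, assuming pandas. The helper name is hypothetical, and unlike the one-line change quoted above it works on a copy so the caller's DataFrame is never mutated; the PR does not say whether staggered.py already copies df before this assignment.

```python
import pandas as pd

def standardize_first_treat(df: pd.DataFrame, first_treat: str) -> pd.DataFrame:
    """Hypothetical: expose the user's cohort column under the internal name 'first_treat'."""
    if first_treat not in df.columns:
        raise ValueError(f"column {first_treat!r} not found")
    out = df.copy()  # avoid mutating the caller's DataFrame
    # Internal methods can now safely refer to out['first_treat']
    out["first_treat"] = out[first_treat]
    return out
```

A regression test in this spirit runs the same data once with the standard name and once with a renamed column (including names with spaces) and checks that the internal column is identical. One side effect worth documenting: if the input already contains an unrelated column literally named first_treat, the internal copy overwrites it.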


Part 2: Issues Found

🔴 Critical: Missing Tests

The new functionality has no test coverage:

  1. PreTrendsPowerResults.power_at() - No tests exist
  2. pre_periods parameter - Not tested in any method
  3. staggered.py bug fix - No regression test for non-standard column names

Recommendation: Add tests for:

def test_results_power_at():
    """Test power_at method on PreTrendsPowerResults."""
    
def test_fit_with_explicit_pre_periods():
    """Test that pre_periods parameter overrides results.pre_periods."""

def test_custom_first_treat_column_name():
    """Test CallawaySantAnna with non-standard first_treat column name."""

🟡 Medium: Inconsistent Parameter Naming

In compute_mdv(), the parameter was renamed from target_power to power:

# Old
def compute_mdv(..., target_power: float = 0.80, ...):

# New  
def compute_mdv(..., power: float = 0.80, ...):

This is a breaking change for anyone passing target_power as a keyword argument. The compute_pretrends_power() function still uses target_power.

Recommendation: Either keep target_power for backward compatibility or update both functions consistently.
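If the rename to power were kept, a deprecation shim that still accepts target_power would avoid breaking keyword callers. This is a sketch only: renamed_kwarg and toy_mdv are hypothetical names, and toy_mdv is a stand-in rather than the real compute_mdv() signature. The PR ultimately took the other route and reverted to target_power.

```python
import functools
import warnings

def renamed_kwarg(old, new):
    """Hypothetical decorator: accept a deprecated keyword and forward it to its new name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old in kwargs:
                if new in kwargs:
                    raise TypeError(f"pass either {old!r} or {new!r}, not both")
                warnings.warn(f"{old!r} is deprecated; use {new!r}",
                              DeprecationWarning, stacklevel=2)
                kwargs[new] = kwargs.pop(old)
            return func(*args, **kwargs)
        return wrapper
    return decorator

@renamed_kwarg("target_power", "power")
def toy_mdv(se, power=0.80):
    # Hypothetical stand-in; only the keyword handling matters here
    return {"se": se, "power": power}
```

Old call sites such as toy_mdv(1.0, target_power=0.9) keep working but emit a DeprecationWarning, which gives users a release cycle to migrate before the old name is removed.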

🟡 Medium: Weight Computation Inconsistency

PreTrendsPowerResults.power_at() duplicates weight computation logic but differs from PreTrendsPower._get_violation_weights():

# In PreTrendsPowerResults.power_at() - uses [1, 2, ..., n]
weights = np.arange(1, n_pre + 1).astype(float)

# In PreTrendsPower._get_violation_weights() - uses [n-1, n-2, ..., 0]
weights = np.arange(-n_pre + 1, 1, dtype=float)
weights = -weights

These produce different normalized weights and will give different power values for the same M.

Recommendation: Store the computed weights in the results object or factor out the weight computation to ensure consistency.
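Factoring the weights into one shared helper, as recommended, could look like the sketch below. Only the linear case is taken from the _get_violation_weights() snippet above; the constant and last_period definitions and the unit-norm normalization are assumptions, and violation_weights is a hypothetical name.

```python
import numpy as np

def violation_weights(violation_type, n_pre):
    """Hypothetical shared helper returning unit-norm violation weights."""
    if n_pre < 1:
        raise ValueError("need at least one pre-period")
    if violation_type == "linear":
        # [n-1, n-2, ..., 0]: same values as -np.arange(-n_pre + 1, 1)
        w = np.arange(n_pre - 1, -1, -1, dtype=float)
    elif violation_type == "constant":
        w = np.ones(n_pre)  # assumed: equal shift in every pre-period
    elif violation_type == "last_period":
        w = np.zeros(n_pre)
        w[-1] = 1.0  # assumed: violation only in the final listed pre-period
    else:
        raise ValueError(f"unsupported violation_type: {violation_type!r}")
    norm = np.linalg.norm(w)
    if norm == 0:
        # e.g. linear weights with a single pre-period are all zero
        raise ValueError("violation weights are all zero")
    return w / norm  # assumed normalization: unit Euclidean norm
```

With both fit() and PreTrendsPowerResults.power_at() calling one helper like this (or reading weights stored on the results object), the same M always maps to the same power. For three pre-periods, [1, 2, 3] and [2, 1, 0] normalize to different vectors, which is exactly the discrepancy described above.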

🟢 Minor: Unused Variable

In power_curve():

_, ses, vcov, n_pre = self._extract_pre_period_params(results, pre_periods)

ses is unused.


Part 3: Performance & Maintainability

Category | Rating | Notes
Performance | ✅ Good | No concerns: O(n) column copy, reuses stored vcov
Memory | ✅ Good | No significant new allocations (only the copied first_treat column)
Maintainability | ⚠️ Medium | Test debt, weight duplication

Recommendations

Must Fix Before Merge

  1. Add tests for new functionality (see above)
  2. Fix weight computation inconsistency between PreTrendsPowerResults.power_at() and PreTrendsPower._get_violation_weights()

Should Fix

  1. Consistent parameter naming: keep target_power or rename both to power

Nice to Have

  1. Store violation weights in the results object
  2. Add a warning when power_at() is called with violation_type='custom'

Summary

Category | Rating
Methodology | ✅ Correct
Code Quality | ⚠️ Needs work
Performance | ✅ Good
Overall | Needs Revision

The methodology is sound but please add tests and fix the weight inconsistency before merging.

Fixes:
- Fix weight computation in PreTrendsPowerResults.power_at() to match
  _get_violation_weights() logic (linear weights should be [n-1, n-2, ..., 0])
- Fix compute_mdv() parameter name from 'power' back to 'target_power'
  for consistency with compute_pretrends_power()
- Update notebook cell-28 to use target_power instead of power

Tests added:
- TestPreTrendsPowerResultsPowerAt: 6 tests for power_at() method
- TestPrePeriodsParameter: 6 tests for pre_periods parameter
- TestCallawaySantAnnaNonStandardColumnNames: 10 tests for non-standard
  column names in CallawaySantAnna

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber commented Jan 18, 2026

Re-review: Commit 336e246

The follow-up commit successfully addresses all critical and medium-priority issues from my initial review.

Issues Resolved

Issue | Status | Details
🔴 Missing tests for power_at() | ✅ Fixed | 6 tests added in TestPreTrendsPowerResultsPowerAt
🔴 Missing tests for pre_periods | ✅ Fixed | 6 tests added in TestPrePeriodsParameter
🔴 Missing tests for staggered bug fix | ✅ Fixed | 9 tests added in TestCallawaySantAnnaNonStandardColumnNames
🟡 Weight inconsistency | ✅ Fixed | power_at() now matches _get_violation_weights()
🟡 Parameter naming | ✅ Fixed | Reverted to target_power in compute_mdv()

Test Results

All 21 new tests pass:

tests/test_pretrends.py - 12 passed ✅
tests/test_staggered.py - 9 passed ✅

Code Quality Assessment

Tests are thorough and well-designed:

  1. TestPreTrendsPowerResultsPowerAt: Tests basic functionality, edge case (M=0), consistency with fit(), and all weight types (linear, constant, last_period)

  2. TestPrePeriodsParameter: Tests explicit pre_periods, override behavior, and integration with all methods (fit, power_at, power_curve, sensitivity_to_honest_did, convenience functions)

  3. TestCallawaySantAnnaNonStandardColumnNames: Excellent coverage including:

    • Basic non-standard names
    • All columns renamed
    • With bootstrap, event study, covariates, not_yet_treated
    • Key regression test: verifies identical results between standard and custom names
    • Edge cases: spaces in names, special characters

Minor Note

There are some RuntimeWarnings (divide by zero, overflow in matmul) in the bootstrap code, but these are pre-existing issues not introduced by this PR. The tests still pass correctly.


Final Assessment

Category | Rating
Methodology | ✅ Correct
Code Quality | ✅ Good
Test Coverage | ✅ Comprehensive
Performance | ✅ Good
Overall | Approved

Ready to merge. Nice work addressing the feedback!

Add TODO item for RuntimeWarnings that occur during influence function
aggregation in staggered.py. These warnings (divide by zero, overflow,
invalid value in matmul) occur with small sample sizes or edge cases
but don't affect result correctness.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
igerber merged commit 894955e into main on Jan 18, 2026
4 checks passed
igerber deleted the fix/tutorial-notebook-validation branch on January 18, 2026 21:12