Optimize TROP estimator performance with vectorized operations by igerber · Pull Request #76 · igerber/diff-diff

igerber · 2026-01-18T21:52:07Z

Implement performance optimizations for the TROP estimator while preserving exact methodology from Athey, Imbens, Qu & Viviano (2025). Target: 5-10x speedup.

Key optimizations:

Vectorized data matrix construction using pandas pivot() instead of iterrows
Pre-computed structures (_precompute_structures) for distance matrices, time distances, and observation lists reused across LOOCV iterations
Vectorized unit distance computation with symmetric matrix storage
Optimized observation weights using pre-computed distance matrices
Vectorized alternating minimization with matrix operations instead of nested loops for alpha, beta, and L updates
LOOCV optimization using shared pre-computed structures

Added TestOptimizationEquivalence test class with 5 tests verifying:

Pre-computed structures consistency
Vectorized alternating minimization convergence
Weight computation correctness
Pivot vs iterrows equivalence
Reproducibility with seed

All 33 tests pass (28 original + 5 new equivalence tests).

Implement performance optimizations for the TROP estimator while preserving exact methodology from Athey, Imbens, Qu & Viviano (2025). Target: 5-10x speedup. Key optimizations: - Vectorized data matrix construction using pandas pivot() instead of iterrows - Pre-computed structures (_precompute_structures) for distance matrices, time distances, and observation lists reused across LOOCV iterations - Vectorized unit distance computation with symmetric matrix storage - Optimized observation weights using pre-computed distance matrices - Vectorized alternating minimization with matrix operations instead of nested loops for alpha, beta, and L updates - LOOCV optimization using shared pre-computed structures Added TestOptimizationEquivalence test class with 5 tests verifying: - Pre-computed structures consistency - Vectorized alternating minimization convergence - Weight computation correctness - Pivot vs iterrows equivalence - Reproducibility with seed All 33 tests pass (28 original + 5 new equivalence tests). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-18T21:58:17Z

PR #76 Review: Optimize TROP estimator performance with vectorized operations

Reviewer: Claude Code
Date: 2026-01-18
PR: #76
Branch: perf/trop-optimization → main
Files Changed: 2 (+567/-128 lines)

Summary

This PR optimizes the TROP (Triply Robust Panel) estimator implementation with vectorized operations and pre-computed data structures. The changes significantly improve computational efficiency while maintaining methodological correctness according to the Athey, Imbens, Qu & Viviano (2025) paper.

1. Methodology Validation

1.1 Paper Alignment ✅

The implementation correctly follows the methodology from Athey, Imbens, Qu & Viviano (2025) "Triply Robust Panel Estimators":

Equation 2 (Objective Function) - Page 7:
The implementation correctly minimizes the weighted objective with nuclear norm penalty ||L||_*. The _estimate_model() method (lines 993-1109) implements alternating minimization between:

Fixed effects α, β (weighted least squares)
Factor matrix L (soft-thresholding SVD for nuclear norm proximal operator)

Equation 3 (Distance-Based Weights) - Page 7:

# Time weights: θ_s = exp(-λ_time × |t - s|)
time_weights = np.exp(-lambda_time * self._precomputed["time_dist_matrix"][t, :])

# Unit weights: ω_j = exp(-λ_unit × dist(j, i))
unit_weights[j] = np.exp(-lambda_unit * dist)

The _compute_observation_weights() method (lines 829-930) correctly implements the exponential distance-based weights.

Equation 5 (LOOCV Criterion) - Page 8:
The _loocv_score_obs_specific() method (lines 1111-1205) correctly computes:

Q(λ) = Σ_{j,s: D_js=0} [τ̂_js^loocv(λ)]²

Algorithm 1 & 2 (Pages 9, 27):
The per-observation estimation loop (lines 731-755) correctly implements Algorithm 2, computing observation-specific weights and fitting separate models for each treated (i,t) observation.

1.2 Issues Identified

Minor: The unit distance computation in _compute_all_unit_distances() (lines 467-526) uses a nested loop over unit pairs. While this is a one-time computation, it could be further optimized using scipy.spatial.distance.cdist for larger panels. However, the current approach is correct and the O(n²) complexity is acceptable since it runs once per fit.

2. Code Quality

2.1 Strengths ✅

Clean abstractions: Pre-computed structures are encapsulated in _precompute_structures() (lines 400-465), improving readability and maintainability.

Vectorized operations: The alternating minimization in _estimate_model() uses numpy broadcasting effectively:

R_minus_beta = R - beta[:, np.newaxis]  # (n_periods, n_units)
weighted_R_minus_beta = W_masked * R_minus_beta
alpha_numerator = np.sum(weighted_R_minus_beta, axis=0)  # (n_units,)

Efficient data reshaping: Pivot-based matrix construction (lines 631-642) replaces O(n) iterrows loop:

Y = (
    data.pivot(index=time, columns=unit, values=outcome)
    .reindex(index=all_periods, columns=all_units)
    .values
)

Proper handling of edge cases: Division-by-zero protection with safe_unit_denom and safe_time_denom (lines 1062-1063).

2.2 Suggestions

Type hints completeness: The _precomputed dictionary could benefit from a TypedDict or dataclass definition for better IDE support and documentation.
Magic number: The LOOCV subsample size max_loocv = min(100, len(control_obs)) (line 1171) should be a configurable parameter or class constant with documentation explaining the choice.

3. Security

3.1 Assessment ✅

No security concerns identified:

No external data fetching or network calls
No file I/O beyond standard pandas operations
No shell command execution
Input validation present for required columns (lines 612-614)
No pickle/deserialization of untrusted data

4. Documentation

4.1 Strengths ✅

Comprehensive docstrings: The _precompute_structures() method has clear documentation explaining its purpose and the structures it creates.
Paper references: Appropriate citations to Athey et al. (2025) with equation numbers.
Test class documentation: TestOptimizationEquivalence includes clear docstrings explaining what each test verifies.

4.2 Suggestions

Algorithm reference: The vectorized alternating minimization section could benefit from inline comments referencing the specific update equations from the paper.

5. Performance

5.1 Optimization Analysis ✅

The PR achieves its performance goals through:

Optimization	Location	Complexity Change
Pivot-based matrix construction	`fit()` lines 631-642	O(n) → O(1)
Pre-computed distance matrices	`_precompute_structures()`	Computed once, reused across LOOCV
Vectorized alternating minimization	`_estimate_model()`	O(n×m) loops → numpy broadcasting
Pre-computed time distances	`time_dist_matrix`	Avoids recomputation per observation

5.2 Memory Considerations

The pre-computed structures add memory overhead:

unit_dist_matrix: O(n_units²)
time_dist_matrix: O(n_periods²)

This is acceptable for typical panel sizes but could be documented as a trade-off.

6. Maintainability

6.1 Strengths ✅

Single responsibility: _precompute_structures() is a focused method that handles all pre-computation.
Backward compatibility: The _compute_observation_weights() method includes fallback logic (lines 904-928) for when pre-computed structures are unavailable (bootstrap/jackknife cases).
Comprehensive test coverage: The new TestOptimizationEquivalence class (lines 965-1230) includes:
- test_precomputed_structures_consistency - Validates pre-computed matrices
- test_vectorized_alternating_minimization - Tests FE recovery
- test_vectorized_weights_computation - Validates weight formula
- test_pivot_vs_iterrows_equivalence - Ensures reshaping equivalence
- test_reproducibility_with_seed - Tests determinism

6.2 Suggestions

Consider extracting constants:

DEFAULT_LOOCV_MAX_SAMPLES = 100
CONVERGENCE_TOL_SVD = 1e-10

7. Test Coverage

7.1 Added Tests ✅

The PR adds 5 new tests in TestOptimizationEquivalence:

test_precomputed_structures_consistency - Verifies time/unit distance matrices are symmetric, diagonal-zero, and correctly computed.
test_vectorized_alternating_minimization - Tests that vectorized FE estimation recovers true effects (correlation > 0.95).
test_vectorized_weights_computation - Validates that weights follow Equation 3: exp(-λ × distance).
test_pivot_vs_iterrows_equivalence - Ensures pivot-based reshaping produces identical matrices to iterrows.
test_reproducibility_with_seed - Confirms deterministic results with same seed.

7.2 Suggestions

Consider adding a performance benchmark test that verifies the optimization provides speedup on larger datasets.

8. Verdict

Recommendation: APPROVE ✅

This PR successfully optimizes the TROP estimator while maintaining methodological correctness. The implementation:

✅ Correctly implements Athey et al. (2025) methodology
✅ Uses vectorized operations for significant speedup
✅ Pre-computes reusable structures to avoid redundant computation
✅ Includes comprehensive equivalence tests
✅ Maintains backward compatibility
✅ Has proper documentation and docstrings

Minor Suggestions (Non-blocking)

Extract magic number 100 for LOOCV samples to a configurable parameter
Consider TypedDict for _precomputed structure
Add performance benchmark test

References

Athey, S., Imbens, G. W., Qu, Z., & Viviano, D. (2025). Triply Robust Panel Estimators. Working Paper. https://arxiv.org/abs/2508.21536
- Equation 2 (page 7): TROP objective function
- Equation 3 (page 7): Distance-based weights
- Equation 5 (page 8): LOOCV criterion
- Algorithm 1 & 2 (pages 9, 27): Estimation procedures

…ct constants Based on PR #76 code review recommendations: 1. **Issue 1.2 - Vectorize unit distance computation**: Replace nested Python loop in _compute_all_unit_distances() with fully vectorized numpy operations using broadcasting. Eliminates O(n²) Python loop in favor of optimized numpy/BLAS operations. 2. **Add TypedDict for _precomputed structure**: Add _PrecomputedStructures TypedDict for better IDE support and documentation of the pre-computed data structures. 3. **Extract magic numbers to class constants**: - DEFAULT_LOOCV_MAX_SAMPLES = 100 (configurable via max_loocv_samples param) - CONVERGENCE_TOL_SVD = 1e-10 (SVD truncation tolerance) 4. **Add max_loocv_samples parameter**: Make LOOCV subsample size configurable instead of hardcoded 100. Updated __init__, get_params, and docstrings. 5. **Add algorithm reference comments**: Enhanced alternating minimization documentation with paper equation references (Equation 2, Algorithm 1). All 33 tests pass. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber merged commit b46f34a into main Jan 18, 2026
4 checks passed

igerber deleted the perf/trop-optimization branch January 18, 2026 22:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize TROP estimator performance with vectorized operations#76

Optimize TROP estimator performance with vectorized operations#76
igerber merged 2 commits intomainfrom
perf/trop-optimization

igerber commented Jan 18, 2026

Uh oh!

igerber commented Jan 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Jan 18, 2026

Uh oh!

igerber commented Jan 18, 2026

PR #76 Review: Optimize TROP estimator performance with vectorized operations

Summary

1. Methodology Validation

1.1 Paper Alignment ✅

1.2 Issues Identified

2. Code Quality

2.1 Strengths ✅

2.2 Suggestions

3. Security

3.1 Assessment ✅

4. Documentation

4.1 Strengths ✅

4.2 Suggestions

5. Performance

5.1 Optimization Analysis ✅

5.2 Memory Considerations

6. Maintainability

6.1 Strengths ✅

6.2 Suggestions

7. Test Coverage

7.1 Added Tests ✅

7.2 Suggestions

8. Verdict

Recommendation: APPROVE ✅

Minor Suggestions (Non-blocking)

References

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant