Skip to content

Fix _parse_formula to support covariates before interaction term #186

Closed
SermetPekin wants to merge 1 commit intoigerber:mainfrom
SermetPekin:fix/parse-formula
Closed

Fix _parse_formula to support covariates before interaction term #186
SermetPekin wants to merge 1 commit intoigerber:mainfrom
SermetPekin:fix/parse-formula

Conversation

@SermetPekin
Copy link
Copy Markdown

Methodology references (required if estimator / math changes)

Method name(s): N/A — no estimator logic or math changed. This fixes a formula string parser, not the underlying estimation.
Paper / source link(s): N/A
Any intentional deviations from the source (and why): N/A
Validation

Tests added/updated: Added test_parse_formula.py with 6 tests covering:
Covariates before * (e.g. y ~ x1 + treated * post) — was broken, now fixed
Covariates on both sides of * (e.g. y ~ x1 + treated * post + x2) — was broken, now fixed
Multiple ~ gives clear error
Extra/no whitespace handling
Formula-vs-explicit-params equivalence (ATT and SE match to rtol=1e-10)
All 9 existing formula tests in test_methodology_did.py and test_estimators.py pass unchanged (no regressions)
Backtest / simulation / notebook evidence (if applicable): N/A
Security / privacy

Confirm no secrets/PII in this PR: Confirmed — changes are limited to formula parsing logic and test data generated with np.random.

@igerber
Copy link
Copy Markdown
Owner

igerber commented Mar 8, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 8, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 0df98e8d19bb484e870979242c78e38444f2eabf


Overall Assessment

  • ⚠️ Needs changes

Executive Summary

  • No ATT/SE/identification logic changed, and I found no source-material mismatch in the DiD estimator itself.
  • The PR does fix the reported parser bug for covariates before treated * post and adds useful coverage for multiple ~ and whitespace handling.
  • The new * parsing branch still mishandles a standard R-equivalent form, outcome ~ treated + post + treated * post (+ x), by treating treated and post as covariates.
  • That creates duplicate design-matrix columns and can overwrite coefficient entries in the results dict, even if ATT itself survives rank-deficiency handling.
  • Tests miss that equivalent-form regression case; no security or performance issues stood out in scope.

Methodology

  • Severity: P1 | Impact: Affected method is the DifferenceInDifferences formula interface only. The registry and docstring describe an R-style interface and cite fixest as the reference behavior, but the new * path appends every non-* term to covariates without excluding the main effects. For outcome ~ treated + post + treated * post + x1, treated and post get added twice to X, which makes the fit rank-deficient and can yield misleading coefficients output via duplicate-key overwrite. See docs/methodology/REGISTRY.md:73, docs/methodology/REGISTRY.md:77, diff_diff/estimators.py:46, diff_diff/estimators.py:481, diff_diff/estimators.py:257, diff_diff/estimators.py:323. | Concrete fix: After extracting treatment and time in the * branch, drop additive terms equal to those names and deduplicate the remaining covariates, mirroring the filtering already done in the : branch at diff_diff/estimators.py:532.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • Severity: P2 | Impact: The new tests cover x1 + treated * post and x1 + treated * post + x2, but they do not cover the standard equivalent spellings that still fail the new branch, especially outcome ~ treated + post + treated * post and outcome ~ treated + post + treated * post + x1. That leaves the remaining parser gap unguarded. See tests/test_parse_formula.py:39. | Concrete fix: Add regression tests for those formulas and assert equivalence to the explicit-parameter fit, plus no duplicate coefficient names / no rank-deficiency warning for redundant main effects.
  • Validation note: I reviewed this statically; the sandbox here does not have pytest or the project’s runtime deps installed, so I could not execute the test suite.

@igerber
Copy link
Copy Markdown
Owner

igerber commented Mar 8, 2026

In addition to the AI notes, I'd add that ideally, the tests are an existing test file like test_estimators.py instead of adding a new test file

@SermetPekin SermetPekin closed this Mar 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants