Fix _parse_formula to support covariates before interaction term by SermetPekin · Pull Request #186 · igerber/diff-diff

SermetPekin · 2026-02-23T20:43:30Z

Methodology references (required if estimator / math changes)

Method name(s): N/A — no estimator logic or math changed. This fixes a formula string parser, not the underlying estimation.
Paper / source link(s): N/A
Any intentional deviations from the source (and why): N/A
Validation

Tests added/updated: Added test_parse_formula.py with 6 tests covering:
Covariates before * (e.g. y ~ x1 + treated * post) — was broken, now fixed
Covariates on both sides of * (e.g. y ~ x1 + treated * post + x2) — was broken, now fixed
Multiple ~ gives clear error
Extra/no whitespace handling
Formula-vs-explicit-params equivalence (ATT and SE match to rtol=1e-10)
All 9 existing formula tests in test_methodology_did.py and test_estimators.py pass unchanged (no regressions)
Backtest / simulation / notebook evidence (if applicable): N/A
Security / privacy

Confirm no secrets/PII in this PR: Confirmed — changes are limited to formula parsing logic and test data generated with np.random.

igerber · 2026-03-08T20:42:38Z

/ai-review

github-actions · 2026-03-08T20:47:34Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0df98e8d19bb484e870979242c78e38444f2eabf

Overall Assessment

⚠️ Needs changes

Executive Summary

No ATT/SE/identification logic changed, and I found no source-material mismatch in the DiD estimator itself.
The PR does fix the reported parser bug for covariates before treated * post and adds useful coverage for multiple ~ and whitespace handling.
The new * parsing branch still mishandles a standard R-equivalent form, outcome ~ treated + post + treated * post (+ x), by treating treated and post as covariates.
That creates duplicate design-matrix columns and can overwrite coefficient entries in the results dict, even if ATT itself survives rank-deficiency handling.
Tests miss that equivalent-form regression case; no security or performance issues stood out in scope.

Methodology

Severity: P1 | Impact: Affected method is the DifferenceInDifferences formula interface only. The registry and docstring describe an R-style interface and cite fixest as the reference behavior, but the new * path appends every non-* term to covariates without excluding the main effects. For outcome ~ treated + post + treated * post + x1, treated and post get added twice to X, which makes the fit rank-deficient and can yield misleading coefficients output via duplicate-key overwrite. See docs/methodology/REGISTRY.md:73, docs/methodology/REGISTRY.md:77, diff_diff/estimators.py:46, diff_diff/estimators.py:481, diff_diff/estimators.py:257, diff_diff/estimators.py:323. | Concrete fix: After extracting treatment and time in the * branch, drop additive terms equal to those names and deduplicate the remaining covariates, mirroring the filtering already done in the : branch at diff_diff/estimators.py:532.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

Severity: P2 | Impact: The new tests cover x1 + treated * post and x1 + treated * post + x2, but they do not cover the standard equivalent spellings that still fail the new branch, especially outcome ~ treated + post + treated * post and outcome ~ treated + post + treated * post + x1. That leaves the remaining parser gap unguarded. See tests/test_parse_formula.py:39. | Concrete fix: Add regression tests for those formulas and assert equivalence to the explicit-parameter fit, plus no duplicate coefficient names / no rank-deficiency warning for redundant main effects.
Validation note: I reviewed this statically; the sandbox here does not have pytest or the project’s runtime deps installed, so I could not execute the test suite.

igerber · 2026-03-08T20:49:35Z

In addition to the AI notes, I'd add that ideally, the tests are an existing test file like test_estimators.py instead of adding a new test file

fix _parse_formula method

0df98e8

SermetPekin closed this Mar 8, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix _parse_formula to support covariates before interaction term #186

Fix _parse_formula to support covariates before interaction term #186
SermetPekin wants to merge 1 commit intoigerber:mainfrom
SermetPekin:fix/parse-formula

SermetPekin commented Feb 23, 2026

Uh oh!

igerber commented Mar 8, 2026

Uh oh!

github-actions bot commented Mar 8, 2026

Uh oh!

igerber commented Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

SermetPekin commented Feb 23, 2026

Uh oh!

igerber commented Mar 8, 2026

Uh oh!

github-actions bot commented Mar 8, 2026

Uh oh!

igerber commented Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants