Merged
53 changes: 52 additions & 1 deletion README.md
@@ -70,7 +70,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
- **Panel data support**: Two-way fixed effects estimator for panel designs
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), and Stacked DiD (Wing, Freedman & Hollingsworth 2024) estimators for heterogeneous treatment timing
- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), Borusyak-Jaravel-Spiess (2024) imputation, Two-Stage DiD (Gardner 2022), Stacked DiD (Wing, Freedman & Hollingsworth 2024), and Efficient DiD (Chen, Sant'Anna & Xie 2025) estimators for heterogeneous treatment timing
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -125,6 +125,7 @@ We provide Jupyter notebook tutorials in `docs/tutorials/`:
| `11_imputation_did.ipynb` | Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison |
| `12_two_stage_did.ipynb` | Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects |
| `13_stacked_did.ipynb` | Stacked DiD (Wing et al. 2024), Q-weights, sub-experiment inspection, trimming, clean control definitions |
| `15_efficient_did.ipynb` | Efficient DiD (Chen et al. 2025), optimal weighting, PT-All vs PT-Post, efficiency gains, bootstrap inference |

## Data Preparation

@@ -1071,6 +1072,56 @@ results = stacked_did(
)
```

### Efficient DiD (Chen, Sant'Anna & Xie 2025)

Efficient DiD achieves the semiparametric efficiency bound for ATT estimation in staggered adoption designs. It weights all valid comparison groups and baselines optimally using the inverse covariance matrix Ω*, which yields tighter confidence intervals than standard estimators such as Callaway-Sant'Anna when the stronger PT-All assumption holds.

```python
from diff_diff import EfficientDiD, generate_staggered_data

# Generate sample data
data = generate_staggered_data(n_units=300, n_periods=10,
                               cohort_periods=[4, 6, 8], seed=42)

# Fit with PT-All (overidentified, tighter SEs)
edid = EfficientDiD(pt_assumption="all")
results = edid.fit(data, outcome='outcome', unit='unit',
                   time='period', first_treat='first_treat',
                   aggregate='all')
results.print_summary()

# PT-Post mode (matches CS for post-treatment effects)
edid_post = EfficientDiD(pt_assumption="post")
results_post = edid_post.fit(data, outcome='outcome', unit='unit',
                             time='period', first_treat='first_treat')
```
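
The inverse-covariance weighting described above can be illustrated with a small standalone sketch. It is not the `diff_diff` implementation: it assumes the classic minimum-variance rule `w = Ω⁻¹1 / (1'Ω⁻¹1)` for combining several unbiased estimates of the same ATT, and the 3x3 covariance matrix and the helper names (`solve_linear`, `min_variance_weights`, `combined_variance`) are hypothetical.

```python
# Illustrative sketch only; NOT the diff_diff implementation.
# Combine unbiased estimates of one ATT with weights
# w = Omega^{-1} 1 / (1' Omega^{-1} 1), which minimize w' Omega w
# among all weight vectors that sum to one.


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def min_variance_weights(omega):
    """Weights proportional to Omega^{-1} 1, normalized to sum to one."""
    raw = solve_linear(omega, [1.0] * len(omega))
    total = sum(raw)
    return [v / total for v in raw]


def combined_variance(weights, omega):
    """Variance w' Omega w of the weighted combination."""
    n = len(weights)
    return sum(weights[i] * omega[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Hypothetical covariance matrix of three candidate estimates of one ATT.
omega = [
    [1.0, 0.3, 0.2],
    [0.3, 2.0, 0.4],
    [0.2, 0.4, 4.0],
]
weights = min_variance_weights(omega)
var_opt = combined_variance(weights, omega)
var_equal = combined_variance([1 / 3] * 3, omega)
print(f"optimal variance {var_opt:.4f} vs equal-weight variance {var_equal:.4f}")
```

Because putting all weight on a single estimate is itself a valid weighting, the optimally weighted combination can never have a larger variance than the best individual estimate. That is the intuition behind the tighter standard errors under PT-All.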

**Parameters:**

```python
EfficientDiD(
    pt_assumption='all',             # 'all' (overidentified) or 'post' (matches CS post-treatment ATT)
    alpha=0.05,                      # Significance level
    n_bootstrap=0,                   # Bootstrap iterations (0 = analytical only)
    bootstrap_weights='rademacher',  # 'rademacher', 'mammen', or 'webb'
    seed=None,                       # Random seed
    anticipation=0,                  # Anticipation periods
)
```
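
To make the `n_bootstrap` and `bootstrap_weights` options more concrete, here is a minimal sketch of a multiplier bootstrap with Rademacher weights applied to influence-function values. It is an assumption-laden illustration, not the library's code: the simulated `psi` values and the helpers `multiplier_bootstrap_se` and `analytical_se` are hypothetical, and the Mammen and Webb weight schemes are not shown.

```python
import math
import random
import statistics


def rademacher(rng):
    """Draw +1 or -1 with equal probability."""
    return 1.0 if rng.random() < 0.5 else -1.0


def multiplier_bootstrap_se(psi, n_bootstrap=999, seed=None):
    """Std. dev. of the perturbed means (1/n) * sum(w_i * psi_i)."""
    rng = random.Random(seed)
    n = len(psi)
    draws = [sum(rademacher(rng) * p for p in psi) / n
             for _ in range(n_bootstrap)]
    return statistics.pstdev(draws)


def analytical_se(psi):
    """Plug-in standard error sqrt(mean(psi^2) / n) for centred psi."""
    n = len(psi)
    return math.sqrt(sum(p * p for p in psi) / n / n)


# Hypothetical influence-function values, centred at zero.
data_rng = random.Random(0)
psi = [data_rng.gauss(0.0, 2.0) for _ in range(500)]
centre = sum(psi) / len(psi)
psi = [p - centre for p in psi]

se_analytic = analytical_se(psi)
se_boot = multiplier_bootstrap_se(psi, n_bootstrap=999, seed=42)
print(f"analytical SE {se_analytic:.4f}, bootstrap SE {se_boot:.4f}")
```

With Rademacher weights the variance of each perturbed mean equals the squared analytical standard error exactly, so the two numbers differ only by Monte Carlo noise. Fixing `seed` makes the bootstrap draws reproducible.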

> **Note:** Phase 1 supports the no-covariates path only. Use CallawaySantAnna with
> `estimation_method='dr'` if you need covariate adjustment.

**When to use Efficient DiD vs Callaway-Sant'Anna:**

| Aspect | Efficient DiD | Callaway-Sant'Anna |
|--------|--------------|-------------------|
| Approach | Optimal EIF-based weighting | Separate 2x2 DiD aggregation |
| PT assumption | PT-All (stronger) or PT-Post | Conditional PT |
| Efficiency | Achieves semiparametric bound | Not efficient |
| Covariates | Not yet (Phase 2) | Supported (OR, IPW, DR) |
| When to choose | Maximum efficiency, PT-All credible | Covariates needed, weaker PT |
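
The "separate 2x2 DiD" building block named in the table can be sketched in a few lines. This is a toy illustration with hypothetical outcome values and a hypothetical helper `did_2x2`, not code from `diff_diff`: it computes one two-group, two-period comparison of the kind a Callaway-Sant'Anna style estimator aggregates across cohorts and periods.

```python
from statistics import mean


def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the comparison group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcomes for one cohort and one comparison group.
treated_pre = [10.0, 12.0, 11.0]
treated_post = [15.0, 17.0, 16.0]
control_pre = [9.0, 10.0, 11.0]
control_post = [11.0, 12.0, 13.0]

att = did_2x2(treated_pre, treated_post, control_pre, control_post)
print(f"2x2 DiD estimate: {att:.1f}")
```

Here the treated group rises by 5 and the comparison group by 2, so the estimate is 3. Efficient DiD differs in how it combines many such comparisons, not in this basic contrast.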

### Triple Difference (DDD)

Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
150 changes: 150 additions & 0 deletions docs/api/efficient_did.rst
@@ -0,0 +1,150 @@
Efficient Difference-in-Differences
====================================

Semiparametrically efficient ATT estimator for staggered adoption designs
from Chen, Sant'Anna & Xie (2025).

This module implements the efficiency-bound-attaining estimator that:

1. **Achieves the semiparametric efficiency bound** for ATT(g,t) estimation
2. **Optimally weights** across comparison groups and baselines via the
   inverse covariance matrix Ω*
3. **Supports two PT assumptions**: PT-All (overidentified, tighter SEs) and
   PT-Post (just-identified, matches CS for post-treatment effects)
4. **Uses EIF-based inference** for analytical standard errors and multiplier
   bootstrap

.. note::

   Phase 1 supports the **no-covariates** path only. The with-covariates
   path (Phase 2) will be added in a future version.

**When to use EfficientDiD:**

- Staggered adoption design where you want **maximum efficiency**
- You believe parallel trends holds across all pre-treatment periods (PT-All)
- You want tighter confidence intervals than Callaway-Sant'Anna
- You need a formal efficiency benchmark for comparing estimators

**Reference:** Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025). Efficient
Difference-in-Differences and Event Study Estimators.

.. module:: diff_diff.efficient_did

EfficientDiD
-------------

Main estimator class for Efficient Difference-in-Differences.

.. autoclass:: diff_diff.EfficientDiD
   :members:
   :undoc-members:
   :show-inheritance:
   :inherited-members:

   .. rubric:: Methods

   .. autosummary::

      ~EfficientDiD.fit
      ~EfficientDiD.get_params
      ~EfficientDiD.set_params

EfficientDiDResults
-------------------

Results container for Efficient DiD estimation.

.. autoclass:: diff_diff.efficient_did_results.EfficientDiDResults
   :members:
   :undoc-members:
   :show-inheritance:

   .. rubric:: Methods

   .. autosummary::

      ~EfficientDiDResults.summary
      ~EfficientDiDResults.print_summary
      ~EfficientDiDResults.to_dataframe

EDiDBootstrapResults
--------------------

Bootstrap inference results for Efficient DiD.

.. autoclass:: diff_diff.efficient_did_bootstrap.EDiDBootstrapResults
   :members:
   :undoc-members:
   :show-inheritance:

Example Usage
-------------

Basic usage::

    from diff_diff import EfficientDiD, generate_staggered_data

    data = generate_staggered_data(n_units=300, n_periods=10,
                                   cohort_periods=[4, 6, 8], seed=42)

    edid = EfficientDiD(pt_assumption="all")
    results = edid.fit(data, outcome='outcome', unit='unit',
                       time='period', first_treat='first_treat',
                       aggregate='all')
    results.print_summary()

PT-Post mode (matches CS for post-treatment ATT)::

    edid_post = EfficientDiD(pt_assumption="post")
    results_post = edid_post.fit(data, outcome='outcome', unit='unit',
                                 time='period', first_treat='first_treat',
                                 aggregate='all')
    print(f"PT-All ATT:  {results.overall_att:.4f} (SE={results.overall_se:.4f})")
    print(f"PT-Post ATT: {results_post.overall_att:.4f} (SE={results_post.overall_se:.4f})")

Bootstrap inference::

    edid_boot = EfficientDiD(pt_assumption="all", n_bootstrap=999, seed=42)
    results_boot = edid_boot.fit(data, outcome='outcome', unit='unit',
                                 time='period', first_treat='first_treat',
                                 aggregate='all')
    print(f"Bootstrap SE: {results_boot.overall_se:.4f}")
    print(f"Bootstrap CI: [{results_boot.overall_conf_int[0]:.4f}, "
          f"{results_boot.overall_conf_int[1]:.4f}]")

Comparison with Other Staggered Estimators
------------------------------------------

.. list-table::
   :header-rows: 1
   :widths: 20 27 27 26

   * - Feature
     - EfficientDiD
     - CallawaySantAnna
     - ImputationDiD
   * - Approach
     - Optimal EIF-based weighting
     - Separate 2x2 DiD aggregation
     - Impute Y(0) via FE model
   * - PT assumption
     - PT-All (stronger) or PT-Post
     - Conditional PT
     - Strict exogeneity
   * - Efficiency
     - Achieves semiparametric bound
     - Not efficient
     - Efficient under homogeneity
   * - Covariates
     - Not yet (Phase 2)
     - Supported (OR, IPW, DR)
     - Supported
   * - Bootstrap
     - Multiplier bootstrap (EIF)
     - Multiplier bootstrap
     - Multiplier bootstrap
   * - PT-Post equivalence
     - Matches CS post-treatment ATT(g,t)
     - Baseline
     - Different framework
4 changes: 4 additions & 0 deletions docs/api/index.rst
@@ -23,6 +23,7 @@ Core estimator classes for DiD analysis:
diff_diff.TripleDifference
diff_diff.TROP
diff_diff.ContinuousDiD
diff_diff.EfficientDiD

Results Classes
---------------
@@ -49,6 +50,8 @@ Result containers returned by estimators:
diff_diff.trop.TROPResults
diff_diff.ContinuousDiDResults
diff_diff.DoseResponseCurve
diff_diff.EfficientDiDResults
diff_diff.EDiDBootstrapResults

Visualization
-------------
@@ -195,6 +198,7 @@ Detailed documentation by module:
triple_diff
trop
continuous_did
efficient_did
results
visualization
diagnostics
32 changes: 31 additions & 1 deletion docs/choosing_estimator.rst
@@ -16,7 +16,7 @@ Start here and follow the questions:
1. **Is treatment staggered?** (Different units treated at different times)

- **No** → Go to question 2
- **Yes** → Use :class:`~diff_diff.CallawaySantAnna`
- **Yes** → Use :class:`~diff_diff.CallawaySantAnna` (or :class:`~diff_diff.EfficientDiD` for tighter SEs under PT-All)

2. **Do you have panel data?** (Multiple observations per unit over time)

@@ -63,6 +63,10 @@ Quick Reference
- Few treated units, many controls
- Synthetic parallel trends
- ATT with unit/time weights
* - ``EfficientDiD``
- Staggered adoption with optimal efficiency
- PT-All (overidentified) or PT-Post
- Group-time ATT(g,t), aggregations
* - ``ContinuousDiD``
- Continuous dose / treatment intensity
- Strong Parallel Trends (SPT) for dose-response; PT for binarized ATT
@@ -214,6 +218,32 @@ Use :class:`~diff_diff.ContinuousDiD` when:
print(f"Overall ATT: {results.overall_att:.3f}")
att_curve = results.dose_response_att.to_dataframe()

Efficient DiD
~~~~~~~~~~~~~

Use :class:`~diff_diff.EfficientDiD` when:

- You have staggered adoption and want **maximum statistical efficiency**
- You believe parallel trends holds across all pre-treatment periods (PT-All)
- You want tighter confidence intervals than Callaway-Sant'Anna
- You need a formal efficiency benchmark for comparing estimators

.. note::

   Phase 1 supports the **no-covariates** path only. If you need covariate
   adjustment, use :class:`~diff_diff.CallawaySantAnna` with ``estimation_method='dr'``
   or :class:`~diff_diff.ImputationDiD`.

.. code-block:: python

   from diff_diff import EfficientDiD

   edid = EfficientDiD(pt_assumption="all")  # or "post" for post-treatment CS match
   results = edid.fit(data, outcome='y', unit='unit_id',
                      time='period', first_treat='first_treat',
                      aggregate='all')
   results.print_summary()

Common Pitfalls
---------------
