Merged
2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -360,6 +360,8 @@ See `docs/performance-plan.md` for full optimization details and `docs/benchmark
- `08_triple_diff.ipynb` - Triple Difference (DDD) estimation with proper covariate handling
- `09_real_world_examples.ipynb` - Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws)
- `10_trop.ipynb` - Triply Robust Panel (TROP) estimation with factor model adjustment
- `11_imputation_did.ipynb` - Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison
- `12_two_stage_did.ipynb` - Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects

### Benchmarks

250 changes: 250 additions & 0 deletions docs/tutorials/12_two_stage_did.ipynb
@@ -0,0 +1,250 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Two-Stage DiD (Gardner 2022)\n",
"\n",
"This tutorial demonstrates the `TwoStageDiD` estimator, which implements the two-stage difference-in-differences method from Gardner (2022), \"Two-stage differences in differences\", with inference from Butts & Gardner (2022), \"did2s: Two-Stage Difference-in-Differences\".\n",
"\n",
"**When to use TwoStageDiD:**\n",
"- Staggered adoption settings where you want **GMM sandwich variance** that accounts for first-stage estimation uncertainty\n",
"- When you want **per-observation treatment effects** (`treatment_effects` DataFrame) for granular analysis\n",
"- As a **robustness check** alongside ImputationDiD: identical point estimates with different inference confirm results are not an artifact of variance estimator choice"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import warnings\n",
"warnings.filterwarnings('ignore')\n",
"\n",
"from diff_diff import (\n",
" TwoStageDiD, ImputationDiD, CallawaySantAnna,\n",
" generate_staggered_data, plot_event_study\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Usage\n",
"\n",
"The two-stage estimator follows a simple algorithm:\n",
"1. Estimate unit and time fixed effects using only **untreated observations** (never-treated + not-yet-treated periods)\n",
"2. Residualize **all** outcomes using those estimated FEs\n",
"3. Regress residualized outcomes on treatment indicators to obtain the ATT\n",
"\n",
"This avoids TWFE bias because the fixed effect model is estimated only on clean (untreated) data, preventing treated outcomes from contaminating the counterfactual."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Generate staggered adoption data with known treatment effect\n",
"data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)\n",
"\n",
"# Fit the two-stage estimator\n",
"est = TwoStageDiD()\n",
"results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')\n",
"results.print_summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Event Study\n",
"\n",
"Event study aggregation estimates treatment effects at each relative time horizon, enabling visualization of dynamic effects and informal pre-trend assessment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fit with event study aggregation\n",
"est = TwoStageDiD()\n",
"results_es = est.fit(data, outcome='outcome', unit='unit', time='period',\n",
" first_treat='first_treat', aggregate='event_study')\n",
"\n",
"# Plot event study\n",
"plot_event_study(results_es, title='Two-Stage DiD Event Study')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# View event study effects as a table\n",
"results_es.to_dataframe(level='event_study')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Per-Observation Treatment Effects\n\nBoth `TwoStageDiD` and `ImputationDiD` provide a `treatment_effects` DataFrame containing one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Per-observation treatment effects (available from the basic fit)\n",
"te = results.treatment_effects\n",
"print(f\"Shape: {te.shape}\")\n",
"print(f\"Columns: {list(te.columns)}\")\n",
"print()\n",
"te.head(10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Comparison with Other Estimators\n\nTwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n\nCallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when `n_bootstrap > 0`.\n\n*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fit all three estimators on the same data\n",
"ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat')\n",
"imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat')\n",
"cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat')\n",
"\n",
"print(\"Estimator Comparison (True effect = 2.0)\")\n",
"print(\"=\" * 55)\n",
"print(f\"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}\")\n",
"print(\"-\" * 55)\n",
"\n",
"for name, r in [(\"TwoStageDiD\", ts), (\"ImputationDiD\", imp), (\"CallawaySantAnna\", cs)]:\n",
" ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]\n",
" print(f\"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Group Aggregation\n",
"\n",
"Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fit with group aggregation\n",
"results_grp = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat',\n",
" aggregate='group')\n",
"results_grp.to_dataframe(level='group')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Advanced Features\n",
"\n",
"### Anticipation\n",
"\n",
"If treatment effects begin before the official treatment date (e.g., firms change behavior in anticipation of a policy), use the `anticipation` parameter to shift the treatment onset back."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare ATT with and without anticipation\n",
"est_antic = TwoStageDiD(anticipation=1)\n",
"results_antic = est_antic.fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat')\n",
"print(f\"ATT (no anticipation): {results.overall_att:.3f}\")\n",
"print(f\"ATT (1-period anticipation): {results_antic.overall_att:.3f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### GMM Sandwich vs Conservative Variance\n",
"\n",
"The key methodological distinction between TwoStageDiD and ImputationDiD is the variance estimator:\n",
"\n",
"- **ImputationDiD's conservative variance** (Theorem 3) is valid under heterogeneous treatment effects but may produce wider confidence intervals than necessary\n",
"- **TwoStageDiD's GMM sandwich** accounts for first-stage estimation uncertainty via an influence function correction term\n",
"- In practice they usually agree closely; large divergence signals potential specification concerns\n",
"- Bootstrap inference is also available via `n_bootstrap=199`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Horizon-by-horizon SE comparison\n",
"ts_es = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat',\n",
" aggregate='event_study')\n",
"imp_es = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
" time='period', first_treat='first_treat',\n",
" aggregate='event_study')\n",
"\n",
"print(\"Horizon-by-Horizon Comparison: GMM Sandwich vs Conservative Variance\")\n",
"print(\"=\" * 70)\n",
"print(f\"{'Horizon':>8} {'Effect':>10} {'GMM SE':>10} {'Cons. SE':>10} {'Ratio':>8}\")\n",
"print(\"-\" * 70)\n",
"\n",
"for h in sorted(ts_es.event_study_effects.keys()):\n",
" ts_eff = ts_es.event_study_effects[h]\n",
" imp_eff = imp_es.event_study_effects[h]\n",
" if ts_eff.get('n_obs', 0) == 0:\n",
" print(f\"{h:>8} {'[ref]':>10} {'---':>10} {'---':>10} {'---':>8}\")\n",
" continue\n",
" effect = ts_eff['effect']\n",
" gmm_se = ts_eff['se']\n",
" cons_se = imp_eff['se']\n",
" ratio = gmm_se / cons_se if cons_se > 0 else np.nan\n",
" print(f\"{h:>8} {effect:>10.4f} {gmm_se:>10.4f} {cons_se:>10.4f} {ratio:>8.3f}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | Yes (`treatment_effects`) | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
}
],
"metadata": {
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
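
The three-step algorithm described in the notebook's "Basic Usage" cell can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the `diff_diff` implementation: the function `two_stage_att`, the tuple layout of `rows`, and the noise-free toy panel are all hypothetical. Stage one uses simple alternating group means instead of a proper fixed-effects solver, and no standard errors are computed, so the GMM sandwich variance is out of scope here.

```python
from itertools import product


def two_stage_att(rows, n_iter=500):
    """Two-stage ATT point estimate (illustrative sketch only).

    rows: list of (unit, period, first_treat, y); first_treat is None for
    never-treated units. Assumes every unit and every period has at least one
    untreated observation and that the untreated sample is connected.
    """
    treated = [ft is not None and t >= ft for _, t, ft, _ in rows]
    untreated = [r for r, d in zip(rows, treated) if not d]
    alpha = {u: 0.0 for u, _, _, _ in rows}  # unit fixed effects
    gamma = {t: 0.0 for _, t, _, _ in rows}  # time fixed effects

    # Stage 1: fit y = alpha_u + gamma_t on untreated rows only, by
    # alternating between unit means and period means of the partial residuals.
    for _ in range(n_iter):
        for u in alpha:
            res = [y - gamma[t] for uu, t, _, y in untreated if uu == u]
            alpha[u] = sum(res) / len(res)
        for t in gamma:
            res = [y - alpha[u] for u, tt, _, y in untreated if tt == t]
            gamma[t] = sum(res) / len(res)

    # Stage 2: regressing residualized outcomes on the treatment dummy
    # (no intercept) reduces to the mean residual over treated rows.
    tau_hat = [y - alpha[u] - gamma[t]
               for (u, t, _, y), d in zip(rows, treated) if d]
    return sum(tau_hat) / len(tau_hat)


# Hypothetical toy panel: 6 units, 5 periods, staggered adoption, true effect 2.0.
first_treat = {0: None, 1: None, 2: 2, 3: 2, 4: 3, 5: 3}
rows = []
for u, t in product(range(6), range(5)):
    ft = first_treat[u]
    d = ft is not None and t >= ft
    rows.append((u, t, ft, 1.5 * u + 0.7 * t + (2.0 if d else 0.0)))

att = two_stage_att(rows)
print(f"ATT estimate: {att:.4f}")
```

Because the toy outcomes contain no noise, the estimate should match the true effect of 2.0 up to numerical tolerance. The per-row `tau_hat` values play the role that the notebook attributes to the `treatment_effects` DataFrame.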