Merged
20 changes: 16 additions & 4 deletions docs/tutorials/11_imputation_did.ipynb
@@ -9,7 +9,7 @@
"This tutorial demonstrates the `ImputationDiD` estimator, which implements the efficient imputation approach from Borusyak, Jaravel & Spiess (2024), \"Revisiting Event-Study Designs: Robust and Efficient Estimation\", *Review of Economic Studies*.\n",
"\n",
"**When to use ImputationDiD:**\n",
"- Staggered adoption settings where treatment effects may be **homogeneous** across cohorts and time \u2014 produces ~50% shorter CIs than Callaway-Sant'Anna\n",
"- Staggered adoption settings where treatment effects may be **homogeneous** across cohorts and time; it produces ~50% shorter CIs than Callaway-Sant'Anna\n",
"- When you want to use **all untreated observations** (never-treated + not-yet-treated) for maximum efficiency\n",
"- As a complement to Callaway-Sant'Anna or Sun-Abraham: if all three agree, results are robust; if they disagree, investigate heterogeneity"
]
@@ -27,7 +27,16 @@
"from diff_diff import (\n",
" ImputationDiD, CallawaySantAnna, SunAbraham,\n",
" generate_staggered_data, plot_event_study\n",
")"
")\n",
"\n",
"# For nicer plots (optional)\n",
"try:\n",
" import matplotlib.pyplot as plt\n",
" plt.style.use('seaborn-v0_8-whitegrid')\n",
" HAS_MATPLOTLIB = True\n",
"except ImportError:\n",
" HAS_MATPLOTLIB = False\n",
" print(\"matplotlib not installed - visualization examples will be skipped\")"
]
},
{
@@ -78,7 +87,10 @@
" first_treat='first_treat', aggregate='event_study')\n",
"\n",
"# Plot event study\n",
"plot_event_study(results_es, title='Imputation DiD Event Study')"
"if HAS_MATPLOTLIB:\n",
" plot_event_study(results_es, title='Imputation DiD Event Study')\n",
"else:\n",
" print(\"Install matplotlib to see visualizations: pip install matplotlib\")"
]
},
{
@@ -122,7 +134,7 @@
"source": [
"## Comparison with Other Estimators\n",
"\n",
"Under homogeneous treatment effects, ImputationDiD, Callaway-Sant'Anna, and Sun-Abraham should produce similar point estimates. The key difference is efficiency \u2014 ImputationDiD produces shorter confidence intervals."
"Under homogeneous treatment effects, ImputationDiD, Callaway-Sant'Anna, and Sun-Abraham should produce similar point estimates. The key difference is efficiency: ImputationDiD produces shorter confidence intervals."
]
},
{
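The optional-matplotlib guard that this change adds to both notebooks can be sketched on its own, outside the notebook JSON. This is a minimal illustration, not the notebooks' code: `optional_import` and `run_if_available` are hypothetical helper names introduced here, and nothing below is part of `diff_diff`.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed.

    Hypothetical helper: mirrors the try/except ImportError guard in the diff.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def run_if_available(available, fn, *args, **kwargs):
    """Call fn only when the optional dependency is available.

    Hypothetical helper: mirrors the `if HAS_MATPLOTLIB: ... else: print(...)`
    branch wrapped around the plotting call.
    """
    if not available:
        print("Install matplotlib to see visualizations: pip install matplotlib")
        return None
    return fn(*args, **kwargs)


# The flag is computed once, at import time, as in the notebooks.
plt = optional_import("matplotlib.pyplot")
HAS_MATPLOTLIB = plt is not None
```

One caveat worth checking against the supported matplotlib versions: the guard in the diff only catches `ImportError`. The style name `seaborn-v0_8-whitegrid` was introduced in matplotlib 3.6, so on an older installation `plt.style.use(...)` would be expected to fail with a different exception that the guard does not handle.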
57 changes: 51 additions & 6 deletions docs/tutorials/12_two_stage_did.ipynb
@@ -27,7 +27,16 @@
"from diff_diff import (\n",
" TwoStageDiD, ImputationDiD, CallawaySantAnna,\n",
" generate_staggered_data, plot_event_study\n",
")"
")\n",
"\n",
"# For nicer plots (optional)\n",
"try:\n",
" import matplotlib.pyplot as plt\n",
" plt.style.use('seaborn-v0_8-whitegrid')\n",
" HAS_MATPLOTLIB = True\n",
"except ImportError:\n",
" HAS_MATPLOTLIB = False\n",
" print(\"matplotlib not installed - visualization examples will be skipped\")"
]
},
{
@@ -80,7 +89,10 @@
" first_treat='first_treat', aggregate='event_study')\n",
"\n",
"# Plot event study\n",
"plot_event_study(results_es, title='Two-Stage DiD Event Study')"
"if HAS_MATPLOTLIB:\n",
" plot_event_study(results_es, title='Two-Stage DiD Event Study')\n",
"else:\n",
" print(\"Install matplotlib to see visualizations: pip install matplotlib\")"
]
},
{
@@ -96,7 +108,17 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Per-Observation Treatment Effects\n\nBoth `TwoStageDiD` and `ImputationDiD` provide a `treatment_effects` DataFrame containing one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
"source": [
"## Per-Observation Treatment Effects\n",
"\n",
"Both `TwoStageDiD` and `ImputationDiD` provide a `treatment_effects` DataFrame containing one row per treated observation with:\n",
"- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n",
"- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n",
"- `rel_time`: relative time since treatment\n",
"- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n",
"\n",
"This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
]
},
{
"cell_type": "code",
@@ -115,7 +137,15 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Comparison with Other Estimators\n\nTwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n\nCallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when `n_bootstrap > 0`.\n\n*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
"source": [
"## Comparison with Other Estimators\n",
"\n",
"TwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n",
"\n",
"CallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when `n_bootstrap > 0`.\n",
"\n",
"*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
]
},
{
"cell_type": "code",
@@ -237,7 +267,22 @@
{
"cell_type": "markdown",
"metadata": {},
"source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | Yes (`treatment_effects`) | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
"source": [
"## Summary\n",
"\n",
"| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n",
"|---------|-------------|---------------|------------------|\n",
"| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n",
"| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n",
"| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n",
"| **Per-obs effects** | Yes (`treatment_effects`) | Yes (`treatment_effects`) | No |\n",
"| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n",
"| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n",
"\n",
"**References:**\n",
"- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n",
"- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
]
}
],
"metadata": {
@@ -247,4 +292,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
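The weighting rule described in the "Per-Observation Treatment Effects" cell (`1/n_valid` for rows with finite `tau_hat`, `0` for NaN rows) can be illustrated with a small dependency-free sketch. It uses a plain list of made-up `tau_hat` values rather than the estimator's `treatment_effects` DataFrame, and the function names `aggregation_weights` and `weighted_att` are hypothetical, chosen for illustration only.

```python
import math


def aggregation_weights(tau_hats):
    """Assign 1/n_valid to finite tau_hat values and 0 to NaN/inf values."""
    valid = [math.isfinite(t) for t in tau_hats]
    n_valid = sum(valid)
    if n_valid == 0:
        # No usable observations: every weight is zero.
        return [0.0] * len(tau_hats)
    return [1.0 / n_valid if ok else 0.0 for ok in valid]


def weighted_att(tau_hats):
    """Weighted sum of tau_hat, skipping zero-weight (non-finite) rows."""
    weights = aggregation_weights(tau_hats)
    return sum(w * t for w, t in zip(weights, tau_hats) if w > 0.0)


# Made-up values: three finite effects and one NaN row.
tau_hats = [2.0, 4.0, float("nan"), 3.0]
weights = aggregation_weights(tau_hats)
att = weighted_att(tau_hats)
```

With these weights the aggregate is simply the mean of the finite `tau_hat` values. Zero-weight rows are skipped explicitly because `0 * nan` is still `nan` in floating-point arithmetic and would otherwise contaminate the sum.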