From 06ada33639bcedc9778b62b89df07a28ba609390 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Mon, 16 Feb 2026 15:55:18 -0500
Subject: [PATCH 1/3] Add tutorial notebook for Two-Stage DiD (Gardner 2022)

New tutorial 12 covering TwoStageDiD estimator: basic usage,
event study, per-observation treatment effects, three-estimator
comparison (TwoStageDiD vs ImputationDiD vs CallawaySantAnna),
group aggregation, anticipation, and GMM vs conservative variance.
Also adds tutorials 11 and 12 to CLAUDE.md listing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 CLAUDE.md                             |   2 +
 docs/tutorials/12_two_stage_did.ipynb | 283 ++++++++++++++++++++++++++
 2 files changed, 285 insertions(+)
 create mode 100644 docs/tutorials/12_two_stage_did.ipynb
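
Note for reviewers: the three-step algorithm described in the tutorial's
Basic Usage cell can be sketched in plain Python as below. This is a minimal
illustration under stated assumptions, not the `TwoStageDiD` implementation:
the toy panel is noise-free, `first_treat = None` marks never-treated units,
and stage 1 uses alternating group means as a stand-in for an OLS solver.
The names `fit_fixed_effects`, `rows`, and `TRUE_EFFECT` are hypothetical.

```python
# Toy panel: 6 units, 5 periods, noise-free so the true effect is known.
# first_treat = None marks never-treated units (a convention of this sketch).
TRUE_EFFECT = 2.0
first_treat = {0: 2, 1: 2, 2: 3, 3: 3, 4: None, 5: None}
periods = range(5)

rows = []  # each row is (unit, period, treated, outcome)
for unit, g in first_treat.items():
    for t in periods:
        treated = g is not None and t >= g
        y = 0.5 * unit + 1.0 * t + (TRUE_EFFECT if treated else 0.0)
        rows.append((unit, t, treated, y))


def fit_fixed_effects(rows, n_iter=500):
    """Stage 1: fit unit and time effects on untreated rows only.

    Alternating group means are used here as a simple stand-in for
    solving the two-way fixed-effects least-squares problem.
    """
    untreated = [(u, t, y) for u, t, d, y in rows if not d]
    alpha = {u: 0.0 for u, _, _ in untreated}  # unit effects
    lam = {t: 0.0 for _, t, _ in untreated}    # time effects
    for _ in range(n_iter):
        for u in alpha:
            vals = [y - lam[t] for uu, t, y in untreated if uu == u]
            alpha[u] = sum(vals) / len(vals)
        for t in lam:
            vals = [y - alpha[u] for u, tt, y in untreated if tt == t]
            lam[t] = sum(vals) / len(vals)
    return alpha, lam


alpha, lam = fit_fixed_effects(rows)

# Stage 2a: residualize ALL outcomes using the stage-1 effects.
resid = [(d, y - alpha[u] - lam[t]) for u, t, d, y in rows]

# Stage 2b: a no-intercept regression of the residuals on the treatment
# dummy reduces to the mean residual over treated rows.
n_treated = sum(1 for d, _ in resid if d)
att = sum(r for d, r in resid if d) / n_treated
print(f"ATT estimate: {att:.4f}")
```

Because the toy outcomes are exactly additive in unit and period effects,
the recovered ATT matches the effect built into the data up to numerical
tolerance; with noisy data the estimate would vary around it.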

diff --git a/CLAUDE.md b/CLAUDE.md
index c8a74971..e4f71006 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -360,6 +360,8 @@ See `docs/performance-plan.md` for full optimization details and `docs/benchmark
   - `08_triple_diff.ipynb` - Triple Difference (DDD) estimation with proper covariate handling
   - `09_real_world_examples.ipynb` - Real-world data examples (Card-Krueger, Castle Doctrine, Divorce Laws)
   - `10_trop.ipynb` - Triply Robust Panel (TROP) estimation with factor model adjustment
+  - `11_imputation_did.ipynb` - Imputation DiD (Borusyak et al. 2024), pre-trend test, efficiency comparison
+  - `12_two_stage_did.ipynb` - Two-Stage DiD (Gardner 2022), GMM sandwich variance, per-observation effects
 
 ### Benchmarks
 
diff --git a/docs/tutorials/12_two_stage_did.ipynb b/docs/tutorials/12_two_stage_did.ipynb
new file mode 100644
index 00000000..41eea3ea
--- /dev/null
+++ b/docs/tutorials/12_two_stage_did.ipynb
@@ -0,0 +1,283 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Two-Stage DiD (Gardner 2022)\n",
+    "\n",
+    "This tutorial demonstrates the `TwoStageDiD` estimator, which implements the two-stage difference-in-differences method from Gardner (2022), \"Two-stage differences in differences\", with inference from Butts & Gardner (2022), \"did2s: Two-Stage Difference-in-Differences\".\n",
+    "\n",
+    "**When to use TwoStageDiD:**\n",
+    "- Staggered adoption settings where you want **GMM sandwich variance** that accounts for first-stage estimation uncertainty\n",
+    "- When you want **per-observation treatment effects** (`treatment_effects` DataFrame) for granular analysis\n",
+    "- As a **robustness check** alongside ImputationDiD: the point estimates are identical, so comparing the two sets of standard errors shows whether your conclusions depend on the choice of variance estimator"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import warnings\n",
+    "warnings.filterwarnings('ignore')\n",
+    "\n",
+    "from diff_diff import (\n",
+    "    TwoStageDiD, ImputationDiD, CallawaySantAnna,\n",
+    "    generate_staggered_data, plot_event_study\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Usage\n",
+    "\n",
+    "The two-stage estimator follows a simple algorithm:\n",
+    "1. Estimate unit and time fixed effects using only **untreated observations** (never-treated + not-yet-treated periods)\n",
+    "2. Residualize **all** outcomes using those estimated FEs\n",
+    "3. Regress residualized outcomes on treatment indicators to obtain the ATT\n",
+    "\n",
+    "This avoids the bias of conventional TWFE under staggered adoption: because the fixed-effects model is estimated only on clean (untreated) data, treated outcomes cannot contaminate the estimated counterfactual."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Generate staggered adoption data with known treatment effect\n",
+    "data = generate_staggered_data(n_units=300, n_periods=10, treatment_effect=2.0, seed=42)\n",
+    "\n",
+    "# Fit the two-stage estimator\n",
+    "est = TwoStageDiD()\n",
+    "results = est.fit(data, outcome='outcome', unit='unit', time='period', first_treat='first_treat')\n",
+    "results.print_summary()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Event Study\n",
+    "\n",
+    "Event study aggregation estimates treatment effects at each relative time horizon, enabling visualization of dynamic effects and informal pre-trend assessment."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit with event study aggregation\n",
+    "est = TwoStageDiD()\n",
+    "results_es = est.fit(data, outcome='outcome', unit='unit', time='period',\n",
+    "                     first_treat='first_treat', aggregate='event_study')\n",
+    "\n",
+    "# Plot event study\n",
+    "plot_event_study(results_es, title='Two-Stage DiD Event Study')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# View event study effects as a table\n",
+    "results_es.to_dataframe(level='event_study')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Per-Observation Treatment Effects\n",
+    "\n",
+    "A feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n",
+    "- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n",
+    "- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n",
+    "- `rel_time`: relative time since treatment\n",
+    "- `weight`: aggregation weight (1/n_treated)\n",
+    "\n",
+    "This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Per-observation treatment effects (available from the basic fit)\n",
+    "te = results.treatment_effects\n",
+    "print(f\"Shape: {te.shape}\")\n",
+    "print(f\"Columns: {list(te.columns)}\")\n",
+    "print()\n",
+    "te.head(10)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Comparison with Other Estimators\n",
+    "\n",
+    "TwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n",
+    "\n",
+    "CallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. Its standard errors come from an analytical multiplier bootstrap on the influence function.\n",
+    "\n",
+    "*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit all three estimators on the same data\n",
+    "ts = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                       time='period', first_treat='first_treat')\n",
+    "imp = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                          time='period', first_treat='first_treat')\n",
+    "cs = CallawaySantAnna().fit(data, outcome='outcome', unit='unit',\n",
+    "                            time='period', first_treat='first_treat')\n",
+    "\n",
+    "print(\"Estimator Comparison (True effect = 2.0)\")\n",
+    "print(\"=\" * 55)\n",
+    "print(f\"{'Estimator':<25} {'ATT':>8} {'SE':>8} {'CI Width':>10}\")\n",
+    "print(\"-\" * 55)\n",
+    "\n",
+    "for name, r in [(\"TwoStageDiD\", ts), (\"ImputationDiD\", imp), (\"CallawaySantAnna\", cs)]:\n",
+    "    ci_width = r.overall_conf_int[1] - r.overall_conf_int[0]\n",
+    "    print(f\"{name:<25} {r.overall_att:>8.3f} {r.overall_se:>8.3f} {ci_width:>10.3f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Group Aggregation\n",
+    "\n",
+    "Group aggregation estimates average treatment effects by treatment cohort (groups defined by first treatment period)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Fit with group aggregation\n",
+    "results_grp = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                                 time='period', first_treat='first_treat',\n",
+    "                                 aggregate='group')\n",
+    "results_grp.to_dataframe(level='group')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Advanced Features\n",
+    "\n",
+    "### Anticipation\n",
+    "\n",
+    "If treatment effects begin before the official treatment date (e.g., firms change behavior in anticipation of a policy), set the `anticipation` parameter to the number of periods by which treatment onset should be shifted earlier."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Compare ATT with and without anticipation\n",
+    "est_antic = TwoStageDiD(anticipation=1)\n",
+    "results_antic = est_antic.fit(data, outcome='outcome', unit='unit',\n",
+    "                               time='period', first_treat='first_treat')\n",
+    "print(f\"ATT (no anticipation):       {results.overall_att:.3f}\")\n",
+    "print(f\"ATT (1-period anticipation): {results_antic.overall_att:.3f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### GMM Sandwich vs Conservative Variance\n",
+    "\n",
+    "The key methodological distinction between TwoStageDiD and ImputationDiD is the variance estimator:\n",
+    "\n",
+    "- **ImputationDiD's conservative variance** (Theorem 3) is valid under heterogeneous treatment effects but may produce wider confidence intervals than necessary\n",
+    "- **TwoStageDiD's GMM sandwich** accounts for first-stage estimation uncertainty via an influence function correction term\n",
+    "- In practice the two often agree closely; a large divergence is worth investigating as a possible specification problem\n",
+    "- Bootstrap inference is also available via `n_bootstrap=199`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Horizon-by-horizon SE comparison\n",
+    "ts_es = TwoStageDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                           time='period', first_treat='first_treat',\n",
+    "                           aggregate='event_study')\n",
+    "imp_es = ImputationDiD().fit(data, outcome='outcome', unit='unit',\n",
+    "                              time='period', first_treat='first_treat',\n",
+    "                              aggregate='event_study')\n",
+    "\n",
+    "print(\"Horizon-by-Horizon Comparison: GMM Sandwich vs Conservative Variance\")\n",
+    "print(\"=\" * 70)\n",
+    "print(f\"{'Horizon':>8} {'Effect':>10} {'GMM SE':>10} {'Cons. SE':>10} {'Ratio':>8}\")\n",
+    "print(\"-\" * 70)\n",
+    "\n",
+    "for h in sorted(ts_es.event_study_effects.keys()):\n",
+    "    ts_eff = ts_es.event_study_effects[h]\n",
+    "    imp_eff = imp_es.event_study_effects[h]\n",
+    "    if ts_eff.get('n_obs', 0) == 0:\n",
+    "        print(f\"{h:>8} {'[ref]':>10} {'---':>10} {'---':>10} {'---':>8}\")\n",
+    "        continue\n",
+    "    effect = ts_eff['effect']\n",
+    "    gmm_se = ts_eff['se']\n",
+    "    cons_se = imp_eff['se']\n",
+    "    ratio = gmm_se / cons_se if cons_se > 0 else np.nan\n",
+    "    print(f\"{h:>8} {effect:>10.4f} {gmm_se:>10.4f} {cons_se:>10.4f} {ratio:>8.3f}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "\n",
+    "| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n",
+    "|---------|-------------|---------------|------------------|\n",
+    "| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n",
+    "| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n",
+    "| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical (influence function) |\n",
+    "| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n",
+    "| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n",
+    "| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n",
+    "\n",
+    "**References:**\n",
+    "- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n",
+    "- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}

From 9cdfa17a139ea12245acd52cd14b4e10ea15cd79 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Mon, 16 Feb 2026 16:02:53 -0500
Subject: [PATCH 2/3] Fix methodology descriptions per PR review feedback

- CallawaySantAnna inference: clarify analytical influence-function SEs
  by default, optional multiplier bootstrap when n_bootstrap > 0
- treatment_effects.weight: correct to 1/n_valid for finite tau_hat,
  0 for NaN rows (not 1/n_treated)
- Summary table: update CS variance description for consistency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/tutorials/12_two_stage_did.ipynb | 41 +++------------------------
 1 file changed, 4 insertions(+), 37 deletions(-)
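
Note for reviewers: the corrected `weight` rule (`1/n_valid` for finite
`tau_hat`, `0` for NaN rows) can be illustrated with the short sketch below.
The helper `aggregation_weights` is hypothetical and written only for this
note; it is not the library's code and assumes equal weighting across valid
rows, as the commit message describes.

```python
import math


def aggregation_weights(tau_hat):
    """Return 1/n_valid for finite entries and 0.0 for non-finite ones."""
    n_valid = sum(1 for v in tau_hat if math.isfinite(v))
    if n_valid == 0:
        return [0.0] * len(tau_hat)
    return [1.0 / n_valid if math.isfinite(v) else 0.0 for v in tau_hat]


# One NaN row, e.g. from a rank-deficient cell.
tau_hat = [1.8, float("nan"), 2.2, 2.0]
weights = aggregation_weights(tau_hat)

# Skip zero-weight rows explicitly: 0.0 * nan is still nan.
overall = sum(w * v for w, v in zip(weights, tau_hat) if w > 0)
print(weights, overall)
```

Dividing by `n_treated` instead would leave the weights summing to less than
one whenever a NaN row is present, which is why the description was changed.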

diff --git a/docs/tutorials/12_two_stage_did.ipynb b/docs/tutorials/12_two_stage_did.ipynb
index 41eea3ea..bccb9e0f 100644
--- a/docs/tutorials/12_two_stage_did.ipynb
+++ b/docs/tutorials/12_two_stage_did.ipynb
@@ -96,17 +96,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Per-Observation Treatment Effects\n",
-    "\n",
-    "A feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n",
-    "- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n",
-    "- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n",
-    "- `rel_time`: relative time since treatment\n",
-    "- `weight`: aggregation weight (1/n_treated)\n",
-    "\n",
-    "This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
-   ]
+   "source": "## Per-Observation Treatment Effects\n\nA feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
   },
   {
    "cell_type": "code",
@@ -125,15 +115,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Comparison with Other Estimators\n",
-    "\n",
-    "TwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n",
-    "\n",
-    "CallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. Its standard errors come from an analytical multiplier bootstrap on the influence function.\n",
-    "\n",
-    "*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
-   ]
+   "source": "## Comparison with Other Estimators\n\nTwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n\nCallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when `n_bootstrap > 0`.\n\n*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
   },
   {
    "cell_type": "code",
@@ -255,22 +237,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Summary\n",
-    "\n",
-    "| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n",
-    "|---------|-------------|---------------|------------------|\n",
-    "| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n",
-    "| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n",
-    "| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical (influence function) |\n",
-    "| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n",
-    "| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n",
-    "| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n",
-    "\n",
-    "**References:**\n",
-    "- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n",
-    "- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
-   ]
+   "source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
   }
  ],
  "metadata": {
@@ -280,4 +247,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
\ No newline at end of file

From 113354e73eaef89a386a6d360a02955c9141bd5f Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Mon, 16 Feb 2026 16:07:10 -0500
Subject: [PATCH 3/3] Fix treatment_effects availability claim per PR review
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Both TwoStageDiD and ImputationDiD provide treatment_effects
DataFrame — remove incorrect "unique to TwoStageDiD" language
and update summary table accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/tutorials/12_two_stage_did.ipynb | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/tutorials/12_two_stage_did.ipynb b/docs/tutorials/12_two_stage_did.ipynb
index bccb9e0f..87c9e609 100644
--- a/docs/tutorials/12_two_stage_did.ipynb
+++ b/docs/tutorials/12_two_stage_did.ipynb
@@ -96,7 +96,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## Per-Observation Treatment Effects\n\nA feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
+   "source": "## Per-Observation Treatment Effects\n\nBoth `TwoStageDiD` and `ImputationDiD` provide a `treatment_effects` DataFrame containing one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
   },
   {
    "cell_type": "code",
@@ -237,7 +237,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
+   "source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | Yes (`treatment_effects`) | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
   }
  ],
  "metadata": {