Merged
84 changes: 79 additions & 5 deletions METHODOLOGY_REVIEW.md
@@ -24,7 +24,7 @@ Each estimator in diff-diff should be periodically reviewed to ensure:
| MultiPeriodDiD | `estimators.py` | `fixest::feols()` | **Complete** | 2026-02-02 |
| TwoWayFixedEffects | `twfe.py` | `fixest::feols()` | **Complete** | 2026-02-08 |
| CallawaySantAnna | `staggered.py` | `did::att_gt()` | **Complete** | 2026-01-24 |
| SunAbraham | `sun_abraham.py` | `fixest::sunab()` | Not Started | - |
| SunAbraham | `sun_abraham.py` | `fixest::sunab()` | **Complete** | 2026-02-15 |
| SyntheticDiD | `synthetic_did.py` | `synthdid::synthdid_estimate()` | **Complete** | 2026-02-10 |
| TripleDifference | `triple_diff.py` | (forthcoming) | Not Started | - |
| TROP | `trop.py` | (forthcoming) | Not Started | - |
@@ -294,14 +294,88 @@ variables appear to the left of the `|` separator.
| Module | `sun_abraham.py` |
| Primary Reference | Sun & Abraham (2021) |
| R Reference | `fixest::sunab()` |
| Status | Not Started |
| Last Review | - |
| Status | **Complete** |
| Last Review | 2026-02-15 |

**Verified Components:**
- [x] Saturated TWFE regression with cohort × relative-time interactions
- [x] Within-transformation for unit and time fixed effects
- [x] Interaction-weighted event study effects (δ̂_e = Σ_g ŵ_{g,e} × δ̂_{g,e})
- [x] IW weights match event-time sample shares (n_{g,e} / Σ_g n_{g,e})
- [x] Overall ATT as weighted average of post-treatment effects
- [x] Delta method SE for aggregated effects (Var = w' Σ w)
- [x] Cluster-robust SEs at unit level
- [x] Reference period normalized to zero (e=-1 excluded from design matrix)
- [x] R comparison: ATT matches `fixest::sunab()` within machine precision (<1e-11)
- [x] R comparison: SE matches within 0.3% (small scale) / 0.1% (1k scale)
- [x] R comparison: Event study effects correlation = 1.000000
- [x] R comparison: Event study max diff < 1e-11
- [x] Bootstrap inference (pairs bootstrap)
- [x] Rank deficiency handling (warn/error/silent)
- [x] All REGISTRY.md edge cases tested
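The IW aggregation and delta-method variance items above can be sketched in a few lines. This is a minimal illustration with a hypothetical function name, not the library's internals; it only mirrors the formulas δ̂_e = Σ_g ŵ_{g,e} δ̂_{g,e} with ŵ_{g,e} = n_{g,e} / Σ_g n_{g,e} and Var = w' Σ w:

```python
import numpy as np

def aggregate_event_time(cohort_effects, cohort_counts, vcov):
    """Aggregate cohort-level effects at one event time e into a single
    IW event-study effect, with a delta-method standard error.

    cohort_effects : (G,) array of cohort estimates delta_{g,e}
    cohort_counts  : (G,) array of observation counts n_{g,e}
    vcov           : (G, G) covariance matrix of the cohort estimates
    """
    w = cohort_counts / cohort_counts.sum()  # w_{g,e} = n_{g,e} / sum_g n_{g,e}
    effect = w @ cohort_effects              # delta_e = sum_g w_{g,e} * delta_{g,e}
    se = np.sqrt(w @ vcov @ w)               # Var = w' Sigma w
    return effect, se
```

Because the weights are linear in the cohort effects, the same `w' Σ w` form covers both the event-study aggregation and the overall ATT (with weights spanning all post-treatment cells).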

**Test Coverage:**
- 43 tests in `tests/test_sun_abraham.py` (36 existing + 7 methodology verification)
- R benchmark tests via `benchmarks/run_benchmarks.py --estimator sunab`

**R Comparison Results:**
- Overall ATT matches within machine precision (diff < 1e-11 at both scales)
- Cluster-robust SE matches within 0.3% (well within 1% threshold)
- Event study effects match perfectly (correlation 1.0, max diff < 1e-11)
- Validated at small (200 units) and 1k (1000 units) scales

**Corrections Made:**
- (None yet)
1. **DF adjustment for absorbed FE** (`sun_abraham.py`, `_fit_saturated_regression()`):
Added `df_adjustment = n_units + n_times - 1` to `LinearRegression.fit()` to account
for absorbed unit and time fixed effects in degrees of freedom. Unlike TWFE (which uses
`-2` plus an explicit intercept column), SunAbraham's saturated regression has no
intercept, so all absorbed df must come from the adjustment. Affects t-distribution DoF
for cohort-level p-values/CIs (slightly larger p-values, slightly wider CIs) but does
NOT change VCV or SE values.

2. **NaN return for no post-treatment effects** (`sun_abraham.py`, `_compute_overall_att()`):
Changed return from `(0.0, 0.0)` to `(np.nan, np.nan)` when no post-treatment effects
exist. All downstream inference fields (t_stat, p_value, conf_int) correctly propagate
NaN via existing guards in `fit()`.

3. **Deprecation warnings for unused parameters** (`sun_abraham.py`, `fit()`):
Added `FutureWarning` for `min_pre_periods` and `min_post_periods` parameters that
are accepted but never used (no-op). These will be removed in a future version.

4. **Removed event-time truncation at [-20, 20]** (`sun_abraham.py`):
Removed the hardcoded cap `max(min(...), -20)` / `min(max(...), 20)` to match
R's `fixest::sunab()` which has no such limit. All available relative times are
now estimated.

5. **Warning for variance fallback path** (`sun_abraham.py`, `_compute_overall_att()`):
Added `UserWarning` when the full weight vector cannot be constructed and a
simplified variance (ignoring covariances between periods) is used as fallback.

6. **IW weights use event-time sample shares** (`sun_abraham.py`, `_compute_iw_effects()`):
Changed IW weights from `n_g / Σ_g n_g` (cohort sizes) to `n_{g,e} / Σ_g n_{g,e}`
(per-event-time observation counts) to match the REGISTRY.md formula. For balanced
panels these are identical; for unbalanced panels the new formula correctly reflects
actual sample composition at each event-time. Added unbalanced panel test.

7. **Normalize `np.inf` never-treated encoding** (`sun_abraham.py`, `fit()`):
`first_treat=np.inf` (documented as valid for never-treated) was included in
`treatment_groups` and `_rel_time` via `> 0` checks, producing `-inf` event times.
Fixed by normalizing `np.inf` to `0` immediately after computing `_never_treated`.
Same fix applied to `staggered.py` (`CallawaySantAnna`).
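Correction 7 amounts to a small normalization step applied before cohort/event-time construction. A minimal sketch (hypothetical helper name, not the actual diff-diff code): without it, `first_treat > 0` checks would classify `inf` as a treatment cohort and yield `-inf` event times.

```python
import numpy as np
import pandas as pd

def normalize_first_treat(first_treat: pd.Series) -> pd.Series:
    """Map the np.inf never-treated encoding to the canonical 0 encoding."""
    out = first_treat.astype(float).copy()
    out[np.isinf(out)] = 0.0  # never-treated units carry no treatment cohort
    return out
```

For example, `normalize_first_treat(pd.Series([2005.0, float("inf"), 0.0]))` yields `[2005.0, 0.0, 0.0]`.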

**Outstanding Concerns:**
- (None yet)
- **Inference distribution**: Cohort-level p-values use t-distribution (via
`LinearRegression.get_inference()`), while aggregated event study and overall ATT
p-values use normal distribution (via `compute_p_value()`). This is asymptotically
equivalent and standard for delta-method-aggregated quantities. R's fixest uses
t-distribution at all levels, so aggregated p-values may differ slightly for small
samples — this is a documented deviation.
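The size of this small-sample gap is easy to quantify directly. A quick illustration with made-up values (the t statistic and degrees of freedom here are not taken from the review):

```python
from scipy import stats

t_stat, df = 2.0, 30  # e.g. ATT / SE with 30 residual degrees of freedom
p_normal = 2 * stats.norm.sf(abs(t_stat))  # what the aggregated path uses
p_t = 2 * stats.t.sf(abs(t_stat), df)      # what fixest would report
# p_t exceeds p_normal in small samples; the gap vanishes as df grows
```

At df = 30 the two p-values already differ by less than a percentage point, which is why the deviation only matters for very small panels.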

**Deviations from R's fixest::sunab():**
1. **NaN for no post-treatment effects**: Python returns `(NaN, NaN)` for overall ATT/SE
when no post-treatment effects exist. R would error.
2. **Normal distribution for aggregated inference**: Aggregated p-values use normal
distribution (asymptotically equivalent). R uses t-distribution.

---

131 changes: 131 additions & 0 deletions benchmarks/R/benchmark_sunab.R
@@ -0,0 +1,131 @@
#!/usr/bin/env Rscript
# Benchmark: Sun-Abraham interaction-weighted estimator (R `fixest::sunab()`)
#
# This uses fixest::sunab() with unit+time FE and unit-level clustering,
# matching the Python SunAbraham estimator's approach.
#
# Usage:
# Rscript benchmark_sunab.R --data path/to/data.csv --output path/to/results.json

library(fixest)
library(jsonlite)
library(data.table)

# Parse command line arguments
args <- commandArgs(trailingOnly = TRUE)

parse_args <- function(args) {
  result <- list(
    data = NULL,
    output = NULL
  )

  i <- 1
  while (i <= length(args)) {
    if (args[i] == "--data") {
      result$data <- args[i + 1]
      i <- i + 2
    } else if (args[i] == "--output") {
      result$output <- args[i + 1]
      i <- i + 2
    } else {
      i <- i + 1
    }
  }

  if (is.null(result$data) || is.null(result$output)) {
    stop("Usage: Rscript benchmark_sunab.R --data <path> --output <path>")
  }

  return(result)
}

config <- parse_args(args)

# Load data
message(sprintf("Loading data from: %s", config$data))
data <- fread(config$data)

# Convert first_treat to double before assigning Inf (integer column can't hold Inf)
data[, first_treat := as.double(first_treat)]
# Convert never-treated coding: first_treat=0 -> Inf (R's convention for never-treated)
data[first_treat == 0, first_treat := Inf]

# Run benchmark
message("Running Sun-Abraham estimation with fixest::sunab()...")
start_time <- Sys.time()

# Sun-Abraham with unit+time FE, clustered at unit level
# sunab(cohort, period) creates the interaction-weighted estimator
model <- feols(
  outcome ~ sunab(first_treat, time) | unit + time,
  data = data,
  cluster = ~unit
)

estimation_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))

# Extract event study effects (per-relative-period IW coefficients)
es_coefs <- coef(model)
es_ses <- se(model)

# Build event study list
event_study <- list()
coef_names <- names(es_coefs)
for (i in seq_along(es_coefs)) {
  name <- coef_names[i]
  # fixest sunab names coefficients like "time::-4" or "time::2"
  event_time <- as.numeric(gsub("^time::(-?[0-9]+)$", "\\1", name))

  event_study[[length(event_study) + 1]] <- list(
    event_time = event_time,
    att = unname(es_coefs[i]),
    se = unname(es_ses[i])
  )
}

# Aggregate to get overall ATT (weighted by observation count per cell)
# aggregate() returns a matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
agg_result <- aggregate(model, agg = "ATT")

overall_att <- agg_result[1, "Estimate"]
overall_se <- agg_result[1, "Std. Error"]
overall_pvalue <- agg_result[1, "Pr(>|t|)"]

message(sprintf("Overall ATT: %.6f (SE: %.6f)", overall_att, overall_se))

# Format output
results <- list(
  estimator = "fixest::sunab()",
  cluster = "unit",

  # Overall ATT (aggregated)
  overall_att = overall_att,
  overall_se = overall_se,
  overall_pvalue = overall_pvalue,

  # Event study effects
  event_study = event_study,

  # Timing
  timing = list(
    estimation_seconds = estimation_time,
    total_seconds = estimation_time
  ),

  # Metadata
  metadata = list(
    r_version = R.version.string,
    fixest_version = as.character(packageVersion("fixest")),
    n_units = length(unique(data$unit)),
    n_periods = length(unique(data$time)),
    n_obs = nrow(data),
    n_event_study_coefs = length(es_coefs)
  )
)

# Write output
message(sprintf("Writing results to: %s", config$output))
write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 15)

message(sprintf("Completed in %.3f seconds", estimation_time))
136 changes: 136 additions & 0 deletions benchmarks/python/benchmark_sun_abraham.py
@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
Benchmark: SunAbraham interaction-weighted estimator (diff-diff SunAbraham class).

This benchmarks the SunAbraham estimator with cluster-robust SEs,
matching R's fixest::sunab() approach.

Usage:
python benchmark_sun_abraham.py --data path/to/data.csv --output path/to/results.json
"""

import argparse
import json
import os
import sys
from pathlib import Path

# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
def _get_backend_from_args():
    """Parse --backend argument without importing diff_diff."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
    args, _ = parser.parse_known_args()
    return args.backend

_requested_backend = _get_backend_from_args()
if _requested_backend in ("python", "rust"):
    os.environ["DIFF_DIFF_BACKEND"] = _requested_backend

# NOW import diff_diff and other dependencies (will see the env var)
import numpy as np
import pandas as pd

# Add parent to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from diff_diff import SunAbraham, HAS_RUST_BACKEND
from benchmarks.python.utils import Timer


def parse_args():
    parser = argparse.ArgumentParser(description="Benchmark SunAbraham estimator")
    parser.add_argument("--data", required=True, help="Path to input CSV data")
    parser.add_argument("--output", required=True, help="Path to output JSON results")
    parser.add_argument(
        "--backend", default="auto", choices=["auto", "python", "rust"],
        help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
    )
    return parser.parse_args()


def get_actual_backend() -> str:
    """Return the actual backend being used based on HAS_RUST_BACKEND."""
    return "rust" if HAS_RUST_BACKEND else "python"


def main():
    args = parse_args()

    actual_backend = get_actual_backend()
    print(f"Using backend: {actual_backend}")

    # Load data
    print(f"Loading data from: {args.data}")
    data = pd.read_csv(args.data)

    # Run benchmark using SunAbraham (analytical SEs, no bootstrap)
    print("Running Sun-Abraham estimation...")

    sa = SunAbraham(control_group="never_treated", n_bootstrap=0)

    with Timer() as timer:
        results = sa.fit(
            data,
            outcome="outcome",
            unit="unit",
            time="time",
            first_treat="first_treat",
        )

    overall_att = results.overall_att
    overall_se = results.overall_se
    overall_pvalue = results.overall_p_value

    # Extract event study effects
    event_study = []
    for e in sorted(results.event_study_effects.keys()):
        eff = results.event_study_effects[e]
        event_study.append({
            "event_time": int(e),
            "att": float(eff["effect"]),
            "se": float(eff["se"]),
        })

    total_time = timer.elapsed

    # Build output
    output = {
        "estimator": "diff_diff.SunAbraham",
        "backend": actual_backend,
        "cluster": "unit",
        # Overall ATT
        "overall_att": float(overall_att),
        "overall_se": float(overall_se),
        "overall_pvalue": float(overall_pvalue),
        # Event study effects
        "event_study": event_study,
        # Timing
        "timing": {
            "estimation_seconds": total_time,
            "total_seconds": total_time,
        },
        # Metadata
        "metadata": {
            "n_units": len(data["unit"].unique()),
            "n_periods": len(data["time"].unique()),
            "n_obs": len(data),
            "n_groups": len(results.groups),
            "n_event_study_effects": len(event_study),
        },
    }

    # Write output
    print(f"Writing results to: {args.output}")
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2)

    print(f"Overall ATT: {overall_att:.6f} (SE: {overall_se:.6f})")
    print(f"Completed in {total_time:.3f} seconds")
    return output


if __name__ == "__main__":
    main()