Merged
84 changes: 79 additions & 5 deletions METHODOLOGY_REVIEW.md
@@ -24,7 +24,7 @@ Each estimator in diff-diff should be periodically reviewed to ensure:
| MultiPeriodDiD | `estimators.py` | `fixest::feols()` | **Complete** | 2026-02-02 |
| TwoWayFixedEffects | `twfe.py` | `fixest::feols()` | **Complete** | 2026-02-08 |
| CallawaySantAnna | `staggered.py` | `did::att_gt()` | **Complete** | 2026-01-24 |
| SunAbraham | `sun_abraham.py` | `fixest::sunab()` | Not Started | - |
| SunAbraham | `sun_abraham.py` | `fixest::sunab()` | **Complete** | 2026-02-15 |
| SyntheticDiD | `synthetic_did.py` | `synthdid::synthdid_estimate()` | **Complete** | 2026-02-10 |
| TripleDifference | `triple_diff.py` | (forthcoming) | Not Started | - |
| TROP | `trop.py` | (forthcoming) | Not Started | - |
@@ -294,14 +294,88 @@ variables appear to the left of the `|` separator.
| Module | `sun_abraham.py` |
| Primary Reference | Sun & Abraham (2021) |
| R Reference | `fixest::sunab()` |
| Status | Not Started |
| Last Review | - |
| Status | **Complete** |
| Last Review | 2026-02-15 |

**Verified Components:**
- [x] Saturated TWFE regression with cohort × relative-time interactions
- [x] Within-transformation for unit and time fixed effects
- [x] Interaction-weighted event study effects (δ̂_e = Σ_g ŵ_{g,e} × δ̂_{g,e})
- [x] IW weights match event-time sample shares (n_{g,e} / Σ_g n_{g,e})
- [x] Overall ATT as weighted average of post-treatment effects
- [x] Delta method SE for aggregated effects (Var = w' Σ w)
- [x] Cluster-robust SEs at unit level
- [x] Reference period normalized to zero (e=-1 excluded from design matrix)
- [x] R comparison: ATT matches `fixest::sunab()` within machine precision (<1e-11)
- [x] R comparison: SE matches within 0.3% (small scale) / 0.1% (1k scale)
- [x] R comparison: Event study effects correlation = 1.000000
- [x] R comparison: Event study max diff < 1e-11
- [x] Bootstrap inference (pairs bootstrap)
- [x] Rank deficiency handling (warn/error/silent)
- [x] All REGISTRY.md edge cases tested
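The IW aggregation and delta-method variance items above can be sketched in a few lines. This is a minimal illustration with a hypothetical function name, not the library's internals; it only mirrors the formulas δ̂_e = Σ_g ŵ_{g,e} δ̂_{g,e} with ŵ_{g,e} = n_{g,e} / Σ_g n_{g,e} and Var = w' Σ w:

```python
import numpy as np

def aggregate_event_time(cohort_effects, cohort_counts, vcov):
    """Aggregate cohort-level effects at one event time e into a single
    IW event-study effect, with a delta-method standard error.

    cohort_effects : (G,) array of cohort estimates delta_{g,e}
    cohort_counts  : (G,) array of observation counts n_{g,e}
    vcov           : (G, G) covariance matrix of the cohort estimates
    """
    w = cohort_counts / cohort_counts.sum()  # w_{g,e} = n_{g,e} / sum_g n_{g,e}
    effect = w @ cohort_effects              # delta_e = sum_g w_{g,e} * delta_{g,e}
    se = np.sqrt(w @ vcov @ w)               # Var = w' Sigma w
    return effect, se
```

Because the weights are linear in the cohort effects, the same `w' Σ w` form covers both the event-study aggregation and the overall ATT (with weights spanning all post-treatment cells).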

**Test Coverage:**
- 43 tests in `tests/test_sun_abraham.py` (36 existing + 7 methodology verification)
- R benchmark tests via `benchmarks/run_benchmarks.py --estimator sunab`

**R Comparison Results:**
- Overall ATT matches within machine precision (diff < 1e-11 at both scales)
- Cluster-robust SE matches within 0.3% (well within 1% threshold)
- Event study effects match perfectly (correlation 1.0, max diff < 1e-11)
- Validated at small (200 units) and 1k (1000 units) scales

**Corrections Made:**
- (None yet)
1. **DF adjustment for absorbed FE** (`sun_abraham.py`, `_fit_saturated_regression()`):
Added `df_adjustment = n_units + n_times - 1` to `LinearRegression.fit()` to account
for absorbed unit and time fixed effects in degrees of freedom. Unlike TWFE (which uses
`-2` plus an explicit intercept column), SunAbraham's saturated regression has no
intercept, so all absorbed df must come from the adjustment. Affects t-distribution DoF
for cohort-level p-values/CIs (slightly larger p-values, slightly wider CIs) but does
NOT change VCV or SE values.

2. **NaN return for no post-treatment effects** (`sun_abraham.py`, `_compute_overall_att()`):
Changed return from `(0.0, 0.0)` to `(np.nan, np.nan)` when no post-treatment effects
exist. All downstream inference fields (t_stat, p_value, conf_int) correctly propagate
NaN via existing guards in `fit()`.

3. **Deprecation warnings for unused parameters** (`sun_abraham.py`, `fit()`):
Added `FutureWarning` for `min_pre_periods` and `min_post_periods` parameters that
are accepted but never used (no-op). These will be removed in a future version.

4. **Removed event-time truncation at [-20, 20]** (`sun_abraham.py`):
Removed the hardcoded cap `max(min(...), -20)` / `min(max(...), 20)` to match
R's `fixest::sunab()` which has no such limit. All available relative times are
now estimated.

5. **Warning for variance fallback path** (`sun_abraham.py`, `_compute_overall_att()`):
Added `UserWarning` when the full weight vector cannot be constructed and a
simplified variance (ignoring covariances between periods) is used as fallback.

6. **IW weights use event-time sample shares** (`sun_abraham.py`, `_compute_iw_effects()`):
Changed IW weights from `n_g / Σ_g n_g` (cohort sizes) to `n_{g,e} / Σ_g n_{g,e}`
(per-event-time observation counts) to match the REGISTRY.md formula. For balanced
panels these are identical; for unbalanced panels the new formula correctly reflects
actual sample composition at each event-time. Added unbalanced panel test.

7. **Normalize `np.inf` never-treated encoding** (`sun_abraham.py`, `fit()`):
`first_treat=np.inf` (documented as valid for never-treated) was included in
`treatment_groups` and `_rel_time` via `> 0` checks, producing `-inf` event times.
Fixed by normalizing `np.inf` to `0` immediately after computing `_never_treated`.
Same fix applied to `staggered.py` (`CallawaySantAnna`).
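Correction 7 amounts to a small normalization step applied before cohort/event-time construction. A minimal sketch (hypothetical helper name, not the actual diff-diff code): without it, `first_treat > 0` checks would classify `inf` as a treatment cohort and yield `-inf` event times.

```python
import numpy as np
import pandas as pd

def normalize_first_treat(first_treat: pd.Series) -> pd.Series:
    """Map the np.inf never-treated encoding to the canonical 0 encoding."""
    out = first_treat.astype(float).copy()
    out[np.isinf(out)] = 0.0  # never-treated units carry no treatment cohort
    return out
```

For example, `normalize_first_treat(pd.Series([2005.0, float("inf"), 0.0]))` yields `[2005.0, 0.0, 0.0]`.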

**Outstanding Concerns:**
- (None yet)
- **Inference distribution**: Cohort-level p-values use t-distribution (via
`LinearRegression.get_inference()`), while aggregated event study and overall ATT
p-values use normal distribution (via `compute_p_value()`). This is asymptotically
equivalent and standard for delta-method-aggregated quantities. R's fixest uses
t-distribution at all levels, so aggregated p-values may differ slightly for small
samples — this is a documented deviation.
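The size of this small-sample gap is easy to quantify directly. A quick illustration with made-up values (the t statistic and degrees of freedom here are not taken from the review):

```python
from scipy import stats

t_stat, df = 2.0, 30  # e.g. ATT / SE with 30 residual degrees of freedom
p_normal = 2 * stats.norm.sf(abs(t_stat))  # what the aggregated path uses
p_t = 2 * stats.t.sf(abs(t_stat), df)      # what fixest would report
# p_t exceeds p_normal in small samples; the gap vanishes as df grows
```

At df = 30 the two p-values already differ by less than a percentage point, which is why the deviation only matters for very small panels.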

**Deviations from R's fixest::sunab():**
1. **NaN for no post-treatment effects**: Python returns `(NaN, NaN)` for overall ATT/SE
when no post-treatment effects exist. R would error.
2. **Normal distribution for aggregated inference**: Aggregated p-values use normal
distribution (asymptotically equivalent). R uses t-distribution.

---

131 changes: 131 additions & 0 deletions benchmarks/R/benchmark_sunab.R
@@ -0,0 +1,131 @@
#!/usr/bin/env Rscript
# Benchmark: Sun-Abraham interaction-weighted estimator (R `fixest::sunab()`)
#
# This uses fixest::sunab() with unit+time FE and unit-level clustering,
# matching the Python SunAbraham estimator's approach.
#
# Usage:
# Rscript benchmark_sunab.R --data path/to/data.csv --output path/to/results.json

library(fixest)
library(jsonlite)
library(data.table)

# Parse command line arguments
args <- commandArgs(trailingOnly = TRUE)

parse_args <- function(args) {
  result <- list(
    data = NULL,
    output = NULL
  )

  i <- 1
  while (i <= length(args)) {
    if (args[i] == "--data") {
      result$data <- args[i + 1]
      i <- i + 2
    } else if (args[i] == "--output") {
      result$output <- args[i + 1]
      i <- i + 2
    } else {
      i <- i + 1
    }
  }

  if (is.null(result$data) || is.null(result$output)) {
    stop("Usage: Rscript benchmark_sunab.R --data <path> --output <path>")
  }

  return(result)
}

config <- parse_args(args)

# Load data
message(sprintf("Loading data from: %s", config$data))
data <- fread(config$data)

# Convert first_treat to double before assigning Inf (integer column can't hold Inf)
data[, first_treat := as.double(first_treat)]
# Convert never-treated coding: first_treat=0 -> Inf (R's convention for never-treated)
data[first_treat == 0, first_treat := Inf]

# Run benchmark
message("Running Sun-Abraham estimation with fixest::sunab()...")
start_time <- Sys.time()

# Sun-Abraham with unit+time FE, clustered at unit level
# sunab(cohort, period) creates the interaction-weighted estimator
model <- feols(
  outcome ~ sunab(first_treat, time) | unit + time,
  data = data,
  cluster = ~unit
)

estimation_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))

# Extract event study effects (per-relative-period IW coefficients)
es_coefs <- coef(model)
es_ses <- se(model)

# Build event study list
event_study <- list()
coef_names <- names(es_coefs)
for (i in seq_along(es_coefs)) {
  name <- coef_names[i]
  # fixest sunab names coefficients like "time::-4" or "time::2"
  event_time <- as.numeric(gsub("^time::(-?[0-9]+)$", "\\1", name))

  event_study[[length(event_study) + 1]] <- list(
    event_time = event_time,
    att = unname(es_coefs[i]),
    se = unname(es_ses[i])
  )
}

# Aggregate to get overall ATT (weighted by observation count per cell)
# aggregate() returns a matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
agg_result <- aggregate(model, agg = "ATT")

overall_att <- agg_result[1, "Estimate"]
overall_se <- agg_result[1, "Std. Error"]
overall_pvalue <- agg_result[1, "Pr(>|t|)"]

message(sprintf("Overall ATT: %.6f (SE: %.6f)", overall_att, overall_se))

# Format output
results <- list(
  estimator = "fixest::sunab()",
  cluster = "unit",

  # Overall ATT (aggregated)
  overall_att = overall_att,
  overall_se = overall_se,
  overall_pvalue = overall_pvalue,

  # Event study effects
  event_study = event_study,

  # Timing
  timing = list(
    estimation_seconds = estimation_time,
    total_seconds = estimation_time
  ),

  # Metadata
  metadata = list(
    r_version = R.version.string,
    fixest_version = as.character(packageVersion("fixest")),
    n_units = length(unique(data$unit)),
    n_periods = length(unique(data$time)),
    n_obs = nrow(data),
    n_event_study_coefs = length(es_coefs)
  )
)

# Write output
message(sprintf("Writing results to: %s", config$output))
write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 15)

message(sprintf("Completed in %.3f seconds", estimation_time))
136 changes: 136 additions & 0 deletions benchmarks/python/benchmark_sun_abraham.py
@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""
Benchmark: SunAbraham interaction-weighted estimator (diff-diff SunAbraham class).

This benchmarks the SunAbraham estimator with cluster-robust SEs,
matching R's fixest::sunab() approach.

Usage:
python benchmark_sun_abraham.py --data path/to/data.csv --output path/to/results.json
"""

import argparse
import json
import os
import sys
from pathlib import Path

# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
def _get_backend_from_args():
    """Parse --backend argument without importing diff_diff."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
    args, _ = parser.parse_known_args()
    return args.backend

_requested_backend = _get_backend_from_args()
if _requested_backend in ("python", "rust"):
    os.environ["DIFF_DIFF_BACKEND"] = _requested_backend

# NOW import diff_diff and other dependencies (will see the env var)
import numpy as np
import pandas as pd

# Add parent to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from diff_diff import SunAbraham, HAS_RUST_BACKEND
from benchmarks.python.utils import Timer


def parse_args():
    parser = argparse.ArgumentParser(description="Benchmark SunAbraham estimator")
    parser.add_argument("--data", required=True, help="Path to input CSV data")
    parser.add_argument("--output", required=True, help="Path to output JSON results")
    parser.add_argument(
        "--backend", default="auto", choices=["auto", "python", "rust"],
        help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
    )
    return parser.parse_args()


def get_actual_backend() -> str:
    """Return the actual backend being used based on HAS_RUST_BACKEND."""
    return "rust" if HAS_RUST_BACKEND else "python"


def main():
    args = parse_args()

    actual_backend = get_actual_backend()
    print(f"Using backend: {actual_backend}")

    # Load data
    print(f"Loading data from: {args.data}")
    data = pd.read_csv(args.data)

    # Run benchmark using SunAbraham (analytical SEs, no bootstrap)
    print("Running Sun-Abraham estimation...")

    sa = SunAbraham(control_group="never_treated", n_bootstrap=0)

    with Timer() as timer:
        results = sa.fit(
            data,
            outcome="outcome",
            unit="unit",
            time="time",
            first_treat="first_treat",
        )

    overall_att = results.overall_att
    overall_se = results.overall_se
    overall_pvalue = results.overall_p_value

    # Extract event study effects
    event_study = []
    for e in sorted(results.event_study_effects.keys()):
        eff = results.event_study_effects[e]
        event_study.append({
            "event_time": int(e),
            "att": float(eff["effect"]),
            "se": float(eff["se"]),
        })

    total_time = timer.elapsed

    # Build output
    output = {
        "estimator": "diff_diff.SunAbraham",
        "backend": actual_backend,
        "cluster": "unit",
        # Overall ATT
        "overall_att": float(overall_att),
        "overall_se": float(overall_se),
        "overall_pvalue": float(overall_pvalue),
        # Event study effects
        "event_study": event_study,
        # Timing
        "timing": {
            "estimation_seconds": total_time,
            "total_seconds": total_time,
        },
        # Metadata
        "metadata": {
            "n_units": len(data["unit"].unique()),
            "n_periods": len(data["time"].unique()),
            "n_obs": len(data),
            "n_groups": len(results.groups),
            "n_event_study_effects": len(event_study),
        },
    }

    # Write output
    print(f"Writing results to: {args.output}")
    output_path = Path(args.output)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as f:
        json.dump(output, f, indent=2)

    print(f"Overall ATT: {overall_att:.6f} (SE: {overall_se:.6f})")
    print(f"Completed in {total_time:.3f} seconds")
    return output


if __name__ == "__main__":
    main()