Commit 8a72a07

Merge pull request #221 from igerber/feature/edid-paper-validation
Add EDID validation tests against paper results
2 parents 215fff5 + 8c68855 commit 8a72a07

7 files changed

Lines changed: 3142 additions & 59 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -90,3 +90,6 @@ papers/
 
 # Local analysis notebooks (not committed)
 analysis/
+
+# Replication data (local only, not for distribution)
+replication_data/

tests/conftest.py

Lines changed: 4 additions & 0 deletions
@@ -8,6 +8,10 @@
 import math
 import os
 import subprocess
+import sys
+
+# Make tests/helpers/ importable without adding all of tests/ to sys.path
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "helpers"))
 
 # Force non-interactive matplotlib backend before any test imports it.
 # Prevents plt.show() from blocking the test suite on a GUI window.
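
To illustrate what the added `sys.path.insert` line enables, here is a minimal, self-contained sketch. It builds a temporary `helpers/` directory, puts only that directory at the front of `sys.path`, and then imports a module from it by bare name. The module name `edid_helpers` and its `answer()` function are hypothetical stand-ins; this excerpt does not show which helper modules the commit actually adds.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway layout: <tmp>/helpers/edid_helpers.py
# (hypothetical module name, used only for this demonstration).
root = tempfile.mkdtemp()
helpers_dir = os.path.join(root, "helpers")
os.makedirs(helpers_dir)
with open(os.path.join(helpers_dir, "edid_helpers.py"), "w") as f:
    f.write("def answer():\n    return 42\n")

# Same pattern as the conftest.py change: only helpers/ goes on sys.path,
# so sibling test modules are not exposed as top-level imports.
sys.path.insert(0, helpers_dir)
importlib.invalidate_caches()

import edid_helpers  # resolved by bare name, no package prefix needed

value = edid_helpers.answer()
```

Because `conftest.py` is loaded by pytest before test modules in the same directory are collected, tests placed there could import helper modules the same way.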

tests/data/README.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Test Data Fixtures

## hrs_edid_validation.csv

**Source:** Dobkin, C., Finkelstein, A., Kluender, R., & Notowidigdo, M. J. (2018).
"The Economic Consequences of Hospital Admissions." *American Economic Review*, 108(2), 308-352.
Replication kit: https://www.openicpsr.org/openicpsr/project/116186/version/V1/view

**Sample selection:** Follows Sun & Abraham (2021), as used by Chen, Sant'Anna & Xie (2025)
Section 6:

1. Read `HRS_long.dta` from the Dobkin et al. replication kit
2. Keep waves 7-11, retain only individuals present in all 5 waves
3. Filter to ever-hospitalized individuals with `first_hosp >= 8`
4. Filter to ages 50-59 at hospitalization (`age_hosp`)
5. Drop wave 11 (no valid comparison group)
6. Recode `first_hosp == 11` as never-treated (`inf`)

**Expected counts:**

| Column | Values |
|--------|--------|
| Total individuals | 656 |
| Waves | 7, 8, 9, 10 |
| Rows | 2,624 |
| G=8 | 252 |
| G=9 | 176 |
| G=10 | 163 |
| G=inf | 65 |

**Columns:** `unit` (hhidpn), `time` (wave), `outcome` (oop_spend, 2005 dollars), `first_treat` (first_hosp)

**Regeneration:** Requires the Dobkin et al. replication kit (`.gitignore`d as `replication_data/`).

```python
import numpy as np
import pandas as pd

# 1. Read the long-format HRS panel from the replication kit
df = pd.read_stata("replication_data/116186-V1/Replication-Kit/HRS/Data/HRS_long.dta")
# 2. Keep waves 7-11 and only individuals observed in all 5 waves
sub = df[df["wave"].isin([7, 8, 9, 10, 11])]
balanced = sub.groupby("hhidpn")["wave"].nunique()
sub = sub[sub["hhidpn"].isin(balanced[balanced == 5].index)]
# 3. Keep ever-hospitalized individuals first hospitalized in wave 8 or later
sub = sub[sub["hhidpn"].isin(sub[sub["first_hosp"].notna()]["hhidpn"].unique())]
fh = sub.groupby("hhidpn")["first_hosp"].first()
sub = sub[sub["hhidpn"].isin(fh[fh >= 8].index)]
# 4. Keep individuals aged 50-59 at hospitalization
ages = sub.groupby("hhidpn")["age_hosp"].first()
sub = sub[sub["hhidpn"].isin(ages[(ages >= 50) & (ages <= 59)].index)]
# 5. Drop wave 11; .copy() so the column assignment below does not hit a slice
sub = sub[sub["wave"] <= 10].copy()
# 6. Recode the wave-11 cohort as never-treated (inf)
sub["first_treat"] = sub["first_hosp"].apply(lambda x: np.inf if x == 11 else int(x))
# Rename to the fixture's column names and write the CSV
out = sub[["hhidpn", "wave", "oop_spend", "first_treat"]].copy()
out.columns = ["unit", "time", "outcome", "first_treat"]
out["unit"] = out["unit"].astype(int)
out["time"] = out["time"].astype(int)
out.sort_values(["unit", "time"]).reset_index(drop=True).to_csv(
    "tests/data/hrs_edid_validation.csv", index=False
)
```
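
After regeneration, the fixture can be checked against the expected counts listed in the table above. The following is a minimal sketch using only the standard library, not part of the commit. The names `EXPECTED`, `load_rows`, and `summarize_panel` are hypothetical, and it assumes the CSV stores `first_treat` as values that `float()` can parse (for example `8.0` and `inf`).

```python
import csv
import math
from collections import Counter

# Values copied from the "Expected counts" table in this README.
EXPECTED = {
    "n_units": 656,
    "waves": [7, 8, 9, 10],
    "n_rows": 2624,
    "cohorts": {8.0: 252, 9.0: 176, 10.0: 163, math.inf: 65},
}


def load_rows(path):
    """Read the fixture CSV into a list of dicts (hypothetical helper)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def summarize_panel(rows):
    """Count units, waves, rows, and units per first-treatment cohort."""
    cohort_of = {}  # unit id -> first_treat; float("inf") parses as infinity
    waves = set()
    for row in rows:
        cohort_of[int(row["unit"])] = float(row["first_treat"])
        waves.add(int(row["time"]))
    return {
        "n_units": len(cohort_of),
        "waves": sorted(waves),
        "n_rows": len(rows),
        "cohorts": dict(Counter(cohort_of.values())),
    }
```

With the fixture in place, `summarize_panel(load_rows("tests/data/hrs_edid_validation.csv")) == EXPECTED` should hold if the file matches the table. The table is internally consistent: 656 individuals over 4 waves gives 2,624 rows, and the four cohort sizes sum to 656.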
