Revise survey tutorial to show the cost of ignoring survey design#277
Enhance Section 1 of 16_survey_did.ipynb with evidence-based motivation using the research-grade DGP (informative_sampling, heterogeneous_te_by_strata, return_true_population_att). Adds 5 new cells showing:

- Ground truth comparison: the naive ATT is ~22% biased relative to the known population ATT
- DEFF diagnostics connecting SE underestimation to effective sample size
- A 200-iteration MC simulation showing that naive 95% CIs cover the truth only ~66% of the time and falsely detect pre-trends in ~67% of draws

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
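The coverage failure this commit demonstrates can be illustrated with a toy Monte Carlo. This is a minimal sketch of the mechanism, not the notebook's actual cells: the strata shares, sampling rates, and effect sizes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def naive_ci_covers(att=2.0, n=2000):
    """One draw: informative sampling plus heterogeneous effects by stratum."""
    strata = rng.binomial(1, 0.5, n)                        # 50/50 in the population
    keep = rng.random(n) < np.where(strata == 1, 0.9, 0.3)  # stratum 1 oversampled
    eff = np.where(strata == 1, att + 1.0, att - 1.0)       # population mean effect = att
    y = eff[keep] + rng.normal(0.0, 1.0, keep.sum())
    m = y.mean()
    se = y.std(ddof=1) / np.sqrt(len(y))
    return m - 1.96 * se <= att <= m + 1.96 * se            # does the naive 95% CI cover truth?

coverage = np.mean([naive_ci_covers() for _ in range(200)])
print(f"naive 95% CI coverage: {coverage:.2f}")             # far below the nominal 0.95
```

Because the unweighted mean over-represents the oversampled stratum, the naive estimator is biased and its CIs cover the true population effect far less often than 95%; the notebook's richer DGP produces the ~66% figure quoted above.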
🤖 AI review

(Only the section headings of this review were captured: Overall Assessment, Executive Summary, Methodology, Code Quality, Performance, Maintainability, Tech Debt, Security, Documentation/Tests, Path to Approval.)
Replace the flat horizontal population_att line with a per-horizon truth series: ATT(e)=0 for pre-treatment, and the survey-weighted mean of true_effect among treated observations at each horizon for post-treatment. The truth now correctly varies by horizon due to cohort composition changes (e.g., 2.62 when both cohorts contribute vs 1.95 when only cohort 3 remains at longer horizons). Addresses P1 from AI review on PR #277. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
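The per-horizon truth computation described in this commit can be sketched as follows. The dataframe and column names here are hypothetical, chosen to mirror the commit description rather than the notebook's actual schema, and the cohort effects and weights are invented.

```python
import numpy as np
import pandas as pd

# Toy treated-observation panel: cohort 2 drops out at longer horizons,
# so the composition of the truth changes with event time.
df = pd.DataFrame({
    "cohort":      [2, 2, 3, 3, 3],
    "event_time":  [0, 1, 0, 1, 2],
    "true_effect": [3.0, 3.0, 1.5, 1.5, 1.5],
    "weight":      [1.0, 1.0, 2.0, 2.0, 2.0],
})

# Survey-weighted mean of true_effect among treated obs at each horizon;
# pre-treatment horizons (e < 0) would be set to ATT(e) = 0 by construction.
truth = df.groupby("event_time").apply(
    lambda s: np.average(s["true_effect"], weights=s["weight"])
)
print(truth)
# e = 0, 1: both cohorts contribute -> (1*3.0 + 2*1.5) / 3 = 2.0
# e = 2:    only cohort 3 remains   -> 1.5
```

The same composition effect drives the 2.62-vs-1.95 contrast in the actual notebook: the truth series shifts whenever a cohort stops contributing at longer horizons.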
/ai-review |
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment / Executive Summary: (not captured)

Methodology: Affected methods: tutorial exposition for

Code Quality: No findings in the changed cells.
Performance: No findings in the changed cells.
Maintainability: No findings in the changed cells.
Tech Debt: No findings.
Security: No findings.
Documentation/Tests: No blocking findings beyond the methodology issue above. I could not independently execute the notebook in this sandbox because the available Python environment is missing notebook dependencies.
Path to Approval: (not captured)
The SE ratio (survey / naive) compares two different estimators (weighted vs unweighted), which is not a formal design effect. Rename subsection to "SE inflation", remove the DEFF formula and effective-sample-size interpretation, and add a note distinguishing this from the library's proper compute_deff() API in Section 7. Addresses P1 from AI review rerun on PR #277. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
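The distinction this commit draws can be made concrete with a small sketch. The data here are illustrative, and the design-effect calculation is Kish's classical approximation from weight variability alone, not the library's compute_deff() API referenced in Section 7.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.lognormal(sigma=0.8, size=1000)  # illustrative survey weights
y = rng.normal(size=1000)

# Kish's approximation: deff ≈ 1 + cv(w)^2 depends on the weights alone,
# giving an effective sample size n_eff = n / deff.
deff = 1 + (w.std() / w.mean()) ** 2
n_eff = len(w) / deff

# The "SE inflation" ratio instead compares two *different* estimators
# (weighted vs unweighted mean), so it is not a formal design effect.
se_naive = y.std(ddof=1) / np.sqrt(len(y))
wmean = np.average(y, weights=w)
se_weighted = np.sqrt(np.sum((w / w.sum()) ** 2 * (y - wmean) ** 2))

print(f"Kish deff ≈ {deff:.2f}, n_eff ≈ {n_eff:.0f} of {len(w)}")
print(f"SE ratio (weighted / naive) ≈ {se_weighted / se_naive:.2f}")
```

The two numbers answer different questions: the Kish quantity summarizes precision lost to unequal weighting within one (weighted) estimator, while the SE ratio confounds that loss with the bias-variance trade-off between two distinct estimators, which is why the subsection was renamed.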
/ai-review |
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment: ✅ Looks good. The prior P1 on DEFF terminology is resolved, and I did not find any new unmitigated P0/P1 issues in the notebook-only diff. I have one non-blocking methodology/documentation finding (P2) on how the new pre-trend simulation benchmark is labeled/interpreted.
Summary

- Enhance Section 1 of 16_survey_did.ipynb with evidence-based motivation using the research-grade DGP (informative_sampling, heterogeneous_te_by_strata, return_true_population_att)

Methodology references (required if estimator / math changes)

Validation

- Runs jupyter nbconvert --execute without errors

Security / privacy

Generated with Claude Code