@@ -5,7 +5,14 @@ TOP PRIORITY: Methodology adherence to source material.
 - If the PR changes an estimator, math, weighting, variance/SE, identification assumptions, or default behaviors:
   1) Identify which method(s) are affected.
   2) Cross-check against the cited paper(s) and the Methodology Registry.
-  3) Flag any mismatch, missing assumption check, incorrect variance/SE, or undocumented deviation as P0/P1.
+  3) Flag any UNDOCUMENTED mismatch, missing assumption check, or incorrect variance/SE as P0/P1.
+  4) If a deviation IS documented in REGISTRY.md (look for "**Note:**", "**Deviation from R:**",
+     "**Note (deviation from R):**" labels), it is NOT a defect. Classify as P3-informational
+     (P3 = minor/informational, no action required).
+  5) Different valid numerical approaches to the same mathematical operation (e.g., Cholesky vs QR,
+     SVD vs eigendecomposition, multiplier vs nonparametric bootstrap) are implementation choices,
+     not methodology errors — unless the approach is provably wrong (produces incorrect results),
+     not merely different.
 
 SECONDARY PRIORITIES (in order):
 2) Edge case coverage (see checklist below)
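As a reading aid for steps 3) to 5) in the hunk above, the decision they describe can be sketched in Python. This is a minimal illustration, not part of the PR: the function names `documented_notes` and `classify_mismatch`, the returned strings, and the sample registry text are all hypothetical. Only the three label strings come from the prompt itself.

```python
from typing import List

# Labels quoted in step 4 of the prompt.
DEVIATION_LABELS = (
    "**Note:**",
    "**Deviation from R:**",
    "**Note (deviation from R):**",
)


def documented_notes(registry_text: str) -> List[str]:
    """Return the registry lines that carry one of the deviation labels."""
    return [
        line.strip()
        for line in registry_text.splitlines()
        if any(label in line for label in DEVIATION_LABELS)
    ]


def classify_mismatch(documented: bool, valid_numerical_choice: bool) -> str:
    """Hypothetical mapping of steps 3) to 5) onto a severity label."""
    if valid_numerical_choice:
        return "not a methodology error"  # step 5: implementation choice
    if documented:
        return "P3"  # step 4: documented deviation, informational only
    return "P0/P1"  # step 3: undocumented mismatch


# Made-up registry excerpt, for illustration only.
sample_registry = "\n".join(
    [
        "- **Note:** SEs use QR instead of Cholesky.",
        "- Plain description line.",
        "- **Deviation from R:** default bootstrap differs.",
    ]
)
```

Under these assumptions, `documented_notes(sample_registry)` picks out the two labeled lines, and a reviewer would still need to confirm by hand that a matched note actually covers the mismatch in question.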
@@ -47,10 +54,22 @@ When reviewing new features or code paths, specifically check:
    - Command to check: `grep -n "pattern" diff_diff/*.py`
    - Flag as P1 if only partial fixes were made
 
+## Deferred Work Acceptance
+
+This project tracks deferred technical debt in `TODO.md` under "Tech Debt from Code Reviews."
+
+- If a limitation is already tracked in `TODO.md` with a PR reference, it is NOT a blocker.
+- If a PR ADDS a new `TODO.md` entry for deferred work, that counts as properly tracking it.
+  Classify as P3-informational ("tracked in TODO.md"), not P1/P2.
+- Only flag deferred work as P1+ if it introduces a SILENT correctness bug (wrong numbers
+  with no warning/error) that is NOT tracked anywhere.
+- Test gaps, documentation gaps, and performance improvements are deferrable. Missing NaN guards
+  and incorrect statistical output are not.
+
 Rules:
 - Review ONLY the changes introduced by this PR (diff) and the minimum surrounding context needed.
 - Provide a single Markdown report with:
-  - Overall assessment: ✅ Looks good | ⚠️ Needs changes | ⛔ Blocker
+  - Overall assessment (see Assessment Criteria below)
   - Executive summary (3–6 bullets)
   - Sections for: Methodology, Code Quality, Performance, Maintainability, Tech Debt, Security, Documentation/Tests
 - In each section: list findings with Severity (P0/P1/P2/P3), Impact, and Concrete fix.
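The "Deferred Work Acceptance" bullets added in the hunk above can be read as a small decision rule. The sketch below is one possible reading, not the project's implementation. The function name and the "P2/P3" band for untracked, non-silent gaps are assumptions. It follows the third bullet literally (P1+ only for a silent correctness bug that is tracked nowhere) and does not model the last bullet's statement that missing NaN guards and incorrect statistical output are not deferrable.

```python
def deferred_work_severity(tracked: bool, silent_correctness_bug: bool) -> str:
    """Illustrative severity band for a deferred-work finding.

    tracked: the limitation has a TODO.md entry (existing, or added by this PR).
    silent_correctness_bug: wrong numbers with no warning or error.
    """
    if tracked:
        # Bullets 1 and 2: tracked work is informational ("tracked in TODO.md").
        return "P3"
    if silent_correctness_bug:
        # Bullet 3: untracked silent correctness bugs may be flagged as P1+.
        return "P1+"
    # Assumption: everything else (test, doc, performance gaps) stays below P1.
    return "P2/P3"
```

For example, a missing-test finding with a new `TODO.md` entry maps to `"P3"`, while an untracked path that silently returns wrong estimates maps to `"P1+"`.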
@@ -59,6 +78,41 @@ Rules:
 
 Output must be a single Markdown message.
 
+## Assessment Criteria
+
+Apply the assessment based on the HIGHEST severity of UNMITIGATED findings:
+
+⛔ Blocker — One or more P0: silent correctness bugs (wrong statistical output with no
+  warning), data corruption, or security vulnerabilities.
+
+⚠️ Needs changes — One or more P1 (no P0s): missing edge-case handling that could produce
+  errors in production, undocumented methodology deviations, or anti-pattern violations.
+
+✅ Looks good — No unmitigated P0 or P1 findings. P2/P3 items may exist. A PR does NOT need
+  to be perfect to receive ✅. Tracked limitations, documented deviations, and minor gaps
+  are compatible with ✅.
+
+A finding is MITIGATED (does not count toward assessment) if:
+- The deviation is documented in `docs/methodology/REGISTRY.md` with a Note/Deviation label
+- The limitation is tracked in `TODO.md` under "Tech Debt from Code Reviews"
+- The PR itself adds a TODO.md entry or REGISTRY.md note for the issue
+- The finding is about an implementation choice between valid numerical approaches
+
+When the assessment is ⚠️ or ⛔, include a "Path to Approval" section listing specific,
+enumerated changes that would move the assessment to ✅. Each item must be concrete and
+actionable (not "improve testing" but "add test for X with input Y").
+
+## Re-review Scope
+
+When this is a re-review (the PR has prior AI review comments):
+- Focus primarily on whether PREVIOUS findings have been addressed.
+- New P1+ findings on unchanged code MAY be raised but must be marked "[Newly identified]"
+  to distinguish from moving goalposts. Limit these to clear, concrete issues — not
+  speculative concerns or stylistic preferences.
+- New code added since the last review IS in scope for new findings.
+- If all previous P1+ findings are resolved, the assessment should be ✅ even if new
+  P2/P3 items are noticed.
+
 ## Known Anti-Patterns
 
 Flag these patterns in new or modified code:
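Returning to the "Assessment Criteria" section added in this hunk: the rule "highest severity of unmitigated findings" can be sketched as follows. This is a minimal illustration under stated assumptions. The `Finding` dataclass, the `assess` function, and the constant names are hypothetical and do not appear in the PR. How a finding comes to be marked mitigated (REGISTRY.md note, TODO.md entry, valid numerical choice) is left to the reviewer.

```python
from dataclasses import dataclass
from typing import Iterable

BLOCKER = "⛔ Blocker"
NEEDS_CHANGES = "⚠️ Needs changes"
LOOKS_GOOD = "✅ Looks good"


@dataclass
class Finding:
    severity: str            # one of "P0", "P1", "P2", "P3"
    mitigated: bool = False  # documented, tracked, or a valid numerical choice


def assess(findings: Iterable[Finding]) -> str:
    """Return the overall assessment from the unmitigated findings only."""
    open_severities = {f.severity for f in findings if not f.mitigated}
    if "P0" in open_severities:
        return BLOCKER
    if "P1" in open_severities:
        return NEEDS_CHANGES
    # Only P2/P3 (or nothing) remains unmitigated.
    return LOOKS_GOOD
```

With this reading, a PR whose only P1 is tracked in `TODO.md` and which also carries an open P2 still receives `LOOKS_GOOD`, which matches the statement that a PR does not need to be perfect to receive ✅.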